I have two virtual servers hosting my web app. They are identical, running Debian 6 with 1.5GB of RAM, and I configure the OS and Tomcat from a fresh install using a script, so I know the setups match.
My webapp runs in Tomcat, with an 850M heap and a 100M perm size. The app regularly dies on one of the servers. My first instinct was to check for the OOM killer, but there is no evidence of it in the logs.
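For reference, the heap settings come from my Tomcat startup script, roughly like this (the setenv.sh location is approximate and depends on the install):

```
# setenv.sh (approximate path): 850M heap, 100M perm gen on a 1.5GB box
export JAVA_OPTS="$JAVA_OPTS -Xms850m -Xmx850m -XX:MaxPermSize=100m"
```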
Questions:
- Can the OOM killer kill apps without leaving an appropriate log message?
- [Edit] If not, and given that nothing obvious to me would kill the process, where can I find the evidence to diagnose the problem?
Thanks
Answer
There are plenty of reasons a JVM can be terminated: a signal sent by the owning user or root, the kernel's OOM killer (as you mention), and others.
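If the OOM killer had struck, it normally leaves a trace in the kernel log, so something like the following should turn it up (the exact log file names vary; on Debian 6 kernel messages usually land in /var/log/kern.log or /var/log/messages):

```
# Look for OOM-killer activity in the kernel ring buffer and the persisted logs
dmesg | grep -iE 'out of memory|oom-killer|killed process'
grep -iE 'out of memory|oom-killer|killed process' \
    /var/log/kern.log /var/log/messages /var/log/syslog 2>/dev/null
```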
In several instances I have been able to trace random crashes back to bad or faulty RAM, which led to memory corruption in the JVM and ultimately to the process terminating with SIGSEGV. You could check whether there are any hs_err_pidXXXX.log files. They may be missing if the user running the process doesn't have permission to write to the target directory; you can specify where they go using -XX:ErrorFile=/path/to/file.
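A quick way to see whether any such crash logs already exist, and a sketch of pinning down their location with the flag above (the /var/log/tomcat6 directory is just an example; pick any directory the Tomcat user can write to):

```
# Search for existing JVM fatal-error logs from the last 30 days
find / -name 'hs_err_pid*.log' -mtime -30 2>/dev/null

# Force future error logs into a known, writable directory (%p expands to the PID)
export CATALINA_OPTS="$CATALINA_OPTS -XX:ErrorFile=/var/log/tomcat6/hs_err_pid%p.log"
```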
From personal experience, when faced with sporadic, untraceable, unexplainable random crashes, the first thing I normally do is run memtest86
for a few hours. I tend to keep a PXE-bootable image of it on the network.
EDIT: Given that you are on a virtual private server operated by another company, running memtest86 on the bare metal won't be possible for you, but there are user-space alternatives as well that might be worth trying.
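One such user-space tool is memtester, which locks a chunk of memory and runs test patterns over it. A rough invocation could look like the following (the amount to test is a guess; leave headroom for the rest of the system, and run it as root so the memory can be locked):

```
# Install and run memtester on Debian: test ~1GB of RAM for 3 passes
apt-get install memtester
memtester 1024M 3
```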