
Debugging Linux process hangs: which code is it running?

I have a process running on a very weak embedded Linux device that cannot run gdb / gdbserver on itself. The process calls a function X from a shared library repeatedly (a few other processes also call it at the same time, but much less frequently), and it usually hangs somewhere inside the shared library after half a day to a day. How do I debug:

  • If it is blocked somewhere: what is the last line of code it ran?
  • If it is stuck in an infinite loop: which lines of code is it running?

What I tried:

  • I dug into the shared library and added a lot of syslog calls to check. However, with syslog being called constantly at such a high rate, my process now hangs every 2-5 minutes. I suspect syslog is blocking on its UNIX socket?


Answer

gdb comes with a program called gcore, which generates a core file from a running process.

Many systems nowadays disable core files by default (ulimit -c in a shell will show 0). Run ulimit -c unlimited, then start your process from that same shell; the limit is inherited from the parent process. If you start your process some other way than directly from a shell, you will need to find out how to set the limit there (e.g., LimitCORE= in a systemd unit file).
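
A minimal sketch of both cases, assuming your binary is called myproc and, for the systemd case, a hypothetical myproc.service unit:

    # Started from a shell: raise the core limit, then launch in that same shell
    ulimit -c unlimited
    ulimit -c             # should now print "unlimited"
    ./myproc &

    # Started by systemd: allow cores in the unit (or a drop-in), then restart
    #   /etc/systemd/system/myproc.service.d/core.conf
    #   [Service]
    #   LimitCORE=infinity
    systemctl daemon-reload
    systemctl restart myproc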

Once your process gets into the bad state, run gcore on its process ID. Copy the resulting core file to your workstation and load it into gdb (gdb <executable> <core-file>); you can then view the stack trace and other state as of the moment the core dump was taken.
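
For example, assuming the hung process has PID 1234 and the binary is myproc (the PID and names are placeholders):

    # On the device: capture a core file from the still-running process
    gcore -o /tmp/myproc-core 1234        # writes /tmp/myproc-core.1234

    # On the workstation: open the core alongside the (unstripped) executable
    gdb ./myproc /tmp/myproc-core.1234

    # Inside gdb: see where each thread was when the dump was taken
    (gdb) bt                      # backtrace of the current thread
    (gdb) info threads            # list all threads
    (gdb) thread apply all bt     # backtraces for every thread

Since the shared library lives on the device, gdb on your workstation may also need the device's copies of the library and its debug symbols; set sysroot or set solib-search-path can point gdb at a local directory containing them.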
