I am running my program as a daemon.
The parent process only waits for the child process; when the child dies unexpectedly, the parent forks and waits again.
    for (;;) {
        if (fork() == 0)
            break;
        int sig = 0;
        for (;; usleep(10000)) {
            pid_t wpid = waitpid(g->pid[1], &sig, WNOHANG);
            if (wpid > 0)
                break;
            if (wpid < 0)
                print("wait error: %s\n", strerror(errno));
        }
    }
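For reference, here is a minimal sketch of the same fork-and-wait pattern. It keeps the pid returned by fork() and uses a blocking waitpid instead of polling with WNOHANG. The names run_and_wait and demo_work are hypothetical and do not come from the original program. This only tidies the loop and is not claimed to change when the kernel reports the child as dead.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork one child that runs work(), then block until
 * that child terminates. Returns the raw wait status, or -1 on error. */
int run_and_wait(int (*work)(void))
{
    pid_t child = fork();
    if (child < 0) {
        fprintf(stderr, "fork error: %s\n", strerror(errno));
        return -1;
    }
    if (child == 0)
        _exit(work());              /* child: do the work, then exit */

    int status = 0;
    for (;;) {
        /* Blocking wait on the pid that fork() actually returned. */
        pid_t wpid = waitpid(child, &status, 0);
        if (wpid == child)
            return status;
        if (wpid < 0 && errno != EINTR) {
            fprintf(stderr, "wait error: %s\n", strerror(errno));
            return -1;
        }
        /* EINTR: interrupted by a signal, so retry the wait. */
    }
}

/* Hypothetical worker, used only to illustrate the call. */
int demo_work(void)
{
    return 7;
}
```

A supervisor would call run_and_wait(demo_work) inside an endless loop and inspect the status with WIFEXITED or WIFSIGNALED before restarting the child.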
But when the child process is killed with signal 9 (SIGKILL), it becomes a zombie process.
waitpid should return the pid of the child process immediately!
Instead, waitpid returned the pid only after about 90 seconds:
cube 28139 0.0 0.0 70576 900 ? Ss 04:24 0:07 ./daemon -d
cube 28140 9.3 0.0 0 0 ? Zl 04:24 106:19 [daemon] <defunct>
Here is the strace output of the parent.
The parent does not get stuck; wait4 keeps being called.
strace -p 28139
Process 28139 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
wait4(28140, 0x7fff08a2681c, WNOHANG, NULL) = 0
nanosleep({0, 10000000}, NULL) = 0
wait4(28140, 0x7fff08a2681c, WNOHANG, NULL) = 0
About 90 seconds later the parent received SIGCHLD, and wait4 returned the pid of the dead child.
--- SIGCHLD (Child exited) @ 0 (0) ---
restart_syscall(<... resuming interrupted call ...>) = 0
wait4(28140, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], WNOHANG, NULL) = 28140
Why does the child process not exit immediately? Instead, it stays a zombie for about 90 seconds before the parent can reap it.
Answer
I finally found out, through deep tracing with lsof, that there were some fd leaks.
After the fd leaks were fixed, the problem was gone.
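As a complement to lsof, a process can check its own descriptor count from the inside. The following is only a sketch, assuming Linux with /proc mounted. The helper name count_open_fds is hypothetical and is not part of the original program. Logging this number periodically in the child should show a leak as a steadily growing count.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: count the file descriptors currently open in this
 * process by listing /proc/self/fd (Linux-specific assumption).
 * Returns -1 if the directory cannot be opened. */
int count_open_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.')
            continue;               /* skip "." and ".." */
        count++;
    }
    closedir(dir);

    /* The directory stream itself held one fd while we were listing. */
    return count - 1;
}
```

For example, calling count_open_fds() before and after a suspect code path and comparing the two values points to the place where descriptors are not being closed.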