
Executing an external program when forking is not advisable

I have a big server program that can use 4-8GB of memory.

This makes fork-exec cumbersome, as the fork itself can take significant time, plus the default behavior seems to be that fork will fail unless there is enough memory for a copy of the entire resident memory.

Since this is starting to show up as the hottest spot when profiling (60% of time spent in fork), I need to address it.

What would be the easiest way to avoid the fork-exec routine?

Answer

You basically cannot avoid fork(2) (or the equivalent clone(2) syscall, or the obsolete vfork, which I don’t recommend using) followed by execve(2) to start an external command (à la system(3) or posix_spawn) on Linux and (probably) Mac OS X and most other Unix or POSIX systems.
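For reference, here is a minimal sketch of the posix_spawn route mentioned above. The function name run_command is hypothetical, chosen only for illustration. On recent glibc versions posix_spawn is, as far as I know, implemented with a vfork-like clone, so it may avoid duplicating the parent's page tables; treat that as an assumption to verify on your own system.

```c
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (searched in PATH) and wait for it.
   Returns the child's exit status (0-255), or -1 on any failure. */
int run_command(char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawnp returns an error number directly; it does not set errno. */
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0) {
        fprintf(stderr, "posix_spawnp(%s): %s\n", argv[0], strerror(err));
        return -1;
    }
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    /* Report -1 if the child was killed by a signal instead of exiting. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

It would be called with a NULL-terminated argument vector, e.g. {"sh", "-c", "exit 3", NULL}, and the return value is the command's exit status.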

What makes you think that it is becoming an issue? An 8GB process virtual address space is not a big deal today (at least on machines with 8 or 16 GB of RAM, like my desktop has). In practice you don’t need twice as much RAM (but you do need swap space), thanks to the lazy copy-on-write techniques used by all recent Unixes and Linux.

Perhaps you might believe that swap space could be an issue. On Linux, you could add swap space, perhaps by swapping on a file; just run as root:

 dd if=/dev/zero of=/var/tmp/myswap bs=1M count=32768
 mkswap /var/tmp/myswap
 swapon /var/tmp/myswap

(Of course, be sure that /var/tmp/ is not a tmpfs-mounted filesystem but sits on some disk, perhaps an SSD.)

When you no longer need that much swap space, run swapoff /var/tmp/myswap.

You could also start some external shell process near the beginning of your program (à la popen), and later send shell commands to it. Look at my execicar.c program for inspiration, or use it if it fits (I wrote it 10 years ago for similar purposes, but I forgot the details).

Alternatively, fork some interpreter (Lua, Guile, …) at the beginning of your program and send commands to it.

Running more than a few dozen commands per second (starting any external program) is not reasonable, and should be considered a design mistake, IMHO. Perhaps the commands that you are running could be replaced by in-process functions (e.g. /bin/ls can be done with the stat, readdir, glob functions, …). Perhaps you might consider adding some plugin ability (with dlopen(3) and dlsym) to your code, and run functions from plugins instead of starting the same programs very often. Or perhaps embed an interpreter (Lua, Guile, …) inside your code.
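To illustrate the /bin/ls case, here is a small sketch that lists a directory in-process with opendir and readdir, so no fork is needed at all. The function name list_dir is hypothetical. Unlike ls, it does not sort the names; they come out in whatever order the filesystem returns them.

```c
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: print the entries of `path` to `out`, one per line,
   skipping names that start with '.' (as plain ls does).
   Returns the number of entries printed, or -1 if the directory
   cannot be opened. */
int list_dir(const char *path, FILE *out)
{
    DIR *dir = opendir(path);
    if (dir == NULL) {
        perror(path);
        return -1;
    }

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;               /* hidden entries, "." and ".." */
        fprintf(out, "%s\n", ent->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```

For example, list_dir("/", stdout) prints the top-level directory names without starting any process.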

As an example, for web servers, compare old CGI with FastCGI, HTTP forwarding (e.g. URL redirection), embedded PHP, HOP, or Ocsigen.
