nohup “does not work” with mpirun

I am trying to use the “nohup” command so that a background process is not killed when I exit the terminal on Linux (MATE desktop).

The process I want to run is an mpirun process, and I use the following command:

JavaScript

When I leave the terminal, the processes running on the different cores are killed.

I also noticed something else in the log file: if I just run the following command

JavaScript

and then press Ctrl+Z (to stop the process), the log file indicates:

JavaScript

and I am unable to actually stop the mpirun command. So I guess there is something I don’t understand about what I am doing.

Answer

The job run in the background is still owned by your login shell (the nohup command doesn’t exit until the mpirun command terminates), so it gets signalled when you disconnect. This script (I call it bk) is what I use:

JavaScript

(If you’re into curiosities, note the careful use of $* and "$@". The nice command runs the job at a lower priority when I’m not there. And version 1.1 was checked into version control, SCCS at the time, on 1987-08-10.)

For your process, you’d run:

JavaScript

The prompt returns almost immediately. The key differences between what is in that code and what you do directly from the command line are:

  1. There’s a sub-process for the shell script, which terminates promptly.
  2. The script itself runs the command in a sub-shell in background.

Between them, these mean that the process is not interfered with by your login shell; it doesn’t know about the grandchild process.
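The two points above can be sketched as a small helper. This is a minimal illustration, not necessarily the script described in this answer: it is written as a shell function so it can be pasted into a session, with the outer parentheses standing in for the short-lived script process. The LOGFILE variable, its default of log, and the sleep placeholder command are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical bk-style helper: a sketch under stated assumptions.
# Output is appended to the file named by LOGFILE (assumed default: log).

bk() {
    # Outer subshell: plays the role of the script process. It starts the
    # inner group in the background and then exits immediately.
    (
        # Inner subshell: runs in the background and outlives the outer one.
        (
            echo "Date: $(date)"
            echo "Command: $*"    # $* flattens the arguments for display
            nice nohup "$@"       # "$@" keeps each argument as its own word
            echo "Completed: $(date)"
        ) >>"${LOGFILE:-log}" 2>&1 &
    )
}

# Placeholder command; substitute the real mpirun invocation here.
bk sleep 1
```

Saved as a standalone script instead of a function, the outer parentheses would be unnecessary, because the script process itself exits as soon as the background group has been started.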

Running directly on the command line, you’d write:

JavaScript

The parentheses start a subshell; the subshell runs nohup in the background with I/O redirection and then terminates. The continuing command is a grandchild of your login shell and is not interfered with by your login shell.
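As a rough check of that behaviour, the sketch below uses the same subshell pattern with a placeholder command (sleep) in place of the real job, then prints the job's parent process ID. The file names job.log and job.pid are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Start a placeholder job using the subshell + nohup pattern.
# $! is the PID of the background nohup, which execs the command itself,
# so the recorded PID is the PID of the long-running job.
(nohup sleep 10 >job.log 2>&1 & echo "$!" >job.pid)

pid=$(cat job.pid)

# The subshell has already exited, so the job has been re-parented
# (typically to PID 1 or a session subreaper) and is no longer a
# descendant that this shell tracks as one of its jobs.
echo "this shell: $$"
ps -o pid=,ppid=,args= -p "$pid"
```

If the parent PID printed by ps differs from the shell's own PID, the job is no longer a direct child of the shell that launched it.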

I’m not an expert in mpirun, never having used it, so there’s a chance it does something I’m not expecting. My impression from the manual page is that it acts more or less like a regular process even though it can run multiple other processes, possibly on multiple nodes. That is, it runs the other processes but monitors and coordinates them and only exits when its children are complete. If that’s correct, then what I’ve outlined is accurate enough.
