
nohup “does not work” with mpirun

I am trying to use the nohup command so that a background process is not killed when I exit the terminal on Linux (MATE desktop).

The process I want to run is an mpirun process, and I use the following command:

nohup mpirun -np 8 solverName -parallel >log 2>&1

When I leave the terminal, the processes running on the different cores are killed.

I also noticed something else in the log file. If I just run the following command

mpirun -np 8 solverName -parallel >log 2>&1

and then press CTRL+Z (to suspend the process), the log file shows:

Forwarding signal 20 to job

and I am unable to actually stop the mpirun command. So I guess there is something I don’t understand about what I am doing.


Answer

The job run in the background is still owned by your login shell (the nohup command doesn’t exit until the mpirun command terminates), so it gets signalled when you disconnect. This script (I call it bk) is what I use:

#!/bin/sh
#
# @(#)$Id: bk.sh,v 1.9 2008/06/25 16:43:25 jleffler Exp $"
#
# Run process in background
# Immune from logoffs -- output to file log

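# The whole group runs in a background sub-shell; its output is appended
# to $LOGFILE (default: log).
(
    echo "Date: $(date)"
    echo "Command: $*"      # "$*": the command as one string, for the log
    nice nohup "$@"         # "$@": each argument passed through unchanged
    echo "Completed: $(date)"
    echo
) >>"${LOGFILE:=log}" 2>&1 &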

(If you’re into curiosities, note the careful use of $* and "$@". The nice runs the job at a lower priority when I’m not there. And version 1.1 was checked into version control — SCCS at the time — on 1987-08-10.)
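To make the $* versus "$@" point concrete, here is a minimal sketch. The helper count_args and the argument "solver name" (with a space in it) are made up for illustration; they are not part of bk.

```shell
#!/bin/sh
# Hypothetical helper: prints how many arguments it received.
count_args() {
    echo "$#"
}

# Pretend the script was called with three arguments,
# one of which contains a space.
set -- mpirun "solver name" -parallel

# "$*" joins all arguments into a single string: fine for a log line.
as_star=$(count_args "$*")

# "$@" keeps every argument as its own word: needed to run the command.
as_at=$(count_args "$@")

echo "\"\$*\" expands to $as_star argument(s)"
echo "\"\$@\" expands to $as_at argument(s)"
```

That is why the script logs with "$*" but executes with "$@": the log line only needs readable text, whereas the command must receive its arguments exactly as they were typed.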

For your process, you’d run:

$ bk mpirun -np 8 solverName -parallel
$

The prompt returns almost immediately. The key differences between what is in that code and what you do direct from the command line are:

  1. There’s a sub-process for the shell script, which terminates promptly.
  2. The script itself runs the command in a sub-shell in background.

Between them, these mean that the process is not interfered with by your login shell; it doesn’t know about the grandchild process.

Running direct on the command line, you’d write:

(nohup mpirun -np 8 solverName -parallel >log 2>&1 &)

The parentheses start a subshell; the sub-shell runs nohup in the background with I/O redirection and terminates. The continuing command is a grandchild of your login shell and is not interfered with by your login shell.
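The grandchild effect can be observed with a small experiment. This is only a sketch, assuming Linux (it reads /proc) and using sleep as a stand-in for mpirun; the helper ppid_of and the temporary pidfile are hypothetical names introduced for the demonstration.

```shell
#!/bin/sh
# Hypothetical helper: print the parent PID of process $1 (Linux /proc only).
ppid_of() {
    while read -r key value; do
        if [ "$key" = "PPid:" ]; then
            echo "$value"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

pidfile=$(mktemp)

# Case 1: plain background job. The current shell is its parent.
sleep 3 >/dev/null 2>&1 &
child_pid=$!
child_ppid=$(ppid_of "$child_pid")

# Case 2: a sub-shell starts the job and exits straight away,
# so the job is re-parented away from the current shell.
( sleep 3 >/dev/null 2>&1 & echo "$!" > "$pidfile" )
grandchild_pid=$(cat "$pidfile")
grandchild_ppid=$(ppid_of "$grandchild_pid")
rm -f "$pidfile"

echo "this shell:               $$"
echo "parent of plain job:      $child_ppid"
echo "parent of sub-shell job:  $grandchild_ppid"

# Clean up the placeholder jobs.
kill "$child_pid" "$grandchild_pid" 2>/dev/null
```

The plain job should report the current shell as its parent, while the job started inside the parentheses should report some other process (typically init or a session-level reaper), because the sub-shell that launched it has already exited.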

I’m not an expert in mpirun, never having used it, so there’s a chance it does something I’m not expecting. My impression from the manual page is that it acts more or less like a regular process even though it can run multiple other processes, possibly on multiple nodes. That is, it runs the other processes but monitors and coordinates them and only exits when its children are complete. If that’s correct, then what I’ve outlined is accurate enough.
