I have a c++ solver which I need to run in parallel using the following command:
nohup mpirun -np 16 ./my_exec > log.txt &
This command runs my_exec independently on the 16 processors available on my node. This used to work perfectly.
Last week, the HPC department performed an OS upgrade and now, when launching the same command, I get two warning messages (for each processor). The first one is:
--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

Local host:              tamnun
Registerable memory:     32768 MiB
Total memory:            98294 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------
I then get output from my code telling me it thinks I am launching only 1 instance of the code (Nprocs = 1 instead of 16):

# MPI IS ON; Nprocs = 1
Filename = ../input/odtParam.inp

# MPI IS ON; Nprocs = 1

***** Error, process 0 failed to create ../data/data_0/, or it was already there
Finally, the second warning message is:
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

Local host:          tamnun (PID 17446)
MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
After looking around online, I tried following the second warning's advice by setting the MCA parameter mpi_warn_on_fork to 0 with the command:
nohup mpirun --mca mpi_warn_on_fork 0 -np 16 ./my_exec > log.txt &
which yielded the following error message:
[mpiexec@tamnun] match_arg (./utils/args/args.c:194): unrecognized argument mca
[mpiexec@tamnun] HYDU_parse_array (./utils/args/args.c:214): argument matching returned error
[mpiexec@tamnun] parse_args (./ui/mpich/utils.c:2964): error parsing input array
[mpiexec@tamnun] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:3238): unable to parse user arguments
I am using Red Hat 6.7 (Santiago). I contacted the HPC department, but since I am at a university, it may take them a day or two to respond. Any help or guidance would be appreciated.
EDIT in response to answer:
Indeed, I was compiling my code with Open MPI's mpic++ while running the executable with Intel's mpirun command, hence the error (after the OS upgrade, Intel's mpirun was set as the default). I had to put the path to Open MPI's mpirun at the beginning of the $PATH environment variable.
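The PATH fix can be sketched as follows; the Open MPI install prefix used here (/usr/lib64/openmpi/bin) is a placeholder, and your cluster's actual path will differ:

```shell
# Hypothetical Open MPI location -- adjust to your cluster's install prefix.
OPENMPI_BIN=/usr/lib64/openmpi/bin

# Prepend it so Open MPI's mpirun shadows Intel's in the lookup order.
export PATH="$OPENMPI_BIN:$PATH"

# Verify that the right launcher is now found first; "mpirun --version"
# should identify itself as Open MPI, not Intel MPI / HYDRA.
command -v mpirun || echo "mpirun not on PATH"
```

To make the change persistent, the same export line can go in ~/.bashrc (or the shell startup file your cluster uses).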
The code now runs as expected, BUT I still get the first warning message above (it no longer advises me to set the MCA parameter mpi_warn_on_fork). I think (but am not sure) this is an issue I need to resolve with the HPC department.
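Regarding that first warning: the FAQ page it links to explains that on Mellanox mlx4 hardware the registerable-memory cap is governed by the log_num_mtt and log_mtts_per_seg kernel module parameters together with the page size. A minimal sketch of that formula (the parameter values 20 and 3 below are common defaults, assumed for illustration):

```shell
# Registerable memory for mlx4 HCAs, per the Open MPI FAQ:
#   max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * page_size
reg_mem_mib() {
  log_num_mtt=$1; log_mtts_per_seg=$2; page_size=$3
  echo $(( (1 << (log_num_mtt + log_mtts_per_seg)) * page_size / 1048576 ))
}

# With log_num_mtt=20, log_mtts_per_seg=3 and 4 KiB pages this reproduces
# the 32768 MiB cap reported in the warning above:
reg_mem_mib 20 3 4096    # -> 32768

# On the node itself, the current values can be read from (paths assume mlx4):
#   /sys/module/mlx4_core/parameters/log_num_mtt
#   /sys/module/mlx4_core/parameters/log_mtts_per_seg
# and the locked-memory limit checked with:  ulimit -l   (ideally "unlimited")
```

Raising these module parameters requires root, so this part is indeed something the HPC department has to fix.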
Answer
[mpiexec@tamnun] match_arg (./utils/args/args.c:194): unrecognized argument mca
[mpiexec@tamnun] HYDU_parse_array (./utils/args/args.c:214): argument matching returned error
[mpiexec@tamnun] parse_args (./ui/mpich/utils.c:2964): error parsing input array
                                  ^^^^^
[mpiexec@tamnun] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:3238): unable to parse user arguments
                                                  ^^^^^
You are using MPICH in the last case. MPICH is not Open MPI, and its process launcher does not recognize the --mca parameter, which is specific to Open MPI (MCA stands for Modular Component Architecture, the basic framework that Open MPI is built upon). This is a typical case of mixing up multiple MPI implementations.