I’ve been trying to set up a dual-CPU workstation (Dell Precision 7820) to run local parallel jobs with openmpi 2.1.1-8 (as preinstalled on Ubuntu 18.04), but mpirun fails with the following error:
mpirun: pci-common.c:125: hwloc_pci_compare_busids: Assertion `0' failed.
The source of pci-common.c has a comment just before the assert(0) line stating that execution should never normally reach this point, and that the assertion aborts in both debug and non-debug builds. Generating a system topology map with lstopo (a program that ships with hwloc) also fails with a similar error.
I compiled a newer release of hwloc locally (2.0.4, versus the preinstalled 1.11.9-1) and found that lstopo could generate a topology map only when I built hwloc with libpciaccess-dev rather than the standard libpciaccess0 that comes preinstalled. The summary output from building hwloc with the two pciaccess libraries shows the following:
Probe / display I/O devices: PCI(linux) LinuxIO GL
Probe / display I/O devices: PCI(pciaccess+linux) LinuxIO GL
with the former compiled with libpciaccess0 and the latter with libpciaccess-dev. Again, only the latter can generate a system topology map, and I’m under the impression that openmpi needs this information to distribute jobs properly across the system. I’m unsure how to make the packaged openmpi use these versions, or whether everything needs to be compiled entirely from source. Is there a simpler way to approach this problem?
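For anyone comparing their own builds, here is a minimal sketch of how the PCI backend could be pulled out of that summary line. The function name pci_backend is hypothetical (it is not part of hwloc), and the sketch assumes the summary line has exactly the "Probe / display I/O devices: PCI(...)" shape quoted above.

```shell
# Hypothetical helper (not part of hwloc): read build summary text on stdin
# and print whatever appears inside "PCI(...)" on the I/O devices line.
pci_backend() {
  sed -n 's/.*Probe \/ display I\/O devices:.*PCI(\([^)]*\)).*/\1/p'
}

# The two summary lines quoted in the question:
echo 'Probe / display I/O devices: PCI(linux) LinuxIO GL' | pci_backend
echo 'Probe / display I/O devices: PCI(pciaccess+linux) LinuxIO GL' | pci_backend
```

In my case the build whose backend includes pciaccess was the one that produced a working lstopo.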
Answer
I solved the problem through trial and error. First, purge openmpi from the system (if it was installed via apt):
sudo apt purge openmpi-bin
sudo apt purge openmpi-common
Then, download hwloc 1.11.13 (ultrastable) from https://www.open-mpi.org/software/hwloc/v1.11/ and extract it to a local directory. From inside the hwloc directory, run:
./configure
make
sudo make install
After this is completed, install libhwloc5 then openmpi from apt:
sudo apt-get install libhwloc5
sudo apt-get install openmpi-bin
sudo apt-get install openmpi-common
Open-MPI should now run as intended. You should be able to generate the system topology by running ‘lstopo’ and confirm that MPI works by checking that ‘mpirun’ runs without errors.
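If you want to script that final check, something like the following should do as a rough sketch. The check function is a hypothetical helper of my own, and the specific invocations (lstopo --of console, and mpirun -np 2 hostname as a smoke test) are assumptions about a reasonable test rather than the exact commands I ran.

```shell
# Hypothetical helper: run a command quietly and report whether it exited cleanly.
# Usage: check <label> <command> [args...]
check() {
  name=$1; shift
  if "$@" >/dev/null 2>&1; then
    echo "$name: ok"
  else
    echo "$name: FAILED (exit $?)"
    return 1
  fi
}

status=0
# Assumed invocations: text-mode topology dump, then a two-process smoke test.
check lstopo lstopo --of console || status=1
check mpirun mpirun -np 2 hostname || status=1
echo "overall status: $status"
```

A failed assertion such as the hwloc_pci_compare_busids one makes the command exit with a non-zero status, so it would show up here as FAILED.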
Hope this helps anyone who has a similar issue in the future!