
Why multiprocessing.Pool and multiprocessing.Process perform so differently in Linux

I ran the test code below to compare the performance of multiprocessing.Pool and multiprocessing.Process on Linux, using Python 2.7. The source code of multiprocessing.Pool seems to show that it is built on multiprocessing.Process, yet multiprocessing.Pool costs much more time and memory than an equal number of multiprocessing.Process workers, and I don't understand why.

Here is what I did:

  1. Create a large dict, then create the subprocesses.

  2. Pass the dict to each subprocess for read-only access.

  3. Each subprocess does some computation and returns a small result.

Below is my testing code:

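(A minimal sketch of a test along these lines; the names create_data, compute, pool_worker and process_worker are illustrative, not from the original post.)

    from __future__ import print_function
    import multiprocessing
    import time

    def create_data():
        # Step 1: a large dict built in the main process before forking.
        return dict((i, str(i) * 10) for i in range(10 ** 6))

    def compute(big_dict, keys):
        # Steps 2-3: read-only access to the dict, small result returned.
        return sum(len(big_dict[k]) for k in keys)

    def pool_worker(args):
        big_dict, keys = args
        return compute(big_dict, keys)

    def process_worker(big_dict, keys, out_queue):
        out_queue.put(compute(big_dict, keys))

    if __name__ == '__main__':
        n_jobs = 8
        big_dict = create_data()
        chunks = [range(i, 10 ** 6, n_jobs) for i in range(n_jobs)]

        # Pool: the dict is pickled into the task queue for every job.
        start = time.time()
        pool = multiprocessing.Pool(processes=n_jobs)
        pool_res = pool.map(pool_worker, [(big_dict, c) for c in chunks])
        pool.close()
        pool.join()
        print('Pool   :', time.time() - start, sum(pool_res))

        # Process: args are attached before fork(), so the dict is
        # inherited copy-on-write instead of being pickled.
        start = time.time()
        out_queue = multiprocessing.Queue()
        procs = [multiprocessing.Process(target=process_worker,
                                         args=(big_dict, c, out_queue))
                 for c in chunks]
        for p in procs:
            p.start()
        proc_res = [out_queue.get() for _ in procs]
        for p in procs:
            p.join()
        print('Process:', time.time() - start, sum(proc_res))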

Here is the result, in summary: the Pool run was far slower and used roughly twice the memory per subprocess of the Process run.

I don’t know why the subprocesses from multiprocessing.Pool need about 1.6 GB each from the beginning, while a subprocess from multiprocessing.Process needs only 0.84 GB, the same as the memory cost of the main process. It seems to me that only multiprocessing.Process enjoys the copy-on-write benefit of Linux, since the time needed for all jobs is under one second. I don’t see why multiprocessing.Pool does not enjoy this; from the source code, multiprocessing.Pool looks like a wrapper around multiprocessing.Process.
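A per-process memory reading can be taken on Linux like this (rss_mb is an illustrative helper, not necessarily how the figures above were obtained):

    import os

    def rss_mb(pid):
        # Read the resident set size from /proc/<pid>/status (Linux only).
        with open('/proc/%d/status' % pid) as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1]) / 1024.0  # value is in kB
        return 0.0

    # e.g. log rss_mb(os.getpid()) inside a worker to see its footprint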


Answer

Question: “I don’t know why subprocesses from multiprocessing.Pool need about 1.6 GB in the beginning … Pool seems like a wrapper of multiprocessing.Process.”

First, this is because Pool reserves memory for the results of all jobs up front.
Second, Pool uses two SimpleQueue()s and three threads (visible in the snippet below).
Third, Pool duplicates all passed argument data before handing it to a worker process.
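A quick way to see this is to inspect a live pool; the sketch below pokes at private attributes of the CPython implementation (internal names such as _inqueue may change between versions):

    from __future__ import print_function
    import multiprocessing
    import threading

    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=4)
        # Two SimpleQueues: pickled tasks go out, pickled results come back.
        print(type(pool._inqueue).__name__, type(pool._outqueue).__name__)
        # MainThread plus the three handler threads (workers, tasks, results).
        print([t.name for t in threading.enumerate()])
        pool.close()
        pool.join()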

Your Process example uses only one Queue() for everything and passes the arguments as they are.

Pool is far from being only a wrapper.
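If the goal is to keep the copy-on-write benefit while still using Pool, one workaround is to let the workers inherit the dict instead of shipping it through the task queue. A sketch, assuming Linux's fork start method (the names _init, _big_dict and worker are illustrative):

    from __future__ import print_function
    import multiprocessing

    _big_dict = None  # filled in once per worker by the initializer

    def _init(shared):
        # Under fork, initargs reach the worker through the Process
        # constructor and are inherited copy-on-write, not pickled per task.
        global _big_dict
        _big_dict = shared

    def worker(keys):
        return sum(len(_big_dict[k]) for k in keys)

    if __name__ == '__main__':
        big = dict((i, str(i)) for i in range(10 ** 6))
        pool = multiprocessing.Pool(4, initializer=_init, initargs=(big,))
        print(pool.map(worker, [range(0, 1000), range(1000, 2000)]))
        pool.close()
        pool.join()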
