I have a large number (>>100K) of tasks with very high latency (minutes) and very little resource consumption. Potentially they could all be executed in parallel, and I was considering using std::async to generate one future for each task.
My question is: what is the maximum number of threads that std::async will create and execute asynchronously? (using g++ 6.x on Ubuntu 16-xx or CentOS 7.x – x86_64)
It is important for me to get that number right because if I do not have enough tasks actually running (waiting) in parallel the cumulative cost of latency will be very high.
To get to an answer, I started by checking the capabilities of the system:
    bob@vb:~/programming/cxx/async$ ulimit -u
    43735
    bob@vb:~/programming/cxx/async$ cat /proc/sys/kernel/threads-max
    87470
From these numbers, I was expecting to be able to get on the order of 43K threads running (mostly waiting) in parallel. To verify that, I wrote the program below to check the number of distinct thread ids and the time required to call std::async 100K times with an empty task:
    #include <thread>
    #include <future>
    #include <iostream>
    #include <vector>
    #include <algorithm>
    #include <chrono>
    #include <string>
    #include <cstdlib>   // exit

    std::thread::id foo()
    {
        using namespace std::chrono_literals;
        //std::this_thread::sleep_for(2s);
        return std::this_thread::get_id();
    }

    int main(int argc, char **argv)
    {
        if (2 != argc) exit(1);
        const size_t COUNT = std::stoi(argv[1]);

        // Launch COUNT tasks with the default launch policy.
        std::vector<decltype(std::async(foo))> futures;
        futures.reserve(COUNT);
        while (futures.capacity() != futures.size())
        {
            futures.push_back(std::async(foo));
        }

        // Collect the id of the thread that ran each task.
        std::vector<std::thread::id> ids;
        ids.reserve(futures.size());
        for (auto &f: futures)
        {
            ids.push_back(f.get());
        }

        // Count the distinct thread ids.
        std::sort(ids.begin(), ids.end());
        const auto end = std::unique(ids.begin(), ids.end());
        ids.erase(end, ids.end());
        std::cerr << "COUNT: " << COUNT << ": ids.size(): " << ids.size() << std::endl;
    }
The time was fine but the number of distinct thread ids was much less than expected (32748 instead of 43735):
    bob@vb:~/programming/cxx/async$ /usr/bin/time -f "%E" ./testAsync 100000
    COUNT: 100000: ids.size(): 32748
    0:03.29
Then I uncommented the sleep line in foo to add a 2s sleep. The resulting timings are consistent with 2s up to 10K tasks or so, but at some point beyond that, some tasks end up sharing the same thread id and the elapsed time increases by 2s for each additional task:
    bob@vb:~/programming/cxx/async$ /usr/bin/time -f "%E" ./testAsync 10056
    COUNT: 10056: ids.size(): 10056
    0:02.24
    bob@vb:~/programming/cxx/async$ /usr/bin/time -f "%E" ./testAsync 10057
    COUNT: 10057: ids.size(): 10057
    0:04.27
    bob@vb:~/programming/cxx/async$ /usr/bin/time -f "%E" ./testAsync 10058
    COUNT: 10058: ids.size(): 10057
    0:06.28
    bob@vb:~/programming/cxx/async$ ps -eT | wc -l
    277
So it looks like, for my problem on this system, the limit is on the order of 10K. I checked on another system and the limit there was on the order of 4K.
I can’t figure out:
- why these values are so small
- how to predict these values from the specs of the system
Answer
With g++ on Linux, the straightforward answer seems to be “the maximum number of threads that can be created before pthread_create fails and returns EAGAIN”. That number can be limited by several different values, and man pthread_create lists 3 of them:
- RLIMIT_NPROC: soft resource limit (4096 on my CentOS 7 server and 43735 on my Ubuntu/VirtualBox laptop)
- the value of /proc/sys/kernel/threads-max (2061857 and 87470 resp.)
- the value of /proc/sys/kernel/pid_max (40960 and 32768 resp.)
There is at least one other possible limit, imposed by systemd, as man logind.conf indicates:
UserTasksMax=
    Sets the maximum number of OS tasks each user may run concurrently. This controls the TasksMax= setting of the per-user slice unit, see systemd.resource-control(5) for details. Defaults to 33%, which equals 10813 with the kernel’s defaults on the host, but might be smaller in OS containers.