I have a large number (>>100K) of tasks with very high latency (minutes) and very little resource consumption. Potentially they could all be executed in parallel and I was considering using std::async to generate one future for each task.
My question is: what is the maximum number of threads that std::async will create and run asynchronously (using g++ 6.x on Ubuntu 16-xx or CentOS 7.x, x86_64)?
It is important for me to get that number right because if I do not have enough tasks actually running (waiting) in parallel the cumulative cost of latency will be very high.
To get to an answer, I started by checking the capabilities of the system:
bob@vb:~/programming/cxx/async$ ulimit -u
43735
bob@vb:~/programming/cxx/async$ cat /proc/sys/kernel/threads-max
87470
From these numbers, I was expecting to be able to get in the order of 43K threads running (mostly waiting) in parallel. To verify that, I wrote the program below to check the number of distinct thread ids and the time required to call 100K std::async with an empty task:
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <future>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

// Task under test: returns the id of the thread that executed it.
std::thread::id foo()
{
    using namespace std::chrono_literals;
    //std::this_thread::sleep_for(2s);
    return std::this_thread::get_id();
}

int main(int argc, char **argv)
{
    if (2 != argc) std::exit(1);
    const size_t COUNT = std::stoi(argv[1]);

    // Launch COUNT tasks with the default policy
    // (std::launch::async | std::launch::deferred).
    std::vector<decltype(std::async(foo))> futures;
    futures.reserve(COUNT);
    while (futures.capacity() != futures.size())
    {
        futures.push_back(std::async(foo));
    }

    // Collect the thread id reported by each task.
    std::vector<std::thread::id> ids;
    ids.reserve(futures.size());
    for (auto &f : futures)
    {
        ids.push_back(f.get());
    }

    // Keep only the distinct thread ids.
    std::sort(ids.begin(), ids.end());
    const auto end = std::unique(ids.begin(), ids.end());
    ids.erase(end, ids.end());
    std::cerr << "COUNT: " << COUNT << ": ids.size(): " << ids.size() << std::endl;
}
The time was fine but the number of distinct thread ids was much less than expected (32748 instead of 43735):
bob@vb:~/programming/cxx/async$ /usr/bin/time -f "%E" ./testAsync 100000
COUNT: 100000: ids.size(): 32748
0:03.29
Then I un-commented the sleep line in foo so that each task sleeps for 2s. The elapsed time stays close to 2s up to about 10K tasks, but beyond a certain point some tasks end up sharing the same thread id and the elapsed time increases by 2s for each additional task:
bob@vb:~/programming/cxx/async$ /usr/bin/time -f "%E" ./testAsync 10056
COUNT: 10056: ids.size(): 10056
0:02.24
bob@vb:~/programming/cxx/async$ /usr/bin/time -f "%E" ./testAsync 10057
COUNT: 10057: ids.size(): 10057
0:04.27
bob@vb:~/programming/cxx/async$ /usr/bin/time -f "%E" ./testAsync 10058
COUNT: 10058: ids.size(): 10057
0:06.28
bob@vb:~/programming/cxx/async$ ps -eT | wc -l
277
So it looks like, for my problem on this system, the limit is on the order of 10K. I checked on another system, where the limit was on the order of 4K.
I can’t figure out:
- why these values are so small
- how to predict these values from the specs of the system
Answer
With g++ on Linux, the straightforward answer seems to be “the maximum number of threads that can be created before pthread_create fails and returns EAGAIN”. That number can be limited by several different values, and man pthread_create lists three of them:
- the RLIMIT_NPROC soft resource limit (4096 on my CentOS 7 server and 43735 on my Ubuntu/VirtualBox laptop)
- the value of /proc/sys/kernel/threads-max (2061857 and 87470 resp.)
- the value of /proc/sys/kernel/pid_max (40960 and 32768 resp.)
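The three values listed above can also be read from inside a program. The following is only a minimal sketch for Linux: the helper names (read_proc_value, soft_nproc_limit, tightest_limit, print_thread_limits) are made up for this illustration, and the smallest value it reports is at best an upper bound, since other limits (memory, systemd or cgroup settings) may apply first.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_NPROC (Linux)

#include <fstream>
#include <ostream>
#include <string>
#include <vector>

// Hypothetical helper: reads one integer from a /proc file, -1 on failure.
long long read_proc_value(const std::string &path)
{
    std::ifstream in(path);
    long long value = -1;
    if (!(in >> value)) return -1;
    return value;
}

// Hypothetical helper: soft RLIMIT_NPROC of this process.
// Returns -1 if the limit is unlimited or cannot be queried.
long long soft_nproc_limit()
{
    rlimit lim{};
    if (getrlimit(RLIMIT_NPROC, &lim) != 0) return -1;
    if (lim.rlim_cur == RLIM_INFINITY) return -1;
    return static_cast<long long>(lim.rlim_cur);
}

// Smallest of the limits that are actually set (entries of -1 are ignored).
long long tightest_limit(const std::vector<long long> &limits)
{
    long long best = -1;
    for (long long v : limits)
    {
        if (v >= 0 && (best < 0 || v < best)) best = v;
    }
    return best;
}

// Prints the three limits named in man pthread_create and the smallest one.
void print_thread_limits(std::ostream &out)
{
    const long long nproc = soft_nproc_limit();
    const long long threads_max = read_proc_value("/proc/sys/kernel/threads-max");
    const long long pid_max = read_proc_value("/proc/sys/kernel/pid_max");
    out << "RLIMIT_NPROC (soft): " << nproc << '\n'
        << "threads-max:         " << threads_max << '\n'
        << "pid_max:             " << pid_max << '\n'
        << "tightest of these:   "
        << tightest_limit({nproc, threads_max, pid_max}) << '\n';
}
```

Calling print_thread_limits(std::cout) from main (with <iostream> included) prints the same numbers as ulimit -u and the two cat commands, plus their minimum. Note that RLIMIT_NPROC counts all threads of the user and pid_max is system-wide, so threads that already exist reduce what a single process can still create.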
There is at least one other possible limit imposed by systemd, as man logind.conf indicates:
UserTasksMax=
    Sets the maximum number of OS tasks each user may run concurrently. This controls the TasksMax= setting of the per-user slice unit, see systemd.resource-control(5) for details. Defaults to 33%, which equals 10813 with the kernel’s defaults on the host, but might be smaller in OS containers.
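Whichever limit is hit first, the symptom in the question (repeated thread ids and 2s of extra run time per additional task) would be consistent with the extra tasks not getting a thread at all. As far as I understand libstdc++, std::async with the default policy falls back to a deferred task when thread creation fails with EAGAIN, and a deferred task runs inside get() on the calling thread; that is an assumption about this implementation, not something the standard guarantees. A minimal sketch to check it, where count_deferred is a hypothetical helper name:

```cpp
#include <chrono>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper: counts the futures whose task was deferred, i.e. was
// not started on any thread and will only run inside get() or wait().
// Every future in the vector must be valid (get() not yet called on it).
template <typename T>
std::size_t count_deferred(const std::vector<std::future<T>> &futures)
{
    std::size_t deferred = 0;
    for (const auto &f : futures)
    {
        // A zero timeout only queries the status; it does not run a
        // deferred task.
        if (f.wait_for(std::chrono::seconds(0)) == std::future_status::deferred)
        {
            ++deferred;
        }
    }
    return deferred;
}
```

In the test program from the question, calling count_deferred(futures) just before the loop that calls f.get() would show how many tasks did not get their own thread. Passing std::launch::async explicitly as the first argument of std::async removes the fallback: a failed thread creation is then reported as a std::system_error instead.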