I’m doing performance tuning of crypto software that runs on Linux and uses a hardware crypto acceleration device. When the load exceeds some threshold, the kernel’s _spin_lock begins to eat most of the CPU time. The following perf top screenshot shows ~30% of CPU time taken by _spin_lock, but it goes above 50% if a load is