I have a quad-core Ubuntu system. Say I see a 15-minute load average of 60 during peak time; the load average sometimes goes up to 150 as well. This load generally occurs only during peak time. Basically, I want to know if there is any standard formula to derive the number of cores ideally required to handle the given
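The excerpt above does not include an answer, so the following is only a rough sketch of a common rule of thumb, not a standard formula: divide the load average by the core count to get the load per core, and treat a sustained value well above 1.0 as a sign of contention. The function names and the `target_per_core` parameter are hypothetical. Note that on Linux the load average also counts tasks in uninterruptible sleep (typically waiting on I/O), so a high figure does not by itself prove that more cores would help.

```python
import math
import os


def load_per_core(load_avg, cores):
    """Return the average number of runnable/waiting tasks per core."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


def cores_for_target(load_avg, target_per_core=1.0):
    """Naive estimate: cores needed so that load per core <= target_per_core.

    Assumes the load is purely CPU-bound, which is often not the case.
    """
    if target_per_core <= 0:
        raise ValueError("target_per_core must be positive")
    return math.ceil(load_avg / target_per_core)


def current_load_per_core():
    """Load per core for this machine, using the 15-minute load average."""
    _one, _five, fifteen = os.getloadavg()
    return load_per_core(fifteen, os.cpu_count() or 1)
```

With the numbers from the question, `load_per_core(60, 4)` gives 15 tasks per core, which is far above the rule-of-thumb threshold. Before sizing hardware from this, it is worth checking whether those tasks are actually running or blocked on I/O.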
Tag: performance
Reason for collapse of memory bandwidth when 2KB of data is cached in L1-cache
In a self-educational project I measure the memory bandwidth with the help of the following code (paraphrased here; the whole code follows at the end of the question): BLOCK_SIZE is chosen in such a way that a whole 64-byte cache line is fetched per single integer addition. My machine (an Intel Broadwell) needs about 0.35 nanoseconds per integer addition, so the code
Linux Hadoop Services monitoring tool and restart if down
I have configured Hadoop 2.7.5 with HBase. It is a 5-system cluster in fully distributed mode. I have to monitor the Hadoop/HBase daemons and want to trigger some action (e.g. send mail) if a daemon goes down. Is there any built-in solution? I also want to start Hadoop at boot time. How can I do this? Answer: I am
How to find out which kernel spinlock eats up most of the CPU?
I’m doing performance tuning of crypto software that runs on Linux and uses a hardware crypto acceleration device. When the load goes over some threshold, the kernel's _spin_lock begins to eat most of the CPU time. The following perf top screenshot shows ~30% of CPU is taken by _spin_lock, but it goes up over 50% if a load is
JAX WS Server implementation performance issue for Linux JVM?
I’ve run into a very weird problem. The built-in JAX WS server implementation works 100 times slower on Linux machines than on Mac OS X or Windows. I’ve created and shared a JMH test: https://github.com/Andremoniy/linuxjvmjaxwstest Basically it does the following: starts a JAX WS server with one SOAP method: endpoint = Endpoint.publish("http://localhost:8888/", new FooServiceImpl()); performs client requests to this method: String
How can I measure CPU time of a specific set of threads?
I run a C++ program on Linux. There are several thread pools (for computation, for I/O, for … such things). The clock() function gives me a way to measure the CPU time spent by all the CPU cores for the process. However, I want to measure the CPU time spent only by the threads in the computation thread pool. How
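The answer to this question is not shown here. As a language-neutral illustration of the idea, the sketch below uses Python's `time.thread_time()`, which reports CPU time for the calling thread only; in C++ on Linux the analogous calls would be `clock_gettime(CLOCK_THREAD_CPUTIME_ID, ...)` or `pthread_getcpuclockid()`. The `CpuTimeAccumulator` class and the `busy` workload are hypothetical names introduced for this example: each task run on the computation pool adds its own thread-CPU delta to a shared total, so threads in other pools are not counted.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class CpuTimeAccumulator:
    """Sums per-thread CPU time spent inside wrapped functions."""

    def __init__(self):
        self._lock = threading.Lock()
        self._total = 0.0

    def timed(self, fn):
        """Wrap fn so the CPU time of the thread running it is recorded."""
        def wrapper(*args, **kwargs):
            start = time.thread_time()  # CPU time of the calling thread only
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed = time.thread_time() - start
                with self._lock:
                    self._total += elapsed
        return wrapper

    @property
    def total(self):
        with self._lock:
            return self._total


def busy(n):
    """Stand-in for a CPU-bound computation task."""
    return sum(i * i for i in range(n))


def run_compute_pool(acc, jobs, workers=4):
    """Run the jobs on a dedicated pool and account their CPU time in acc."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(acc.timed(busy), jobs))
```

After `run_compute_pool(acc, [200000] * 4)`, `acc.total` holds the CPU seconds consumed by the wrapped tasks alone, regardless of what other threads in the process were doing.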
Is rmem_default size per socket or for the entire stack?
Does setting net.core.rmem_default affect each socket or all sockets opened in the system? What is the maximum value I can configure for the net.core.rmem_default parameter? I understand it depends on RAM. Assume I have plenty of RAM available. Answer: net.core.rmem_default is the default size of the incoming kernel socket buffer for each individual socket. From man socket(7): SO_RCVBUF Sets or gets the
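A small sketch can make the per-socket behaviour visible. It reads `SO_RCVBUF` from freshly created UDP sockets and shows that changing the buffer on one socket leaves another untouched. The function names are hypothetical, and the `/proc` path only exists on Linux. UDP is used on the assumption that TCP sockets take their default from `net.ipv4.tcp_rmem` instead.

```python
import socket


def default_rcvbuf():
    """Receive buffer size the kernel assigns to a new UDP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def read_rmem_default(path="/proc/sys/net/core/rmem_default"):
    """Return net.core.rmem_default, or None where /proc is unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, PermissionError):
        return None


def per_socket_demo(new_size=4096):
    """Resize one socket's buffer; return (changed, untouched) sizes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as a, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as b:
        # Only socket a is modified; b keeps the system default.
        a.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, new_size)
        return (a.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
                b.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```

On Linux, `default_rcvbuf()` is expected to match `read_rmem_default()`, and the second value returned by `per_socket_demo()` stays at that default while the first reflects the per-socket override (the kernel may adjust the requested value).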
Random mmaped memory access up to 16% slower than heap data access
Our software builds a data structure in memory that is about 80 gigabytes in size. It can then either use this data structure directly to do its computation, or dump it to disk so it can be reused several times afterwards. A lot of random memory accesses happen in this data structure. For larger input this data structure can grow even
How does perf associate events to functions?
More precisely, how does the perf tool associate PMU events with functions? I have already realized that when the kernel perf subsystem records the event counters, it also records the program counter (PC) so that it can associate the count with a function. However, to get really fine-grained results, you need to sample the counters at a very high rate, otherwise
Which perf events can use PEBS?
I want to understand which events can have the precise modifier on my CPU (Sandy Bridge). The Intel Software Developer’s Manual (Table 18-32. PEBS Performance Events for Intel Microarchitecture Code Name Sandy Bridge) contains only the following events: INST_RETIRED, UOPS_RETIRED, BR_INST_RETIRED, BR_MISP_RETIRED, MEM_UOPS_RETIRED, MEM_LOAD_UOPS_RETIRED, MEM_LOAD_UOPS_LLC_HIT_RETIRED. And SandyBridge_core_V15.json lists the same events with PEBS > 0. However, there are some examples of