Trying to understand what’s going on with my server. It’s a 2 cpu server, so:
$> grep 'model name' /proc/cpuinfo | wc -l
2
While the load average (run queue) is showing ~8:
$> uptime
 16:31:30 up 123 days, 9:04, 1 user, load average: 8.37, 8.48, 8.55
So you can assume the load is really high and things are piling up; there is sustained load on the system, not just a spike. However, looking at the top CPU consumers:
$> ps -eo pcpu,pid,user,args | sort -k 1 -r | head -6
%CPU   PID USER     COMMAND
 8.3 27187 ****     server_process_c
 1.0 22248 ****     server_process_b
 0.5 22282 ****     server_process_a
 0.0 31167 root     head -6
 0.0 31166 root     sort -k 1 -r
 0.0 31165 root     ps -eo pcpu,pid,user,args
Results of the free command:
             total       used       free     shared    buffers     cached
Mem:          7986       7934         52          0          9       2446
-/+ buffers/cache:       5478       2508
Swap:        17407         60      17347
This is the result on an ongoing basis, i.e. not even a single CPU's worth of capacity is being used; the top consumer stays at around 8.5%.
My question: what are my options for tracking down the root cause of the high load?
Answer
Based on your free output, there are times when system memory is exhausted, so swap is being used (see the Swap line, used = 60). Memory actually consumed by processes is used - (buffers + cached), about 5478 MB (the -/+ buffers/cache line), and the free column is down to almost zero (52 MB). It means there are times when all physical RAM is consumed.
For a server, try to avoid page faults that force data to be swapped between system memory and swap space as much as possible, because accessing the hard drive is far slower than accessing system RAM.
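One way to check whether swapping is actively happening right now (rather than the 60 MB being left over from an earlier event) is to watch the si/so columns of vmstat. A minimal check, assuming vmstat (procps) is installed:

$> vmstat 1 5

Sustained non-zero values in si (swap in) and so (swap out) while the load is high point to current memory pressure; all zeros mean the swapping happened in the past and is probably not today's cause.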
In your top output, investigate the wa column. A higher percentage means the CPU spends more time waiting for data I/O from disk rather than doing meaningful computation.
Cpu(s): 87.3%us, 1.2%sy, 0.0%ni, 27.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
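Processes stuck in uninterruptible sleep (state D, typically waiting on disk or NFS I/O) are counted in the load average even though they consume no CPU, so listing them can explain a high load with low CPU usage. A small sketch, reusing the same ps style as above:

$> ps -eo state,pid,user,args | awk '$1 == "D"'

If the same processes keep showing up in D state, the load is I/O-bound rather than CPU-bound, which would match a high wa percentage.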
Try to disable daemons or services that you do not need in order to reduce the memory footprint, and consider adding more RAM to the system.
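To see which services account for the memory footprint before deciding what to disable, you can sort processes by resident set size; a sketch mirroring the ps command used earlier in the question:

$> ps -eo rss,pid,user,args | sort -k 1 -n -r | head -6

The rss column is in kilobytes, so comparing the top entries against the ~8 GB total shows whether one process is responsible or the usage is spread across many.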
For a 2-CPU server, the ideal load is less than 2.0 (each CPU's load below 1.0). A load of 8.0 means each CPU is loaded at roughly 4.0, which is not good.