High load average troubleshoot


 
# 1  
Old 10-29-2011

Hi all, hope you can help me. I'm seeing a high load average and can't find the reason for it. Please share your thoughts.
Code:
 load average: 7.78, 7.50, 7.31


Tasks: 330 total,   1 running, 329 sleeping,   0 stopped,   0 zombie
Cpu0  :  7.0%us,  1.0%sy,  0.0%ni, 23.9%id,  0.0%wa, 38.9%hi, 29.2%si,  0.0%st
Cpu1  :  2.0%us,  1.0%sy,  0.0%ni, 97.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 54.3%us,  4.7%sy,  0.0%ni, 40.0%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu3  :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  : 55.3%us,  4.0%sy,  0.0%ni, 39.7%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu5  :  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  : 48.0%us,  5.3%sy,  0.0%ni, 45.4%id,  0.0%wa,  0.0%hi,  1.3%si,  0.0%st
Cpu7  :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  : 50.0%us,  4.3%sy,  0.0%ni, 44.7%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu9  :  1.0%us,  2.3%sy,  0.0%ni, 96.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 : 48.0%us,  3.0%sy,  0.0%ni, 48.0%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu11 :  0.7%us,  1.3%sy,  0.0%ni, 98.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 : 61.5%us,  7.6%sy,  0.0%ni, 29.9%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu13 :  1.0%us,  1.3%sy,  0.0%ni, 97.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 : 57.8%us,  6.0%sy,  0.0%ni, 35.2%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu15 :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu16 : 54.3%us,  5.6%sy,  0.0%ni, 39.4%id,  0.0%wa,  0.0%hi,  0.7%si,  0.0%st
Cpu17 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu18 : 56.3%us,  4.3%sy,  0.0%ni, 38.7%id,  0.0%wa,  0.0%hi,  0.7%si,  0.0%st
Cpu19 :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu20 : 51.7%us,  5.0%sy,  0.0%ni, 42.4%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu21 :  0.7%us,  1.0%sy,  0.0%ni, 98.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu22 :  1.7%us,  0.7%sy,  0.0%ni, 97.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu23 :  0.7%us,  1.0%sy,  0.0%ni, 98.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  74167472k total, 35214936k used, 38952536k free,   788124k buffers
Swap: 33551744k total,        0k used, 33551744k free, 11540200k cached

 free -g
             total       used       free     shared    buffers     cached
Mem:            70         33         37          0          0         11
-/+ buffers/cache:         21         48
Swap:           31          0         31

 df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_root-lv_root
                      9.7G  1.4G  7.9G  16% /
/dev/mapper/vg_root-lv_tmp
                      2.0G  778M  1.1G  42% /tmp
/dev/mapper/vg_root-lv_var
                      992M  387M  555M  42% /var
/dev/mapper/vg_root-lv_log
                      2.0G  697M  1.2G  38% /var/log
/dev/mapper/vg_root-lv_crash
                       34G  177M   32G   1% /var/crash
/dev/mapper/vg_root-lv_vtmp
                      992M   34M  908M   4% /var/tmp
/dev/mapper/vg_root-lv_home
                      4.9G  263M  4.4G   6% /home
/dev/mapper/vg_root-lv_audit
                      2.0G   86M  1.8G   5% /var/log/audit
/dev/mapper/vg_root-lv_usr
                      4.9G  1.3G  3.4G  28% /usr
/dev/sda1             996M   53M  891M   6% /boot
tmpfs                  36G     0   36G   0% /dev/shm
/dev/mapper/vg_root-lv_opt
                      144G   11G  126G   8% /opt

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               7.72         1.35       152.93     786543   89194896
sda1              0.00         0.00         0.02       1634      14008
sda2              0.00         0.00         0.00       1377          0
sda3              7.72         1.34       152.91     783244   89180888
sdb               0.00         0.00         0.00       1504          0
dm-0              1.86         0.17        14.84     101418    8656072
dm-1              0.80         0.00         6.39       2794    3724232
dm-2              0.65         0.39         4.92     228586    2869568
dm-3              0.93         0.03         7.46      14666    4352368
dm-4              0.00         0.01         0.00       3490         88
dm-5              0.00         0.00         0.00       1154        392
dm-6              0.32         0.00         2.57       2810    1499640
dm-7              0.05         0.00         0.37       1714     218456
dm-8              1.24         0.46         9.62     269874    5612272
dm-9             13.36         0.27       106.73     156170   62247800

sudo netstat -pan | grep -c 'ESTABLISHED'
38494

sudo netstat -pan | grep -c 'TIME_WAIT'
10

sudo netstat -pan | grep -c 'LISTEN'
84

sudo netstat -pan | grep -c 'FIN_WAIT'
362

What else should I look for? I appreciate the help.

Last edited by vgersh99; 10-29-2011 at 05:41 PM. Reason: messed up code formatting
# 2  
Old 10-29-2011
You're not finding any smoking guns because there are none. A load average of 7.x on a machine with 8 or fewer cores would be high, and on that class of machine top would probably paint a different picture in terms of CPU utilisation or I/O wait. On a 24-core machine, however, I don't believe your load average is a concern.

For a 24-core machine, I wouldn't be concerned until the load average reaches 70 to 75% of the number of cores, which is about 17 to 18 in your case. So here, 7.x isn't a concern.
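
As a rough illustration of that rule of thumb, the Python sketch below works out the 70 to 75% band for a given core count and compares a load average against it. The function names `load_concern_band` and `is_concerning` are made up for this example, and the fractions come from the heuristic in this post rather than from any standard.

```python
def load_concern_band(cores, low_frac=0.70, high_frac=0.75):
    """Return the (low, high) load-average band for the rule of thumb.

    The 0.70 and 0.75 defaults mirror the 70-75% figure in this post;
    they are a personal heuristic, not an official limit.
    """
    return cores * low_frac, cores * high_frac


def is_concerning(load_avg, cores, low_frac=0.70):
    """True once the load average reaches the lower edge of the band."""
    return load_avg >= cores * low_frac


if __name__ == "__main__":
    cores = 24         # core count of the box in this thread
    load_1min = 7.78   # 1-minute load average from the top output
    low, high = load_concern_band(cores)
    print(f"concern band for {cores} cores: {low:.1f} to {high:.1f}")
    print(f"load {load_1min} concerning: {is_concerning(load_1min, cores)}")
    print(f"same load on 8 cores concerning: {is_concerning(load_1min, 8)}")
```

For the 24-core box the band starts at about 16.8, well above 7.78, while the same load on an 8-core machine would already be past its band.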

NOTE: this is my own understanding of how load average should be interpreted, and I'm happy to be corrected.
# 3  
Old 10-29-2011
I appreciate your input, agama. I don't have access to this box right now, but as I recall it's the only one showing this alert (we use Nagios). The rest of the boxes are all green, and all of them are 24-core machines. I should also mention that the load is balanced across 8 boxes, so it's a bit odd that this is the only one alerting.

Regards
# 4  
Old 10-29-2011
I'm guessing that the alarm threshold configured in Nagios fires on a fixed value without regard to the number of cores. I'd have a look at the check scripts and adjust them so that the number of cores is taken into account.
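
A minimal Python sketch of what a core-aware check could look like is shown here. It is not the script running on these servers: the function name `check_load_per_core` and the per-core thresholds (0.70 warning, 1.00 critical) are assumptions for illustration. It relies only on the usual Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and on `os.getloadavg()` and `os.cpu_count()` from the Python standard library.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_load_per_core(load_avg, cores, warn=0.70, crit=1.00):
    """Classify a load average after dividing it by the core count.

    warn and crit are per-core thresholds and are illustrative
    assumptions, not values taken from any real Nagios configuration.
    """
    if cores is None or cores < 1:
        return UNKNOWN, "LOAD UNKNOWN - could not determine core count"
    per_core = load_avg / cores
    detail = f"load {load_avg:.2f} on {cores} cores ({per_core:.2f} per core)"
    if per_core >= crit:
        return CRITICAL, f"LOAD CRITICAL - {detail}"
    if per_core >= warn:
        return WARNING, f"LOAD WARNING - {detail}"
    return OK, f"LOAD OK - {detail}"


def main():
    """Check the local 1-minute load average and return the exit status."""
    # os.getloadavg() returns the 1, 5 and 15 minute averages (Unix only).
    one_min, _five_min, _fifteen_min = os.getloadavg()
    status, message = check_load_per_core(one_min, os.cpu_count())
    print(message)
    return status

# A plugin wrapper would finish with: import sys; sys.exit(main())
```

With these assumed thresholds, a load of 7.78 on 24 cores comes out as OK at about 0.32 per core, whereas the same load on an 8-core machine would raise a warning.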

I just peeked at one of our larger machines (255 cores), which is showing this load average:
Code:
  6:55pm  up 8 day(s),  2:12,  106 users,  load average: 148.22, 154.36, 153.55

Depending on the sophistication of the scheduler, it is quite possible for one machine to end up more heavily loaded than the others. It's also possible that the load is more evenly balanced than the Nagios alarms suggest, and the other machines are simply running just under the threshold value.
# 5  
Old 10-29-2011
Again, thanks a lot!