High load average troubleshoot

UNIX for Advanced & Expert Users, Unix Linux Forums

#1, 10-29-2011, erick_tuk (Registered User)

Hi all, hope you can help me. I'm seeing a high load average on this box and can't find a reason for it. Please share your input.

Code:
 load average: 7.78, 7.50, 7.31


Tasks: 330 total,   1 running, 329 sleeping,   0 stopped,   0 zombie
Cpu0  :  7.0%us,  1.0%sy,  0.0%ni, 23.9%id,  0.0%wa, 38.9%hi, 29.2%si,  0.0%st
Cpu1  :  2.0%us,  1.0%sy,  0.0%ni, 97.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 54.3%us,  4.7%sy,  0.0%ni, 40.0%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu3  :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  : 55.3%us,  4.0%sy,  0.0%ni, 39.7%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu5  :  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  : 48.0%us,  5.3%sy,  0.0%ni, 45.4%id,  0.0%wa,  0.0%hi,  1.3%si,  0.0%st
Cpu7  :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  : 50.0%us,  4.3%sy,  0.0%ni, 44.7%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu9  :  1.0%us,  2.3%sy,  0.0%ni, 96.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 : 48.0%us,  3.0%sy,  0.0%ni, 48.0%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu11 :  0.7%us,  1.3%sy,  0.0%ni, 98.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 : 61.5%us,  7.6%sy,  0.0%ni, 29.9%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu13 :  1.0%us,  1.3%sy,  0.0%ni, 97.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 : 57.8%us,  6.0%sy,  0.0%ni, 35.2%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu15 :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu16 : 54.3%us,  5.6%sy,  0.0%ni, 39.4%id,  0.0%wa,  0.0%hi,  0.7%si,  0.0%st
Cpu17 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu18 : 56.3%us,  4.3%sy,  0.0%ni, 38.7%id,  0.0%wa,  0.0%hi,  0.7%si,  0.0%st
Cpu19 :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu20 : 51.7%us,  5.0%sy,  0.0%ni, 42.4%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu21 :  0.7%us,  1.0%sy,  0.0%ni, 98.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu22 :  1.7%us,  0.7%sy,  0.0%ni, 97.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu23 :  0.7%us,  1.0%sy,  0.0%ni, 98.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  74167472k total, 35214936k used, 38952536k free,   788124k buffers
Swap: 33551744k total,        0k used, 33551744k free, 11540200k cached

 free -g
             total       used       free     shared    buffers     cached
Mem:            70         33         37          0          0         11
-/+ buffers/cache:         21         48
Swap:           31          0         31

 df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_root-lv_root
                      9.7G  1.4G  7.9G  16% /
/dev/mapper/vg_root-lv_tmp
                      2.0G  778M  1.1G  42% /tmp
/dev/mapper/vg_root-lv_var
                      992M  387M  555M  42% /var
/dev/mapper/vg_root-lv_log
                      2.0G  697M  1.2G  38% /var/log
/dev/mapper/vg_root-lv_crash
                       34G  177M   32G   1% /var/crash
/dev/mapper/vg_root-lv_vtmp
                      992M   34M  908M   4% /var/tmp
/dev/mapper/vg_root-lv_home
                      4.9G  263M  4.4G   6% /home
/dev/mapper/vg_root-lv_audit
                      2.0G   86M  1.8G   5% /var/log/audit
/dev/mapper/vg_root-lv_usr
                      4.9G  1.3G  3.4G  28% /usr
/dev/sda1             996M   53M  891M   6% /boot
tmpfs                  36G     0   36G   0% /dev/shm
/dev/mapper/vg_root-lv_opt
                      144G   11G  126G   8% /opt

 iostat
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               7.72         1.35       152.93     786543   89194896
sda1              0.00         0.00         0.02       1634      14008
sda2              0.00         0.00         0.00       1377          0
sda3              7.72         1.34       152.91     783244   89180888
sdb               0.00         0.00         0.00       1504          0
dm-0              1.86         0.17        14.84     101418    8656072
dm-1              0.80         0.00         6.39       2794    3724232
dm-2              0.65         0.39         4.92     228586    2869568
dm-3              0.93         0.03         7.46      14666    4352368
dm-4              0.00         0.01         0.00       3490         88
dm-5              0.00         0.00         0.00       1154        392
dm-6              0.32         0.00         2.57       2810    1499640
dm-7              0.05         0.00         0.37       1714     218456
dm-8              1.24         0.46         9.62     269874    5612272
dm-9             13.36         0.27       106.73     156170   62247800

sudo netstat -pan | grep -c 'ESTABLISHED'
38494

sudo netstat -pan | grep -c 'TIME_WAIT'
10

sudo netstat -pan | grep -c 'LISTEN'
84

sudo netstat -pan | grep -c 'FIN_WAIT'
362
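
For reference, the per-state counts above can be gathered in a single netstat pass instead of one grep per state; a minimal sketch, assuming a Linux net-tools netstat where the TCP state is the sixth field:

Code:
# Count every TCP connection state in one pass.
# Assumes Linux net-tools netstat output: state is field 6 of tcp lines.
sudo netstat -pan | awk '$1 ~ /^tcp/ { state[$6]++ }
                         END { for (s in state) print s, state[s] }'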

What else should I look for? Appreciate the help

#2, 10-29-2011, agama (Forum Advisor)
You're not finding any smoking guns because there are none. A load average of 7.x on a machine with 8 or fewer cores would be high, and on that class of machine top would probably paint a different picture in terms of CPU utilisation or I/O wait. On a 24-core machine, however, I don't believe your load average is a concern.

For a 24-core machine, I wouldn't worry until the load average hits 70 to 75% of the number of cores, i.e. 16 to 18 in your case. So 7.x isn't a concern here.

NOTE: this is my interpretation of how load average should be read, and I stand to be corrected.
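
A quick way to sanity-check a box against this rule of thumb is to normalise the one-minute load by the core count; a minimal sketch, assuming Linux's /proc/loadavg and coreutils' nproc:

Code:
#!/bin/sh
# Print the 1-minute load average divided by the number of online CPUs.
cores=$(nproc)                        # online CPU count
load1=$(cut -d' ' -f1 /proc/loadavg)  # 1-minute load average
# e.g. 7.78 on 24 cores -> 0.32, i.e. roughly a third of capacity
awk -v l="$load1" -v c="$cores" 'BEGIN { printf "load/core: %.2f\n", l / c }'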

#3, 10-29-2011, erick_tuk (Registered User)
Appreciate your input, agama. I don't have access to this box right now, but as I recall it's the only box showing such an alert (we monitor with Nagios); the rest of the boxes are all green, and all of them are 24-core machines. I should also mention that the load is balanced among 8 boxes, so it's a bit odd that this is the only one raising alerts.

Regards

#4, 10-29-2011, agama (Forum Advisor)
I'm guessing that the Nagios check is set to alarm on an absolute value without regard to the number of cores. I'd have a look at the scripts and adjust them so that the number of cores is taken into account.
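
As a sketch of what that adjustment might look like: a wrapper that scales the warning and critical thresholds by the core count before calling the stock check_load plugin (the 0.75/0.90 factors and the plugin path are assumptions, not anything from this thread):

Code:
#!/bin/sh
# Hypothetical wrapper for Nagios: scale load thresholds by core count.
PLUGIN=/usr/lib64/nagios/plugins/check_load   # assumed path; adjust per site
cores=$(nproc)
# Warn at 75% of core count, go critical at 90% (example factors only).
warn=$(awk -v c="$cores" 'BEGIN { printf "%.1f", c * 0.75 }')
crit=$(awk -v c="$cores" 'BEGIN { printf "%.1f", c * 0.90 }')
# check_load takes comma-separated 1/5/15-minute thresholds.
exec "$PLUGIN" -w "$warn,$warn,$warn" -c "$crit,$crit,$crit"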

I just peeked at one of our larger machines (255 cores), which is showing this load average:

Code:
  6:55pm  up 8 day(s),  2:12,  106 users,  load average: 148.22, 154.36, 153.55

Depending on the sophistication of the scheduler, it is quite possible for one machine to end up more heavily loaded than the rest. It's also possible that the load is more evenly balanced than the Nagios alarms suggest, and the other machines are simply running just under the threshold value.

#5, 10-29-2011, erick_tuk (Registered User)
Again, thanks a lot!