I have a 12-core Linux box, but the load on it is really high, hovering around 50.

07-31-2014

OK, that answers one of rbatte1's questions... You're not swapping or low on memory. That's good.

If you won't show the rest of top, how about the ps listing? ps aux for lots of detail on memory and load etc.

Corona688

07-31-2014

I hope this will help

Code:

top - 14:06:47 up 7 days, 11:46,  0 users,  load average: 46.05, 43.68, 32.2
Tasks: 330 total,  20 running, 310 sleeping,   0 stopped,   0 zombie
Cpu(s): 92.1%us,  6.6%sy,  0.0%ni,  0.2%id,  0.8%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  12164276k total, 12063308k used,   100968k free,    61984k buffers
Swap:  2097144k total,      244k used,  2096900k free, 10139736k cached
							
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND							
 2715 root      19   0  178m  79m 8068 R 121.2  0.7   0:27.66 zoombie							
 4155 root      17   0  166m  77m 8060 R 114.0  0.7   0:13.25 zoombie							
 4881 root      17   0  162m  74m 8064 R 109.4  0.6   0:08.75 zoombie							
26680 root      16   0  174m  52m 8116 R 90.3  0.4   0:59.79 zoombie							
26555 root      25   0  186m  98m 8264 R 83.7  0.8   0:29.13 zoombie							
12394 root      18   0  198m  50m 8224 R 75.8  0.4   2:08.08 zoombie							
 6877 root      15   0 2372m 440m  21m S 74.1  3.7   6558:06 zoombie							
 2174 root      25   0  364m  34m  13m R 70.2  0.3   0:12.50 zoombie							
31227 root      24   0  162m  56m 8144 R 64.9  0.5   0:24.82 zoombie							
16611 root      17   0  274m 147m 8244 R 64.6  1.2   0:48.05 zoombie							
26566 root      25   0  170m  60m 8268 R 61.9  0.5   0:30.78 zoombie							
30700 root      25   0  160m  53m 8084 R 60.3  0.5   0:21.50 zoombie							
 2508 root      25   0  194m  57m 8124 S 51.1  0.5   0:20.70 zoombie							
 2238 root      25   0  368m  28m 8240 R 49.7  0.2   0:11.46 zoombie							
 5368 root      18   0  350m  21m 7684 R 35.2  0.2   0:01.07 zoombie							
 5389 root      19   0 92712  24m 7960 R 14.5  0.2   0:00.44 zoombie							
30104 root      25   0  408m  27m 8256 R 10.2  0.2   0:25.06 zoombie							
 5401 root      19   0 72216  19m 7424 R  8.9  0.2   0:00.27 zoombie							
 4336 root      18   0 80352  22m 6700 S  8.2  0.2   0:01.32 zoombie							
 4115 root      18   0  106m  47m 6708 S  2.0  0.4   0:01.65 zoombie							
  553 root      10  -5     0    0    0 S  1.3  0.0 101:29.65 kswapd0							
 3919 dbpmt1    15   0 11016 1276  796 R  0.7  0.0   0:00.06 top							
 6893 root      16   0 57820 2876 1136 S  0.7  0.0  14:00.23 zoombie							
  552 root      15   0     0    0    0 S  0.3  0.0   9:21.33 pdflush							
 3643 root      10  -5     0    0    0 S  0.3  0.0  23:41.94 jbd2/dm-9-8							
 8353 root      15   0  103m 1420  688 S  0.3  0.0   1:51.97 hpasmlited							
    1 root      15   0 10372  708  596 S  0.0  0.0   0:12.26 init							
    2 root      RT  -5     0    0    0 S  0.0  0.0   1:16.91 anonymous/0							
    3 root      34  19     0    0    0 S  0.0  0.0   0:10.49 ksoftirqd/0							
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0							
    5 root      RT  -5     0    0    0 S  0.0  0.0   1:44.20 anonymous/1							
    6 root      34  19     0    0    0 S  0.0  0.0   0:06.19 ksoftirqd/1							
    7 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/1							
    8 root      RT  -5     0    0    0 S  0.0  0.0   1:26.14 anonymous/2							
    9 root      34  19     0    0    0 S  0.0  0.0   0:06.37 ksoftirqd/2							
   10 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/2							
   11 root      RT  -5     0    0    0 S  0.0  0.0   1:05.80 anonymous/3

Moon1234

07-31-2014

Well, you've got 20 instances of this "zoombie" thing running, whatever it is. Each instance appears to be threaded since some are using more than one CPU at once -- see the ones going over 100%. If they all decided to get busy at once, you certainly would get a very high load average.

As for why they'd do this, that depends on what the application's doing at any particular moment. It could be processing lots of queries, or it could be bogged down and accepting queries faster than it can process them...

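One of them (PID 6877) has racked up about 6558 minutes of CPU time, roughly four and a half days' worth, on a box that has only been up for seven and a half days. Is this normal?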
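
A note on reading that column: top's TIME+ is cumulative CPU time, not wall-clock run time, shown as minutes:seconds (with hundredths for small values). Here is a minimal sketch of the conversion, assuming that format; the helper names are made up for illustration.

```python
def cpu_time_seconds(time_plus: str) -> float:
    """Convert a top TIME+ field (minutes:seconds[.hundredths]) to seconds."""
    minutes, seconds = time_plus.split(":")
    return int(minutes) * 60 + float(seconds)


def uptime_seconds(days: int, hours: int, minutes: int) -> int:
    """Convert an uptime such as '7 days, 11:46' to seconds."""
    return (days * 24 + hours) * 3600 + minutes * 60


cpu = cpu_time_seconds("6558:06")   # PID 6877 in the listing above
up = uptime_seconds(7, 11, 46)      # from the first line of the top output

print(f"CPU time: {cpu / 3600:.1f} h")
print(f"Uptime:   {up / 3600:.1f} h")
print(f"Average cores kept busy by this process: {cpu / up:.2f}")
```

A ratio near or above 1 would mean the process has kept at least one core busy for the machine's entire uptime.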

Corona688

07-31-2014

Yes, you are correct, it is processing lots of queries every moment. What do you think is the best way to solve this problem? And how can you tell that each instance is using more than one CPU at a time?

Thanks for your help!!!

Moon1234

07-31-2014

A single-thread process can't use more than one core -- 100% -- so anything above 100% must have threads. All of them might have threads, just not as busy right now.
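
To make that check concrete, here is a small sketch that scans rows in top's default column layout and reports the processes above 100%. The function name is made up, and it assumes %CPU is the ninth column as in the listing posted above.

```python
def over_one_core(top_rows, threshold=100.0):
    """Return (pid, %CPU) pairs for rows whose %CPU exceeds one full core."""
    hits = []
    for row in top_rows:
        fields = row.split()
        # Default columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        pid, cpu = int(fields[0]), float(fields[8])
        if cpu > threshold:
            hits.append((pid, cpu))
    return hits


rows = [
    " 2715 root      19   0  178m  79m 8068 R 121.2  0.7   0:27.66 zoombie",
    " 4155 root      17   0  166m  77m 8060 R 114.0  0.7   0:13.25 zoombie",
    "26680 root      16   0  174m  52m 8116 R 90.3  0.4   0:59.79 zoombie",
]
print(over_one_core(rows))
```

Anything it reports must be running at least two busy threads; anything it does not report may still be threaded, just less busy at that moment.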

I must make this clear: We know nothing of this "zoombie" application. (Searching for it on the net just finds a game for Android.) For all I know, this problem might be a bug or misconfiguration that can be fixed without resorting to anything I suggest next. You should really consult your vendor about this issue before you invest too much time or money.

That said, I see two paths.

1) Configure zoombie to run fewer threads. People will need to start waiting their turn. How to do this depends on the application in question.

If your disks and memory can't keep up with the load, it doesn't matter how many cores you have. 50 idle cores all waiting for the same disk (which might actually be counted in the load average on Linux) aren't any faster than 30 idle cores.

2) Toughen your system to tolerate such a high load.

A load average of 50 on a 12-core system isn't as horrible as it sounds... It's not great, but it's about the same as a load of 4 on a single core. Lots of systems (especially leased ones -- getting the most for your money) tolerate quite high loads nonstop.
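
The arithmetic behind that comparison, as a quick sketch. The helper name is made up; os.getloadavg() and os.cpu_count() are standard Python calls, and the live reading only works on Unix-like systems.

```python
import os


def load_per_core(load_average: float, cores: int) -> float:
    """Runnable-task backlog per core; around 1.0 means cores are just saturated."""
    return load_average / cores


# Figures from this thread: load of about 50 on 12 cores.
print(f"{load_per_core(50, 12):.2f}")

# Same calculation for the machine this runs on, where supported.
try:
    one_minute = os.getloadavg()[0]
    print(f"{load_per_core(one_minute, os.cpu_count() or 1):.2f}")
except (AttributeError, OSError):
    print("load average not available on this platform")
```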

They deal with it by having lots of memory, beefing up the disks, nice-ing the hungriest processes (so you get priority when you need to administer the box), putting swap on a separate disk, and other such things. This is standard configuration for most large UNIX systems, but not so much for consumer Linux unless someone really planned ahead. It reduces the punishment various bottlenecks will cause, so the system slows down more or less proportionately instead of bogging down.

You still need to control how fast queries are allowed to happen, though. The total amount of work done per second doesn't improve without more cores, so the more queries you accept, the slower they'll be completed. If your server allows itself to accept more queries per second than it can complete in a second, it's doomed.
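
A toy illustration of why that is doomed, using made-up rates (120 queries arriving per second against 100 completed per second); nothing here is measured from the real system.

```python
def backlog_over_time(arrivals_per_sec, completions_per_sec, seconds):
    """Return the number of queued queries at the end of each second."""
    backlog = 0
    history = []
    for _ in range(seconds):
        # The queue can never go below empty.
        backlog = max(0, backlog + arrivals_per_sec - completions_per_sec)
        history.append(backlog)
    return history


print(backlog_over_time(120, 100, 5))  # accepting faster than completing
print(backlog_over_time(90, 100, 5))   # accepting slower than completing
```

In the first case the queue grows by 20 every second without bound; in the second it stays empty. Throttling how fast queries are accepted is what keeps the system in the second case.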

Questions for you:

I'd like to see top and ps and iotop sometime when your load average is really, really high. If you can't catch it in the act, you might need something like sar to record it.
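
If sar isn't set up yet, even a crude recorder helps you find out when the spikes happen. This is only a sketch of the idea, not a replacement for sar: the function names, file path and CSV layout are all made up, and os.getloadavg() is only available on Unix-like systems.

```python
import os
import time
from datetime import datetime


def format_sample(when, loads):
    """Format one CSV line: timestamp, 1-, 5- and 15-minute load averages."""
    return (f"{when.isoformat(timespec='seconds')},"
            f"{loads[0]:.2f},{loads[1]:.2f},{loads[2]:.2f}")


def record(path, samples, interval_seconds):
    """Append `samples` load-average readings to `path`, one per interval."""
    with open(path, "a") as log:
        for i in range(samples):
            log.write(format_sample(datetime.now(), os.getloadavg()) + "\n")
            log.flush()
            if i + 1 < samples:
                time.sleep(interval_seconds)


# Example of one formatted line, using the figures from the top output above.
print(format_sample(datetime(2014, 7, 31, 14, 6, 47), (46.05, 43.68, 32.2)))
```

Calling something like record("/tmp/loadavg.csv", 288, 300) would cover a day at 5-minute intervals; the path is just an example.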

Corona688

09-11-2014

Looks like about 99% of your memory is in use, but you are not swapping; there is hardly any swap used. Most of that memory is file cache. However, you may not be getting very good file cache hits, and that could be adding a bit to the load.
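
To separate what the applications hold from what the kernel is using as cache, here is a small sketch working from the Mem: and Swap: lines of the top output posted earlier. The function name is made up; the figures are the ones from that output.

```python
def memory_summary(total_k, used_k, buffers_k, cached_k):
    """Split 'used' memory into application use and buffers/file cache."""
    cache_k = buffers_k + cached_k
    app_used_k = used_k - cache_k
    return {
        "used_pct": 100 * used_k / total_k,
        "app_used_pct": 100 * app_used_k / total_k,
        "cache_pct": 100 * cache_k / total_k,
    }


summary = memory_summary(
    total_k=12164276, used_k=12063308, buffers_k=61984, cached_k=10139736
)
for name, value in summary.items():
    print(f"{name}: {value:.1f}%")
```

On these numbers only about 15% of RAM is held by processes; the rest of the "used" figure is cache that the kernel can reclaim.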

I had an experience with a search indexer that showed a 10+% improvement in performance when all the memory slots were filled vs. the same amount of memory in larger DIMMs.

First, you need to identify your bottleneck. Memory or disk is likely with indexers.

You get most of what you need out of sar, but you need to tweak sar to do disks. There is a nice site to graph it for you, but I can't post URLs.

Code:

sar -Ap

Change your sar to sample every 5 minutes instead of the default 10 minutes. A 10-minute interval is not fine-grained enough, and 1 minute can be too much.

Disk I/O - look at disk I/O, system time and I/O wait to start to see if it is an issue. You can tweak sar to show disk usage, which may help identify bottlenecks.
Network - look for errors on interfaces and for traffic exceeding bandwidth limits. Make sure you are running full duplex at the maximum speed you expect. Look out for things like lots of TIME_WAIT sockets in the output of netstat -antp. There are a number of kernel tuning parameters.
Memory - look at utilization and swapping. These days there isn't much excuse for having a critical application swapping. Be careful how you read sar, free, ...: they count file cache, which is memory used temporarily to improve performance and can be reclaimed, so it may not be actively in use. There is some tuning that can be done, depending on the application's behavior and memory, such as large pages, but you need to know the behavior and what else you need to adjust.
CPU - look at user vs. system time and I/O wait.
Load - will be shown in sar, but the cause needs to be identified. top has an 'H' option to toggle between showing threads and not showing them. It could be the application isn't tuned. Load is an overall indication of everything.
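
For the network item, counting sockets per state is usually quicker than eyeballing the list. Here is a rough sketch that tallies the State column of netstat -antp output; the function name is made up, and the sample text is invented purely to show the column layout.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally TCP connection states from `netstat -ant` or `netstat -antp` text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: Proto Recv-Q Send-Q Local-Address Foreign-Address State [PID/Program]
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Invented sample output, not taken from the system in this thread.
sample = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address      Foreign Address    State       PID/Program name
tcp        0      0 0.0.0.0:22         0.0.0.0:*          LISTEN      1201/sshd
tcp        0      0 10.0.0.5:8080      10.0.0.9:51515     ESTABLISHED 6877/zoombie
tcp        0      0 10.0.0.5:8080      10.0.0.9:51516     TIME_WAIT   -
tcp        0      0 10.0.0.5:8080      10.0.0.9:51517     TIME_WAIT   -
"""
print(count_tcp_states(sample))
```

To run it against a live box you would feed it the real output, for example via subprocess.run(["netstat", "-ant"], capture_output=True, text=True).stdout.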

It may just be that the application needs tuning: threads, size, garbage collection (GC), ... If it is Java, watch the GCs and look at tuning them.

netnerd

09-11-2014

Thread is about a month and a half old.

achenle
