I have a 12 core Linux cpu but the load is really high on this box, hovering around 50.


 
# 8  
Old 07-31-2014
OK, that answers one of rbatte1's questions... You're not swapping or low on memory. That's good.

If you won't show the rest of top, how about the ps listing? ps aux for lots of detail on memory and load etc.
# 9  
Old 07-31-2014
I hope this will help

Code:
top - 14:06:47 up 7 days, 11:46,  0 users,  load average: 46.05, 43.68, 32.2
Tasks: 330 total,  20 running, 310 sleeping,   0 stopped,   0 zombie
Cpu(s): 92.1%us,  6.6%sy,  0.0%ni,  0.2%id,  0.8%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  12164276k total, 12063308k used,   100968k free,    61984k buffers
Swap:  2097144k total,      244k used,  2096900k free, 10139736k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND							
 2715 root      19   0  178m  79m 8068 R 121.2  0.7   0:27.66 zoombie							
 4155 root      17   0  166m  77m 8060 R 114.0  0.7   0:13.25 zoombie							
 4881 root      17   0  162m  74m 8064 R 109.4  0.6   0:08.75 zoombie							
26680 root      16   0  174m  52m 8116 R 90.3  0.4   0:59.79 zoombie							
26555 root      25   0  186m  98m 8264 R 83.7  0.8   0:29.13 zoombie							
12394 root      18   0  198m  50m 8224 R 75.8  0.4   2:08.08 zoombie							
 6877 root      15   0 2372m 440m  21m S 74.1  3.7   6558:06 zoombie							
 2174 root      25   0  364m  34m  13m R 70.2  0.3   0:12.50 zoombie							
31227 root      24   0  162m  56m 8144 R 64.9  0.5   0:24.82 zoombie							
16611 root      17   0  274m 147m 8244 R 64.6  1.2   0:48.05 zoombie							
26566 root      25   0  170m  60m 8268 R 61.9  0.5   0:30.78 zoombie							
30700 root      25   0  160m  53m 8084 R 60.3  0.5   0:21.50 zoombie							
 2508 root      25   0  194m  57m 8124 S 51.1  0.5   0:20.70 zoombie							
 2238 root      25   0  368m  28m 8240 R 49.7  0.2   0:11.46 zoombie							
 5368 root      18   0  350m  21m 7684 R 35.2  0.2   0:01.07 zoombie							
 5389 root      19   0 92712  24m 7960 R 14.5  0.2   0:00.44 zoombie							
30104 root      25   0  408m  27m 8256 R 10.2  0.2   0:25.06 zoombie							
 5401 root      19   0 72216  19m 7424 R  8.9  0.2   0:00.27 zoombie							
 4336 root      18   0 80352  22m 6700 S  8.2  0.2   0:01.32 zoombie							
 4115 root      18   0  106m  47m 6708 S  2.0  0.4   0:01.65 zoombie							
  553 root      10  -5     0    0    0 S  1.3  0.0 101:29.65 kswapd0							
 3919 dbpmt1    15   0 11016 1276  796 R  0.7  0.0   0:00.06 top							
 6893 root      16   0 57820 2876 1136 S  0.7  0.0  14:00.23 zoombie							
  552 root      15   0     0    0    0 S  0.3  0.0   9:21.33 pdflush							
 3643 root      10  -5     0    0    0 S  0.3  0.0  23:41.94 jbd2/dm-9-8							
 8353 root      15   0  103m 1420  688 S  0.3  0.0   1:51.97 hpasmlited							
    1 root      15   0 10372  708  596 S  0.0  0.0   0:12.26 init							
    2 root      RT  -5     0    0    0 S  0.0  0.0   1:16.91 anonymous/0							
    3 root      34  19     0    0    0 S  0.0  0.0   0:10.49 ksoftirqd/0							
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0							
    5 root      RT  -5     0    0    0 S  0.0  0.0   1:44.20 anonymous/1							
    6 root      34  19     0    0    0 S  0.0  0.0   0:06.19 ksoftirqd/1							
    7 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/1							
    8 root      RT  -5     0    0    0 S  0.0  0.0   1:26.14 anonymous/2							
    9 root      34  19     0    0    0 S  0.0  0.0   0:06.37 ksoftirqd/2							
   10 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/2							
   11 root      RT  -5     0    0    0 S  0.0  0.0   1:05.80 anonymous/3

# 10  
Old 07-31-2014
Well, you've got 20 instances of this "zoombie" thing running, whatever it is. Each instance appears to be threaded since some are using more than one CPU at once -- see the ones going over 100%. If they all decided to get busy at once, you certainly would get a very high load average.

As for why they'd do this, that depends on what the application's doing at any particular moment. It could be processing lots of queries, or it could be bogged down and accepting queries faster than it can process them...

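One of them (PID 6877) has accumulated more than 6,500 minutes of CPU time, roughly four and a half days, on a machine that has only been up for seven and a half. Is this normal?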
# 11  
Old 07-31-2014
Yes, you are correct, it is processing lots of queries every moment. What do you think is the best way to solve this problem? And how can you tell that each instance is using more than one CPU at a time?

Thanks for your help!!!
# 12  
Old 07-31-2014
A single-thread process can't use more than one core -- 100% -- so anything above 100% must have threads. All of them might have threads, just not as busy right now.
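If you want to pick those processes out of a top listing automatically, here is a minimal sketch. It assumes top's default column order as in the listing above (PID in field 1, %CPU in field 9); `multi_core_pids` is just a name I made up for illustration.

```shell
# Print "PID %CPU" for every process row whose %CPU is above 100,
# i.e. rows that must have more than one busy thread.
# Assumes top's default columns: PID is field 1, %CPU is field 9.
multi_core_pids() {
    awk '$1 ~ /^[0-9]+$/ && $9 + 0 > 100 { print $1, $9 }'
}

# Live usage, one batch-mode snapshot:
#   top -b -n 1 | multi_core_pids
# To see the real thread count of one process (NLWP column):
#   ps -o pid,nlwp,comm -p 6877
```

The `ps -o nlwp` check is the more direct one: it reports the thread count even when the threads happen to be idle.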

I must make this clear: We know nothing of this "zoombie" application. (Searching for it on the net just finds a game for android.) For all I know, this problem might be a bug or misconfiguration, fixed without resorting to anything I suggest next. You should really consult your vendor about this issue before you invest too much time or money.

That said, I see two paths.

1) Configure zoombie to run fewer threads. People will need to start waiting their turn. How to do this depends on the application in question.

If your disks and memory can't keep up with the load, it doesn't matter how many cores you have. 50 cores all sitting idle waiting for the same disk aren't any faster than 30. (On Linux, processes blocked on disk I/O are counted in the load average too, so a high load does not always mean busy CPUs.)

2) Toughen your system to tolerate such a high load.

A load average of 50 on a 12-core system isn't as horrible as it sounds... It's not great, but it's no worse than 4 on a single-core. Lots of systems (especially leased ones -- getting the most for your money) tolerate quite high loads nonstop.
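The arithmetic behind that comparison is just load divided by core count. A small sketch, with `load_per_core` being a made-up helper name:

```shell
# Normalise a load average by the number of cores.
# Usage: load_per_core LOAD CORES
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# The figures discussed above:
#   load_per_core 50 12
# On a live box (nproc is from GNU coreutils):
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

For 50 on 12 cores this comes out a little over 4 runnable tasks per core.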

They deal with it by having lots of memory, beefing up the disks, nice-ing the hungriest processes (so you get priority when you need to administer the box), putting swap on a separate hard disk, and other such things. This is standard configuration for most large UNIX systems, but not so much for consumer Linux unless they really planned ahead. It reduces the punishment various bottlenecks will cause, so the system slows down more or less proportionately instead of bogging down.
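As an illustration of the nice-ing part only, here is a cautious sketch that prints the renice commands instead of running them, so you can review them first. `renice_plan` is a hypothetical helper, and the nice value of 10 is an arbitrary example, not a recommendation for your application.

```shell
# Print (do not run) a renice command for each process named NAME.
# Reads "PID COMMAND" pairs on stdin, e.g. from: ps -eo pid=,comm=
# Usage: renice_plan NAME [NICE]     (NICE defaults to 10)
renice_plan() {
    awk -v name="$1" -v nice="${2:-10}" \
        '$2 == name { print "renice -n", nice, "-p", $1 }'
}

# Review the plan:
#   ps -eo pid=,comm= | renice_plan zoombie 10
# Apply it (as root) only once you are happy with it:
#   ps -eo pid=,comm= | renice_plan zoombie 10 | sh
```

This does not make the application any faster. It only means your shell and monitoring tools get scheduled ahead of it when the box is saturated.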

You still need to control how fast queries are allowed to happen, though. The total amount of work done per second doesn't improve without more cores, so the more queries you accept, the slower they'll be completed. If your server allows itself to accept more queries per second than it can complete in a second, it's doomed.
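To put numbers on that last point, here is a back-of-the-envelope sketch. The rates are invented for illustration and `backlog_after` is a hypothetical helper; it only shows how fast a queue grows when arrivals outpace completions.

```shell
# Queries still waiting after SECONDS, given arrival and completion
# rates in queries per second. Never goes below zero.
# Usage: backlog_after ARRIVAL_RATE COMPLETION_RATE SECONDS
backlog_after() {
    awk -v a="$1" -v c="$2" -v t="$3" \
        'BEGIN { b = (a - c) * t; if (b < 0) b = 0; printf "%d\n", b }'
}

# Accepting 120/s but completing only 100/s, after one minute:
#   backlog_after 120 100 60
```

With those example rates, 1200 queries are queued after a single minute, and the queue keeps growing for as long as the imbalance lasts.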

Questions for you:

I'd like to see top and ps and iotop from a time when your load average is really, really high. If you can't catch it in the act, you might need something like sar to record it.
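If catching it by hand is awkward, something along these lines could be run from cron. It is only a sketch: the threshold of 40, the output directory and the function names are all assumptions of mine, and it leaves out iotop because that normally needs root.

```shell
# Succeed (exit status 0) if LOAD is above THRESHOLD.
load_exceeds() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load > limit) }'
}

# Save top and ps output to timestamped files when the 1-minute
# load average is above THRESHOLD.
# Usage: capture_if_busy THRESHOLD [OUTDIR]     (OUTDIR defaults to /tmp)
capture_if_busy() {
    limit="$1"
    outdir="${2:-/tmp}"
    load=$(cut -d' ' -f1 /proc/loadavg)
    if load_exceeds "$load" "$limit"; then
        stamp=$(date +%Y%m%d-%H%M%S)
        top -b -n 1 > "$outdir/top-$stamp.txt"
        ps aux      > "$outdir/ps-$stamp.txt"
    fi
}

# Example: capture_if_busy 40 /var/tmp
```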

Last edited by Corona688; 07-31-2014 at 04:53 PM..
# 13  
Old 09-11-2014
Looks like you are running at about 99% memory used, but not swapping; there is hardly any swap in use. However, you may not be getting very good file cache hits, and that could be adding a bit to the load.
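Worth noting that most of that "used" figure is page cache. A quick sketch of the sum, using the Mem/Swap header values from the top output in post #9 (`app_mem_pct` is a made-up name):

```shell
# Percentage of RAM held by applications, excluding buffers and page cache.
# Arguments are the kB figures from top's header lines.
# Usage: app_mem_pct USED BUFFERS CACHED TOTAL
app_mem_pct() {
    awk -v u="$1" -v b="$2" -v c="$3" -v t="$4" \
        'BEGIN { printf "%.1f\n", (u - b - c) * 100 / t }'
}

# Figures from the listing above:
#   app_mem_pct 12063308 61984 10139736 12164276
```

On those numbers the applications themselves hold only about 15% of RAM; the rest of the "used" memory is cache the kernel can give back.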

I had an experience with a search indexer that showed a 10+% improvement in performance when all the memory slots were filled vs. the same amount of memory in fewer, larger DIMMs.

First you need to identify your bottleneck. Memory or disk is likely with indexers.

You get most of what you need out of sar, but you need to tweak sar to do disks. There is a nice site to graph it for you, but I can't post URLs.

Code:
sar -Ap

Change your sar to report every 5 minutes instead of every 10 minutes. 10 minutes is not enough detail and 1 minute can be too much.
  • Disk I/O - look at disk I/O, system time and I/O wait to start to see whether it is an issue. You can tweak sar to show disk usage, which may help identify bottlenecks.
  • Network - look for errors on interfaces and for traffic over bandwidth limits. Make sure you are running full duplex at the maximum speed you expect. Look out for things like large numbers of TIME_WAIT sockets in
    Code:
    netstat -antp

    There are a number of kernel tuning parameters for this.
  • Memory - look at utilization and swapping. These days there isn't much excuse for a critical application to be swapping. Be careful how you read sar, free, etc.: they include file cache, which is temporary memory used to improve performance and may not be in active use. There is some tuning that can be done depending on the application's behavior and memory, such as large pages, but you need to know the behavior and what else you would need to adjust.
  • CPU - look at user vs. system time and I/O wait.
  • Load - will be shown in sar, but the cause needs to be identified. top has an 'H' option to toggle between showing threads and not showing them. Load is an overall indication of everything above.
It may just be that the application needs tuning: threads, size, garbage collection (GC), ... If it is Java, watch the GCs and look at tuning them.
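For the TIME_WAIT check in the network item, a rough way to summarise the netstat output is to count sockets per state. This sketch assumes the Linux net-tools layout where the state is the 6th column; `tcp_state_counts` is a name I made up.

```shell
# Count TCP sockets per state from `netstat -ant` output on stdin.
# Assumes the state is in column 6, as printed by Linux net-tools netstat.
tcp_state_counts() {
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage:
#   netstat -ant | tcp_state_counts
# For the per-thread view mentioned under Load, top can also be
# started directly in threads mode:
#   top -H
```

A TIME_WAIT count in the thousands is the sort of thing that would point toward the kernel tuning parameters mentioned above, but what counts as too many depends on the workload.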
# 14  
Old 09-11-2014
Thread is about a month and a half old.
 