HP-UX and 'top'


# 1  
07-21-2005

I've been working for about a year with an HP-UX system (an RP5400-series PA-RISC server) that hosts some middleware. The middleware sits between an Oracle DB (on another box) and the client applications running on about 800 PCs. From the beginning, I've noticed that 'top' reports between 0.0% and 10% idle during the day. This is the first big production Unix server I've ever worked with, but I believe these idle figures are WAY too low. The application vendor told us that 0% idle should be OK as long as it doesn't stay there; another support person at the same vendor, however, told me that we should be seeing 30-60% idle during the day! My gut feeling is that if idle consistently drops below 20%, there is probably a resource issue. Here is a typical 'top' screen from our system in the middle of the day:

Code:
System: unix_srv1                                       Thu Jul 21 12:26:53 2005
Load averages: 5.46, 3.90, 3.19
1285 processes: 1259 sleeping, 25 running, 1 zombie
Cpu states:
CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
 0    5.38  64.0%   5.9%  30.2%   0.0%   0.0%   0.0%   0.0%   0.0%
 1    6.83  79.4%   0.0%  20.6%   0.0%   0.0%   0.0%   0.0%   0.0%
 2    4.65  82.5%   0.0%  17.5%   0.0%   0.0%   0.0%   0.0%   0.0%
 3    4.98  69.7%   3.8%  26.5%   0.0%   0.0%   0.0%   0.0%   0.0%
---   ----  -----  -----  -----  -----  -----  -----  -----  -----
avg   5.46  74.0%   2.4%  23.6%   0.0%   0.0%   0.0%   0.0%   0.0%

I've also been monitoring with 'sar', keeping a month's worth of output at 15-minute intervals. Much of what 'sar' shows matches what 'top' shows, so I believe the user load is pegging the CPUs. With all that said, my basic question is... what IS a reasonable idle value/range on a Unix box? I've never actually seen this stated anywhere, and I imagine it varies, but there should be some cushion, shouldn't there?
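One rough way to put the 'top' screen above into numbers is to average its per-CPU LOAD and IDLE columns. The sketch below is a hypothetical helper (`summarise_top` is not an HP-UX tool) that assumes the column layout shown above and skips the header, separator and avg rows. On this sample every CPU carries a load average of about 5 with 0% idle, which is commonly read as more runnable work than the processors can serve at that moment, rather than as a violation of some fixed "good" idle percentage.

```python
# Per-CPU table copied from the 'top' screen above. Assumed column order:
# CPU LOAD USER NICE SYS IDLE BLOCK SWAIT INTR SSYS.
TOP_SAMPLE = """\
CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
 0    5.38  64.0%   5.9%  30.2%   0.0%   0.0%   0.0%   0.0%   0.0%
 1    6.83  79.4%   0.0%  20.6%   0.0%   0.0%   0.0%   0.0%   0.0%
 2    4.65  82.5%   0.0%  17.5%   0.0%   0.0%   0.0%   0.0%   0.0%
 3    4.98  69.7%   3.8%  26.5%   0.0%   0.0%   0.0%   0.0%   0.0%
"""


def summarise_top(text):
    """Return (cpu_count, mean per-CPU load average, mean idle %)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric CPU id; this skips the header,
        # any '---' separator and the 'avg' row.
        if not fields or not fields[0].isdigit():
            continue
        load = float(fields[1])
        idle = float(fields[5].rstrip("%"))
        rows.append((load, idle))
    n = len(rows)
    mean_load = sum(load for load, _ in rows) / n
    mean_idle = sum(idle for _, idle in rows) / n
    return n, mean_load, mean_idle


cpus, load, idle = summarise_top(TOP_SAMPLE)
print(f"{cpus} CPUs, mean per-CPU load {load:.2f}, mean idle {idle:.1f}%")
```

The mean of the per-CPU loads matches the 5.46 shown on the avg row, which is a quick sanity check that the parsing lines up with the columns.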

Last edited by deckard; 07-21-2005 at 06:07 PM..
# 2  
07-21-2005
We have an Oracle database server. It's an rp5400 with 4 CPUs, and idle time is in the 90s. These are powerful CPUs. Zero idle on that system would probably indicate runaway processes, or incredible amounts of computation, like finding the next Mersenne prime or something.
# 3  
07-21-2005
I would say there doesn't have to be a cushion. We have boxes that run at 100% CPU all day, and if they meet the performance requirements set by the business, it just means we are not wasting money on hardware that sits idle.

Here is a good scenario: let's say I add a second CPU to a single-CPU machine that is running at 100% all day, effectively doubling the CPU resources. Now I see 50% idle instead of 0% idle. This may be bad, because it may mean I just wasted money on a CPU: other resources such as memory or I/O are now too slow to keep that much CPU busy.

It can also go the other way: you might still see 100% CPU, which means you never had enough CPU to make full use of the memory and I/O in the first place. There is a balance.
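Both cases above can be captured by a toy model, assuming the workload is a steady demand measured in "CPUs' worth" of runnable work (real workloads are burstier). The function name `observed_busy_pct` is hypothetical. The point it illustrates is that a busy figure is capped at 100%, so once demand exceeds capacity the excess is invisible in %busy and shows up only as queued work.

```python
def observed_busy_pct(demand_cpus, capacity_cpus):
    """Busy % a monitor would show for a steady CPU demand (toy model).

    Demand above capacity cannot be shown as more than 100% busy; in a
    real system it waits in the run queue instead.
    """
    return min(demand_cpus / capacity_cpus, 1.0) * 100


# A saturated 1-CPU box gets a second CPU: 100% busy becomes 50% busy.
print(observed_busy_pct(1.0, 1))
print(observed_busy_pct(1.0, 2))
# "The other way": demand of 3 CPUs still reads 100% busy on 2 CPUs.
print(observed_busy_pct(3.0, 2))
```

In the first pair the extra CPU reveals that demand was exactly one CPU; in the last call the 100% reading hides how much work is actually waiting.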

You do have a lot of processes running, and even sleeping processes use resources, so I would not be too worried about this unless there are performance issues. If there are, I would make sure that when adding more CPU I also have sufficient memory and I/O to make use of it.

Obviously there is a lot more to it than this, such as what the running processes are actually doing, and it might not apply to you, but it is something to think about.
# 4  
07-21-2005
How did you know about our work on the mersenne prime? ;p

Actually, the Oracle DB that our middleware uses is on an RX5670 (Itanium) box, and that box is 80-100% idle. So I think Oracle is, as expected, well behaved and the Itanium performs well. I've also been looking at the %CPU figure for individual processes and nothing seems to be grabbing a lot of CPU. What we do have is a lot of processes (over half [800+] are for the client PC connections), but each of those typically uses only a small share of CPU (0.02%), so I'm not seeing anything that adds up to the high user CPU percentages. I've used 'ps -eo user,args,pcpu', and there don't appear to be any major CPU hogs. The largest processes take about 14-20% CPU, but only in brief bursts during the day.

Now that I look at things a little deeper, though, I think the issue lies in the architecture of the middleware and how it uses RAM. The RAM used by all the process images for the middleware user account is constantly near the amount of physical RAM in the box. The vendor was particularly freaked out when they realized how large our client data set is, since it determines the size of the lookup tables they load into each process. At this moment it looks like we have:

857 processes at about 6M each ~5.1 Gigs
159 processes at about 1.1M each ~175 Megs
43 processes at about 150M each ~6.5 Gigs
14 processes at about 111M each ~1.6 Gigs
Total = ~13 Gigs

Our box is maxed at 16 Gigs of physical RAM right now, and the kernel, buffer cache and everything else on the box need memory on top of that ~13 Gigs, so it would appear that we must be swapping a bit. When I totaled the vsz column of ps for the application user account, I got 36 Gigs' worth of process memory. So... maybe the CPU user/sys/idle percentages are more a reflection of RAM usage? Does this sound like a possibility? I've already checked our disk array for I/O bottlenecks and have ruled that out as a problem point.
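A few lines of code can double-check a hand tally of process groups like the one above. This sketch takes the counts and sizes from the list, assumes "M" means megabytes, and uses 1 GB = 1000 MB to match the list's rounding. Keep in mind that per-process sizes usually include shared text, libraries and shared memory segments, so a straight sum can overstate the memory actually in use (and a vsz total even more so, since it counts virtual rather than resident pages).

```python
# Process groups from the list above: (process count, approximate size in MB).
# Assumption: "M" means megabytes and 1 GB is taken as 1000 MB.
GROUPS = [
    (857, 6.0),
    (159, 1.1),
    (43, 150.0),
    (14, 111.0),
]


def total_gb(groups):
    """Sum count * size over all groups and return gigabytes."""
    return sum(count * size_mb for count, size_mb in groups) / 1000


for count, size_mb in GROUPS:
    print(f"{count:4d} x {size_mb:6.1f} MB = {count * size_mb / 1000:5.2f} GB")
print(f"total = {total_gb(GROUPS):.2f} GB")
```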


I believe we've populated the box's memory slots fully, so I don't think there is much we can do there until we get a bigger box.
# 5  
07-21-2005
I should add that 'swapinfo' gives me these figures:

Code:
[root@unix_srv1 root]# swapinfo
             Kb      Kb      Kb   PCT  START/      Kb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev     4194304 4124400   69904   98%       0       -    1  /dev/vg00/lvol2
dev     8384512  174928 8209584    2%       0       -    2  /dev/vg03/lvol1
dev     8384512  180948 8203564    2%       0       -    2  /dev/vg03/lvol2
reserve       - 12590784 -12590784
memory  13059784 2850200 10209584   22%

So it looks like our main swap space is almost completely utilized. I think this may be a definite indicator of where our issues lie...
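Reading swapinfo one device at a time can mislead. As far as I understand HP-UX, swap devices with a lower PRI number are used first, so the PRI 1 device filling up before the PRI 2 devices is expected once any paging happens. HP-UX also reserves swap when processes allocate memory, which is what the "reserve" row shows, and the "memory" row is assumed here to be pseudo-swap. The sketch below (`swap_summary` is a hypothetical helper, not an HP-UX command) adds the dev and memory rows plus reserved swap; applied to the `swapinfo -t` output posted later in this thread, this arithmetic reproduces its total row (21177948 Kb used of 34023112 Kb).

```python
# swapinfo output copied from above; the column layout is assumed fixed.
SWAPINFO_SAMPLE = """\
             Kb      Kb      Kb   PCT  START/      Kb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev     4194304 4124400   69904   98%       0       -    1  /dev/vg00/lvol2
dev     8384512  174928 8209584    2%       0       -    2  /dev/vg03/lvol1
dev     8384512  180948 8203564    2%       0       -    2  /dev/vg03/lvol2
reserve       - 12590784 -12590784
memory  13059784 2850200 10209584   22%
"""


def swap_summary(text):
    """Return (per-device % used, total Kb used, total Kb available)."""
    per_device = {}
    used = avail = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind = fields[0]
        if kind in ("dev", "memory"):
            # Device swap and the 'memory' row (assumed pseudo-swap) both
            # have AVAIL and USED columns.
            avail += int(fields[1])
            used += int(fields[2])
            if kind == "dev":
                per_device[fields[-1]] = 100.0 * int(fields[2]) / int(fields[1])
        elif kind == "reserve":
            # Reserved-but-not-yet-used swap has only a USED figure.
            used += int(fields[2])
    return per_device, used, avail


devices, used_kb, avail_kb = swap_summary(SWAPINFO_SAMPLE)
for name, pct in devices.items():
    print(f"{name}: {pct:.0f}% used")
print(f"committed: {100.0 * used_kb / avail_kb:.0f}% of {avail_kb} Kb")
```

Counting reservations gives a better sense of how close the box is to running out of swap than the 98% on a single device does.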

Last edited by deckard; 07-21-2005 at 06:05 PM..
# 6  
07-21-2005
Do:
swapinfo -t
vmstat 1 6
and post the results.
# 7  
07-21-2005
Here's the output of those commands:
Code:
[root@unix_srv1 root]# swapinfo -t 
             Kb      Kb      Kb   PCT  START/      Kb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev     4194304 4149496   44808   99%       0       -    1  /dev/vg00/lvol2
dev     8384512 1000272 7384240   12%       0       -    2  /dev/vg03/lvol1
dev     8384512 1029924 7354588   12%       0       -    2  /dev/vg03/lvol2
reserve       - 12118200 -12118200
memory  13059784 2880056 10179728   22%
total   34023112 21177948 12845164   62%       -       0    -


[root@unix_srv1 root]# vmstat 1 6                                                                                                                      
         procs           memory                   page                              faults       cpu
    r     b     w      avm    free   re   at    pi   po    fr   de    sr     in     sy    cs  us sy id
    7     0     0  2307190  249357  584  484    11   15     0    0    52   8106  55961  6824  52 16 31
    7     0     0  2307190  247209  383  233    50    0     0    0   231   4782  53993  5288  80 19  1
    7     0     0  2307190  249481  558  441    40    0     0    0   184   4855  58674  5312  71 29  0
    7     0     0  2307190  246198  701  651    35    0     0    0   147   5197  63927  5689  76 24  0
    7     0     0  2307190  246198  877  686    29    0     0    0   117   5172  63334  5480  80 20  0
    7     0     0  2307190  247212  834  587    23    0     0    0    93   5453  63170  5747  66 26  8


Hmmm... the CPU idle is all over the place right now. I ran 'vmstat' a few times and idle has ranged from 0 to 100%. Swap usage is steadily increasing as system performance drops.
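The vmstat sample above can be summarised the same way. This hypothetical helper assumes the HP-UX column order shown and reads the three cpu columns from the end of each line (the faults section also has a "sy" column, so counting from the end avoids mixing them up); ncpus=4 comes from the 'top' screen earlier in the thread. On many Unix systems the first vmstat line is an average since boot, so the sketch drops it by default; treat that as an assumption for HP-UX. In this sample the run queue (r) stays at 7 on a 4-CPU box and po is 0 after the first line, and the remaining us/sy averages land close to the 'top' averages.

```python
# vmstat output copied from above.
VMSTAT_SAMPLE = """\
         procs           memory                   page                              faults       cpu
    r     b     w      avm    free   re   at    pi   po    fr   de    sr     in     sy    cs  us sy id
    7     0     0  2307190  249357  584  484    11   15     0    0    52   8106  55961  6824  52 16 31
    7     0     0  2307190  247209  383  233    50    0     0    0   231   4782  53993  5288  80 19  1
    7     0     0  2307190  249481  558  441    40    0     0    0   184   4855  58674  5312  71 29  0
    7     0     0  2307190  246198  701  651    35    0     0    0   147   5197  63927  5689  76 24  0
    7     0     0  2307190  246198  877  686    29    0     0    0   117   5172  63334  5480  80 20  0
    7     0     0  2307190  247212  834  587    23    0     0    0    93   5453  63170  5747  66 26  8
"""


def summarise_vmstat(text, ncpus, skip_first=True):
    """Average r (per CPU), us, sy and id over the numeric vmstat rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():  # skips the two header lines
            rows.append(fields)
    if skip_first:
        rows = rows[1:]  # assumed since-boot summary line
    n = len(rows)

    def mean(col):
        return sum(int(row[col]) for row in rows) / n

    return {
        "run_queue_per_cpu": mean(0) / ncpus,  # 'r' column
        "us": mean(-3),
        "sy": mean(-2),  # cpu 'sy', not the faults 'sy'
        "id": mean(-1),
    }


print(summarise_vmstat(VMSTAT_SAMPLE, ncpus=4))
```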

Last edited by Perderabo; 07-21-2005 at 05:55 PM.. Reason: Add code tags for readability