I've been working for about a year with an HP-UX system (RP5400 Series PA-RISC server) that hosts some middleware. The middleware sits between an Oracle DB (on another box) and the client applications running on about 800 PCs. From the beginning, I've noticed that 'top' reports between 0.0% and 10% Idle during the day. This is the first big production Unix server I've ever worked with, so I can't be sure, but I believe that these Idle times are WAY too low. However, the application vendor told us that 0% idle should be OK as long as it doesn't stay there. Another support person at the application vendor, however, told me that we should be seeing between 30% and 60% Idle during the day! My gut feeling is that if it drops below 20% consistently, there is probably a resource issue. Here is a typical 'top' screen from our system in the middle of the day:
I've been monitoring with 'sar' as well and keeping a month's worth of 'sar' output in 15-minute intervals. A lot of what I see with 'sar' seems to reflect what I see with 'top', so I believe we are being pegged for CPU time by the user load. With all of this said, my base question is... what IS a reasonable Idle value/range on a Unix box? I've never actually seen this stated anywhere and I imagine it probably varies, but there should be some cushion, shouldn't there?
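One way to make a month of 'sar' output concrete is to count how often idle falls under a chosen floor. Here is a minimal Python sketch, assuming rows in the 'sar -u' layout (time, %usr, %sys, %wio, %idle). The sample rows are made up for illustration and are not real measurements; the 20% floor is just the gut-feeling figure mentioned above.

```python
def idle_values(sar_text):
    """Extract the %idle column (last field) from 'sar -u'-style rows."""
    values = []
    for line in sar_text.splitlines():
        fields = line.split()
        # Data rows are assumed to look like: HH:MM:SS usr sys wio idle
        if len(fields) == 5 and fields[0].count(":") == 2:
            try:
                values.append(float(fields[4]))
            except ValueError:
                continue  # header row such as "%usr %sys %wio %idle"
    return values


def fraction_below(values, floor):
    """Share of samples whose idle percentage is under the floor."""
    if not values:
        return 0.0
    return sum(1 for v in values if v < floor) / len(values)


# Hypothetical sample, not real measurements.
SAMPLE = """\
09:00:00    %usr    %sys    %wio   %idle
09:15:00      71      22       3       4
09:30:00      68      20       2      10
09:45:00      55      15       5      25
10:00:00      74      24       2       0
"""

idle = idle_values(SAMPLE)
print(f"{fraction_below(idle, 20.0):.0%} of intervals under 20% idle")
```

Counting intervals rather than averaging keeps short dips to 0% from being hidden by quiet periods overnight.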
We have an Oracle database server. It's an rp5400 with 4 CPUs. Idle time is in the 90s. These are powerful CPUs. Zero idle on that system would probably indicate runaway processes or incredible amounts of computation, like finding the next Mersenne prime or something.
I would say there doesn't have to be a cushion. We have boxes that run all day at 100% CPU, and if the boxes can meet the performance requirements set forth by the business, then it just means we are not wasting any funds on hardware that is just sitting idle.
Here is a good scenario: let's say I add one CPU to the machine that is running at 100% all day, effectively doubling the CPU resources. Now I see 50% idle instead of 0% idle. This may be bad, because it may mean that I just wasted money on a CPU: other resources such as memory or I/O are now too slow to keep that much CPU busy.
It can also go the other way: you might still see 100% CPU, which means that you never had enough CPU to utilize the memory and I/O in the first place. So there is a balance.
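The two cases above can be put into a toy model. This is only a sketch in Python under a big assumption: the workload is treated as a fixed number of "CPUs' worth" of demand, and memory and I/O limits are ignored entirely. The function name and the demand values are made up for illustration.

```python
def idle_percent(cpu_demand, cpu_count):
    """Idle % when cpu_demand CPUs' worth of work runs on cpu_count CPUs.

    Toy model: ignores memory and I/O limits entirely.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    busy = min(cpu_demand / cpu_count, 1.0)
    return (1.0 - busy) * 100.0


# Hypothetical demands, chosen only to illustrate both cases.
for demand in (1.0, 3.0):
    for cpus in (1, 2):
        print(f"demand={demand} cpus={cpus} idle={idle_percent(demand, cpus):.0f}%")
```

With a demand of exactly one CPU, the second CPU leaves 50% idle. With a demand of three CPUs, the box is still at 0% idle after the upgrade, which is the "other way" case.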
You do have a lot of processes running, and even sleeping processes use resources, so I would not be too worried about this unless there are performance issues. Then, before adding more CPU, I would make sure I have sufficient memory and I/O to utilize it.
Obviously there is a lot more to it than this, such as what the running processes are actually doing, and it might not apply to you, but it is something to think about.
How did you know about our work on the Mersenne prime? ;p
Actually, the Oracle DB that our middleware utilizes is on an RX5670 (Itanium) box, and that box is between 80% and 100% idle. So I think that Oracle is, as expected, well behaved and the Itanium performs well. I've also been looking at the %CPU figure for processes and nothing seems to be grabbing a lot of CPU. What we do have is a lot of processes (over half [800+] are for the client PC connections). But they each utilize only a small portion of CPU (0.02% typically), so I'm not seeing anything that adds up to the high user CPU percentage figures I'm seeing. I've used 'ps -eo user,args,pcpu' and there don't appear to be any major CPU hogs. The largest processes take up about 14-20% CPU, but only briefly throughout the day.
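Small per-process figures are easier to judge once they are summed per user. Here is a rough Python sketch, assuming 'ps -eo user,args,pcpu' prints the user first and the CPU percentage last on each row. The user names, commands and values in the sample are hypothetical, and how pcpu is scaled on a multi-CPU box is not something this sketch tries to account for.

```python
from collections import defaultdict


def cpu_by_user(ps_text):
    """Sum the pcpu column per user from 'user args pcpu' rows."""
    totals = defaultdict(float)
    for line in ps_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            value = float(parts[-1])  # pcpu is the last field
        except ValueError:
            continue  # header row or malformed line
        totals[parts[0]] += value
    return dict(totals)


# Hypothetical listing: 800 client handlers at 0.02% plus two larger jobs.
SAMPLE = "USER COMMAND %CPU\n"
SAMPLE += "mwuser client_handler -p 5000 0.02\n" * 800
SAMPLE += "mwuser batch_lookup 14.0\noracle listener 0.5\n"

for user, total in sorted(cpu_by_user(SAMPLE).items()):
    print(f"{user}: {total:.1f}% across all processes")
```

Taking the last field as pcpu sidesteps the spaces inside the args column. In this made-up sample, 800 processes at 0.02% each already come to 16% before any larger process is counted.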
Now that I look at things a little deeper though... Where I think the issue lies is in the architecture of the middleware and how it uses RAM. Looking at the RAM used by all process images for the middleware user account, it is constantly over the amount of physical RAM in the box. The vendor was particularly freaked out when they realized how large our client data set is, since it affects the size of the lookup tables that they run for each process. It looks like we have (at this moment):
857 processes using 6M each ~10 Gigs
159 processes at 1.1M each ~180 Megs
43 processes at about 150M each ~6.5 Gigs
14 processes at about 111M each ~1.2 Gigs
Total = ~17 to 18 Gigs
Our box is maxed at 16 Gigs of physical RAM right now, so it would appear that we must be swapping a bit. When I totaled the vsz output of ps for the application user account, I saw 36 Gigs' worth of process RAM usage. So... maybe the CPU user/sys/idle percentages are more a reflection of RAM usage? Does this sound like a possibility? I've already checked for I/O bottlenecks on our disk array and have ruled that out as a possible problem point.
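As a cross-check, the per-group figures can be multiplied out directly. This is a small Python sketch that uses only the rounded counts and sizes listed above, reads "M" as megabytes, and takes 1 Gig as 1024 Megs (both are assumptions).

```python
# (process count, approximate size per process in MB), as listed above
GROUPS = [(857, 6.0), (159, 1.1), (43, 150.0), (14, 111.0)]
PHYSICAL_RAM_GB = 16


def total_gb(groups):
    """Total size in GB (1 GB = 1024 MB) for (count, size_mb) pairs."""
    return sum(count * size_mb for count, size_mb in groups) / 1024


for count, size_mb in GROUPS:
    print(f"{count:4d} x {size_mb:6.1f} MB = {count * size_mb / 1024:5.2f} GB")

estimate = total_gb(GROUPS)
print(f"total = {estimate:.1f} GB vs {PHYSICAL_RAM_GB} GB physical RAM")
```

Multiplied out this way, the rounded sizes give roughly 13 Gigs rather than 17 to 18 (the first group comes to about 5 Gigs, not 10), so the per-process sizes and the subtotals don't quite agree and both are best treated as rough estimates. The 36 Gigs vsz total is the figure that is clearly above the 16 Gigs of physical RAM.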
I believe we've populated the box's memory slots fully, so I don't think there is much we can do there until we get a bigger box.
Hmmm... the CPU idle is going all over the place right now. I ran 'vmstat' a few times and idle has ranged from 0 to 100. The swap usage is steadily increasing as system performance drops.
Last edited by Perderabo; 07-21-2005 at 05:55 PM..
Reason: Add code tags for readability