Any limitations to the "top" command?


Anyone know of any limitations? Also, does anyone know a great way to determine which processes are hogging CPU?

What OS and version and exactly what limitations to top? It should show what processes are hogging the CPU at any given time - else sar may give you a clue.

What is the system running (applications)? What are the complaints?
AIX 4.3 type->PowerPC_RS64-III
4 - 450Mhz processors

I guess an example might be when we are experiencing slowdowns of our application: when we look at "top", it tells us that we have 2GB of real memory free, yet there are 3rd-party reports that show we are paging A LOT. How can this be? We have 12GB total free memory.

Any ideas?
It would help to know the application. Also, unless you have been watching this server and have a clear history of what it has done and where it has gone, there isn't much you can speculate at.

What has changed? Added many more users? Database (assumption) has grown 2+ times as large?

It's been a while since I've played with AIX, but there are many tools included with it that you should be able to use to find what may be causing problems. First, though, you have to be sure that this "paging" is really a problem. Don't always believe it just because someone says so. It could be as simple as a user running reports that should be done during the night.

Check what's running on the system during these times.
Get someone to call when they see the problem - someone you believe to be truthful and who knows what they are talking about.
Become friends with your DBA (if applicable) - they can give insight to the application database.

Check on IBM's web site for more tools and resources.
Just to give you an idea of what we're running:

Lawson Software application (ERP package similar to PeopleSoft)

Applications and database are on separate servers, which are very similar.

800-900 users daily

Around 4:00pm EST EVERY DAY our application's batch job queue is jammed, and we can have 40-50 jobs "waiting" to run. According to "top", we have plenty of free memory available and CPU% never gets below 10 on any of the processors. Is this just too many users pounding away at the application at the same time? Database issues? Everyone is trying to say it's memory, but I'm trying to convince them that it is the application.

Thanks for your input.
Again, this is something that one would have to watch to really give a good answer to. You need to monitor the server, know how many users are on (especially during the times that folks are complaining about), and have the DBA look into what is being done (what causes those 40 to 50 jobs to back up... a report that started at 15:00?). You need to start sar if it isn't already running (or some other type of monitoring software that you can get a report from).

The following runs on our (Sun) servers. Where we have the disk space we keep a month's worth of data, otherwise as much as we can (usually a week).

in sys crontab:
0,10,20,30,40,50 * * * 0-6 /usr/lib/sa/sa1

The sa1 script came with SunOS and HP-UX, and is probably on AIX also.

Working with one of the files from yesterday (off a server running PeopleSoft with limited users {it's being decommissioned}), where backups were the only time we had heavy hits. This is output of sar -qucgf /var/adm/sa/sa09; the actual report would have a line for every 10 minutes.
00:00:01   pgout/s  ppgout/s  pgfree/s  pgscan/s  %ufs_ipf
01:00:00      0.02      0.04      0.04      0.00      0.00
01:10:01      3.94     23.59    118.25    442.87      0.00
01:20:00      1.12      1.29      0.98      0.26      0.00
01:30:00      1.58      1.72      0.73      0.21      0.00
Average       0.68      1.70      2.58      4.06      0.00
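
If you would rather not eyeball a full day of 10-minute rows, a small script can pick out the intervals where the page scanner was busy. The following is only a rough Python sketch, assuming the report has the same column layout as the sample above. The names parse_sar and busy_intervals, and the limit of 100 scans per second, are made up for illustration and are not a recommended threshold.

```python
# Sample rows copied from the sar paging report in this post.
SAR_G = """\
00:00:01 pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
01:00:00 0.02 0.04 0.04 0.00 0.00
01:10:01 3.94 23.59 118.25 442.87 0.00
01:20:00 1.12 1.29 0.98 0.26 0.00
01:30:00 1.58 1.72 0.73 0.21 0.00
Average 0.68 1.70 2.58 4.06 0.00
"""


def parse_sar(text):
    """Turn the report into a list of (timestamp, {column: value}) rows."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header = lines[0][1:]          # column names after the timestamp
    rows = []
    for fields in lines[1:]:
        if fields[0] == "Average":  # skip the summary line
            continue
        values = dict(zip(header, map(float, fields[1:])))
        rows.append((fields[0], values))
    return rows


def busy_intervals(rows, column="pgscan/s", limit=100.0):
    """Return timestamps where `column` exceeds `limit` (hypothetical cutoff)."""
    return [ts for ts, values in rows if values[column] > limit]


if __name__ == "__main__":
    for ts in busy_intervals(parse_sar(SAR_G)):
        print("heavy page scanning at", ts)
```

On the sample above this singles out the 01:10:01 interval, which matches the backup window mentioned earlier. On a real system you would compare the flagged times against the times users complain.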
Originally posted by lawadm1
Around 4:00pm EST EVERY DAY our applications batch job queue is jammed, and we can have 40-50 jobs "waiting" to run. According to "top", we have plenty of free memory available and CPU% never gets below 10 on any of the processors.
The very function of batch queue is to throttle the jobs. Let me describe the standard unix batch system....

The standard "batch" command that comes with unix is controlled by a file called queuedefs. If you don't tune the queuedefs file, then you get the defaults. And the defaults are that two jobs in the batch queue can be running at once. While two jobs are running, the other jobs wait. It doesn't matter how many CPUs or memory pages are idle. Two jobs is two jobs.

You can tune this if it's too low. But the idea is that batch jobs don't need to be run immediately. Putting a job in a batch queue and then complaining that the job isn't running immediately is a little odd. If you want to run a job immediately, simply run it.
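
To see why dozens of jobs can sit "waiting" while CPUs are idle, here is a minimal Python sketch of a first-come, first-served queue with a fixed job limit. It is not how cron or any vendor's batch system is actually implemented. The function name schedule, the parameter max_running, and the assumption that all jobs are submitted at the same moment are purely illustrative.

```python
import heapq


def schedule(durations, max_running=2):
    """Simulate a FIFO batch queue with at most `max_running` jobs at once.

    All jobs are assumed to be submitted at time 0. Returns a list of
    (start, finish) times, one per job, in submission order.
    """
    running = []   # min-heap of finish times of jobs currently running
    result = []
    for duration in durations:
        if len(running) < max_running:
            start = 0                       # a slot is free right away
        else:
            start = heapq.heappop(running)  # wait for the earliest finisher
        finish = start + duration
        heapq.heappush(running, finish)
        result.append((start, finish))
    return result


if __name__ == "__main__":
    jobs = [5] * 40  # 40 jobs of 5 minutes each (made-up numbers)
    for limit in (2, 8):
        last_start = schedule(jobs, limit)[-1][0]
        print(f"limit {limit}: last job starts at minute {last_start}")
```

With 40 five-minute jobs and a limit of 2, the last job does not start until minute 95. Raising the limit to 8 brings that down to minute 20. Idle CPU and free memory never enter into it, which is the point: the wait comes from the queue limit, not from the hardware.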

Now I've been describing the standard unix batch system. But I'm guessing that your application's batch system is similar. I would look carefully at the documentation for it to see if the number of jobs can be increased. Also remember that a batch system is reserving system resources so that they are available to non-batch jobs. You need to understand your mix of jobs. If everything is going through the batch system, you need to ask why.

And about paging... your OS probably uses the paging system to get stuff into core. You want to ignore page-ins and look at page-outs.
