AIX 6.1 Power6 - Sys CPU utilization twice that of User
Hello,
We just purchased two new 4-way, 5 GHz Power6 servers (one active, one failover) with 64GB RAM (32GB per node) running AIX 6.1, with two LPARs per node, connected to our SAN with two 4GB HBAs. The PROD LPAR has 2 dedicated CPUs (4 virtual) and the TEST LPAR has 2 dedicated CPUs.
When we started parallel testing to move our production application to this server, I noticed that it didn't seem to be performing as fast as I thought it should compared to our existing server.
Our existing server is an 8-way, 1.6 GHz Power5 with 32GB RAM (16GB per node) connected to our SAN with two 2GB HBAs. We have 5 physical CPUs dedicated to the PROD LPAR and two dedicated to the TEST LPAR.
I started by running the common performance monitoring tools during our parallel testing, like VMSTAT, MPSTAT, etc. For some reason, the System/OS is using about twice as much CPU as the User Processes. Everything I've ever seen or been told about UNIX Administration says that the System should not use more CPU than the User Processes. If it does, either the OS needs to be better tuned for the application it's running or there is some kind of bottleneck somewhere (CPU, I/O, Network).
So, the vendor (Not IBM) that installed the servers for us has not been able to explain or correct this after numerous changes to the filesystem, kernel settings, I/O buffers, etc.
VMSTAT does not show any obvious bottlenecks other than the OS seems to be using way too much CPU compared to the User Processes. r & b are less than the number of CPUs for the most part. wt is very low. pi/po are zero.
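For anyone who wants to put a number on the sy-to-us ratio instead of eyeballing it, here is a minimal sketch. The function name sys_user_ratio is made up for illustration. It assumes the usual vmstat layout where the cpu block of the header reads "us sy id ...", so it looks for the "us" column by name and takes the column right after it as sy (on AIX there is a second "sy" under faults, which is why it does not search for "sy" directly).

```shell
# Hypothetical helper: average the us and sy columns of vmstat output read
# from stdin and print their ratio. Assumes the cpu block is "us sy id ...".
sys_user_ratio() {
    awk '
        # Until the header is found, look for the "us" column by name;
        # the cpu "sy" column is assumed to be the one right after it.
        !uscol { for (i = 1; i <= NF; i++) if ($i == "us") uscol = i; next }
        # Data lines start with a number; repeated headers are skipped.
        $1 ~ /^[0-9]+$/ { us += $uscol; sy += $(uscol + 1); n++ }
        END {
            if (n == 0) { print "no vmstat data lines found"; exit 1 }
            printf "avg us=%.1f sy=%.1f sy/us=%.2f\n", us / n, sy / n, (us > 0 ? sy / us : 0)
        }
    '
}
```

Usage would be something like "vmstat 5 12 | sys_user_ratio" while the test load is running.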
Here is a sample of the VMSTAT output during a test which represented about 20% of our production transaction volume going through the new server.
As we put more load on the machine, I thought that this might even out, but it didn't. Below is a VMSTAT from a test that represented about 200% of our production volume being processed by the new server.
Is this normal? Am I just wrong about what normal CPU utilization should be in an AIX LPAR environment?
What type of application are you running? If it is, for example, a Sybase DB, I'd say cut the number of engines in half, since they are too idle and are spinning the CPU because they have nothing to do... In addition, CPUs that are more than twice as fast don't mean that your apps run twice as fast. Rather, it means you can run twice as many apps in the same time.
Show us the output of vmstat -v too, please.
In addition, why don't you put all your CPUs into a pool and run your LPARs uncapped? This would make much better use of the resources you have and give the system the chance to fold the CPUs it's not using, which would give you a much clearer picture than this.
The application is an interfacing application (Healthvision Cloverleaf) that receives healthcare HL7 transactions via TCP/IP from various applications and routes them to the appropriate destination application(s). From the time the transaction is received until it is sent out of the interface engine, it could be translated (via Tcl programs) several times. Translation consists of transaction re-formatting, field reformatting, table maps, transaction filtering logic and other types of data massaging.
While being routed and translated, the transactions are stored temporarily in a Raima database (Healthvision's 3rd party Db agreement) for disaster recovery purposes. If something dies or is stopped, the undelivered messages are read from the database and the engine continues where it left off. There are 15 points at which the transactions are saved to the database during their journey from the source to the destination.
So, the application is pretty I/O intensive. Each transaction is between 1k and 2k and it's written to the Raima Db at least 15 times. Our production environment processes about 1.2 million of these transactions per day on average. We are expecting our volume of transactions to roughly double in the next four years (hence the new server).
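To give a rough feel for what those figures mean as a write load, here is a back-of-the-envelope sketch. The function name write_load is made up, it takes 1k as 1000 bytes, and it assumes the load is spread evenly over 24 hours, which it certainly isn't, so peak rates will be higher than these averages.

```shell
# Hypothetical helper: rough database write load from the figures above.
# $1 = transactions per day, $2 = database writes per transaction.
# Assumes 1k to 2k per write (1k taken as 1000 bytes) and an even 24h spread.
write_load() {
    awk -v tx="$1" -v w="$2" 'BEGIN {
        writes = tx * w
        printf "writes/day=%d avg_writes/s=%.1f GB/day=%.1f-%.1f\n",
            writes, writes / 86400, writes * 1000 / 1e9, writes * 2000 / 1e9
    }'
}

write_load 1200000 15   # current volume
write_load 2400000 15   # volume after the expected doubling
```

With the current volume this comes out at 18 million writes a day, roughly 208 writes per second on average and 18 to 36 GB written per day. The doubled volume gives roughly 417 writes per second.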
We changed our min and max to the values suggested by the vendor, Healthvision.
Thanks!!
Here is the output of the vmstat -v command:
---------- Post updated at 09:09 AM ---------- Previous update was at 09:05 AM ----------
I'm not sure if our Admins will allow us to run our CPU in a pool, with LPARs uncapped.
I think the concern is that if there is resource-intensive processing on the TEST node, it might steal too many resources from the PROD node. We do, on occasion, perform large production transaction re-sends from our TEST node. We have to do this when a destination application didn't process the transactions correctly for an extended period of time, or if the application was down for an extended period and we couldn't allow the transactions to queue in our engine that long.
However, I will discuss this option with the admins and let you know their feedback.
That sy is higher than us just means the kernel has much more to do. I would guess that is because of the way the software is written. To check in detail what is going on CPU-wise, maybe have a look at tprof.
I have no experience with tprof myself, but maybe analysing its output will give you something useful.
You could also try again with SMT enabled and disabled (smtctl -m on|off) and check for different behaviour, depending on whether the application has lots of processes or is written multithreaded (check with svmon -P | grep -p Pid). Whether SMT is working fine, i.e. dispatching works smoothly, can be checked with "mpstat -s 1" (see System p education).
Checking how the work is distributed on the different (logical/virtual) CPUs can be done with sar -P ALL 1 9999 for example.