AIX 6.1 Power6 - Sys CPU utilization twice that of User


 
# 1
01-25-2010

Hello,

We just purchased two new 4-way 5GHz Power6 servers (one active, one failover) with 64GB RAM (32GB per node), running AIX 6.1 with two LPARs per node and connected to our SAN with two 4Gb HBAs. The PROD LPAR has 2 dedicated CPUs (4 virtual) and the TEST LPAR has 2 dedicated CPUs.

When we started parallel testing to move our production application to this server, I noticed that it didn't seem to be performing as fast as I thought it should compared to our existing server.

Our existing server is an 8-way 1.6GHz Power5 with 32GB RAM (16GB per node), connected to our SAN with two 2Gb HBAs. We have 5 physical CPUs dedicated to the PROD LPAR and two dedicated to the TEST LPAR.

I started by running the common performance monitoring tools, such as vmstat and mpstat, during our parallel testing. For some reason, the system/OS is using about twice as much CPU as the user processes. Everything I've ever seen or been told about UNIX administration says that the system should not use more CPU than the user processes. If it does, either the OS needs to be better tuned for the application it's running, or there is a bottleneck somewhere (CPU, I/O, network).

The vendor (not IBM) that installed the servers for us has not been able to explain or correct this, despite numerous changes to the filesystem, kernel settings, I/O buffers, etc.

vmstat does not show any obvious bottleneck, other than the OS seeming to use far too much CPU compared to the user processes. r and b are mostly below the number of CPUs, wa is very low, and pi/po are zero.

Here is a sample of the vmstat output during a test that represented about 20% of our production transaction volume going through the new server.

Code:
/>vmstat -w 5
 
System configuration: lcpu=8 mem=24576MB
 
kthr memory page faults cpu
r b avm fre re pi po fr sr cy in sy cs us sy id wa
1 0 3340754 1940960 0 0 0 0 0 0 65  21903  1207 2 3 95 0
2 0 3340770 1940918 0 0 0 0 0 0 260 43677  1654 3 6 90 1
2 0 3340885 1940771 0 0 0 0 0 0 125 37038  1601 3 8 89 0
1 0 3340742 1940897 0 0 0 0 0 0 75  24788  1290 2 5 93 0
1 0 3340699 1940913 0 0 0 0 0 0 99  38021  1375 2 6 92 0
1 0 3340685 1940898 0 0 0 0 0 0 97  34672   1424 2 5 93 0
1 0 3340673 1940881 0 0 0 0 0 0 137 23928  1640 3 8 89 0
1 0 3340634 1940881 0 0 0 0 0 0 135 39418  1615 3 6 91 0
1 0 3341393 1940054 0 0 0 0 0 0 166 26856  1749 4 7 88 0
1 0 3341378 1940035 0 0 0 0 0 0 106 35104  1301 2 5 93 0
1 0 3341381 1940008 0 0 0 0 0 0 73  36011  1171 2 3 95 0
1 0 3341407 1939948 0 0 0 0 0 0 101 23827  1330 2 5 93 0
1 0 3341377 1939933 0 0 0 0 0 0 143 33983  1638 3 7 90 0
0 0 3341394 1939876 0 0 0 0 0 0 249 38386  1634 3 6 90 0

As we put more load on the machine, I thought this might even out, but it didn't. Below is vmstat output from a test that represented about 200% of our production volume being processed by the new server.

Code:
System configuration: lcpu=8 mem=24576MB
 
kthr memory page faults cpu
r b avm fre re pi po fr sr cy in sy cs us sy id wa
1 1 2323028 3038088 0 0 0 0 0 0 731 53814 7149 18 45 37 1
3 0 2324120 3036759 0 0 0 0 0 0 825 54887 7107 20 43 36 1
3 0 2324346 3036422 0 0 0 0 0 0 758 45717 5610 16 41 42 2
2 1 2324357 3036295 0 0 0 0 0 0 932 52869 7709 17 46 36 1
2 0 2324395 3036165 0 0 0 0 0 0 774 46603 5759 16 42 42 1
2 0 2323100 3037244 0 0 0 0 0 0 893 52706 7509 17 45 37 2
4 0 2324297 3035931 0 0 0 0 0 0 737 45806 5381 15 38 46 1
3 0 2324751 3035377 0 0 0 0 0 0 773 53345 7091 18 46 35 1
3 0 2324801 3035185 0 0 0 0 0 0 773 52399 7071 17 43 39 1
2 0 2325211 3034652 0 0 0 0 0 0 615 46806 5469 17 41 42 1
2 1 2325890 3033848 0 0 0 0 0 0 757 50556 6565 21 43 35 1
2 0 2324992 3034627 0 0 0 0 0 0 712 51243 7530 13 41 45 1
3 0 2325939 3033444 0 0 0 0 0 0 655 46586 5832 17 39 42 1
3 1 2325297 3033969 0 0 8 0 0 0 659 52255 6002 19 42 38 1
3 0 2325296 3033879 0 0 0 0 0 0 705 51447 6256 18 45 36 1
4 0 2326345 3032446 0 0 0 0 0 0 566 58858 9930 13 43 44 1
4 0 2326502 3032220 0 0 0 0 0 0 371 39132 3743 10 37 53 0
3 1 2329518 3029111 0 0 0 0 0 0 595 55473 6341 22 45 33 1
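To put numbers on the us/sy comparison, the cpu columns can be averaged over all samples. This is a minimal sketch, assuming the 17-column AIX vmstat layout shown above (r first; in, sy, cs under faults; us, sy, id, wa last) and output saved to a file; the function name `vmstat_summary` is hypothetical. Note that the `sy` under faults is system calls per second, not CPU time.

```shell
#!/bin/sh
# Hypothetical helper: summarize saved AIX vmstat output read on stdin.
# Assumes data rows have 17 numeric fields:
#   r b avm fre re pi po fr sr cy in sy cs us sy id wa
# so r=$1, syscalls/s (faults sy)=$12, us=$14, sy=$15, id=$16, wa=$17.
vmstat_summary() {
    awk '
        # Keep only data rows: exactly 17 fields, first field numeric.
        NF == 17 && $1 ~ /^[0-9]+$/ {
            n++; us += $14; sy += $15; id += $16; wa += $17
            rq += $1; sc += $12
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "samples=%d us=%.1f sy=%.1f id=%.1f wa=%.1f runq=%.1f syscalls/s=%.0f\n",
                n, us/n, sy/n, id/n, wa/n, rq/n, sc/n
            # Ratio of system to user CPU time over the whole run.
            if (us > 0) printf "sy/us ratio=%.2f\n", sy/us
        }'
}
```

For example, `vmstat -w 5 12 > vmstat.out` followed by `vmstat_summary < vmstat.out` prints one summary line plus the sy/us ratio. The percentages only show how much system time there is, not what is consuming it.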

Is this normal? Am I just wrong about what normal CPU utilization should be in an AIX LPAR environment?

Thanks so much!
Troy
# 2
01-25-2010
What type of application are you running? If this is, e.g., a Sybase DB, I'd say cut the number of engines in half, since they are mostly idle and are spinning on the CPU because they're entirely bored... Also, CPUs that are more than twice as fast don't mean your apps will run twice as fast; rather, they mean you can run twice as many apps in the same time.

Show us the output of vmstat -v too, please.

Also, why don't you put all your CPUs into a shared pool and run your LPARs uncapped? This would make much better use of the resources you have and would give the system the chance to fold the CPUs it isn't using, which would give you a much clearer picture than this.

Kind regards
zxmaus
# 3
01-26-2010
The application is an interface engine (Healthvision Cloverleaf) that receives healthcare HL7 transactions via TCP/IP from various applications and routes them to the appropriate destination application(s). Between the time a transaction is received and the time it is sent out of the interface engine, it may be translated (via Tcl programs) several times. Translation includes transaction reformatting, field reformatting, table maps, transaction filtering logic, and other kinds of data massaging.

While being routed and translated, the transactions are stored temporarily in a Raima database (under Healthvision's third-party DB agreement) for disaster recovery purposes. If something dies or is stopped, the undelivered messages are read from the database and the engine continues where it left off. There are 15 points at which a transaction is saved to the database on its journey from source to destination.

So the application is fairly I/O intensive. Each transaction is between 1KB and 2KB, and it's written to the Raima DB at least 15 times. Our production environment processes about 1.2 million of these transactions per day on average. We expect our transaction volume to roughly double in the next four years (hence the new server).
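Using only the figures above, a rough average write rate for the Raima database can be estimated. This is a back-of-the-envelope sketch that assumes traffic is spread evenly over 24 hours (real traffic is burstier, so peaks will be higher) and uses 2KB, the upper end of the quoted size; `write_load` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: average database write load from daily volume.
# Assumes an even spread over 86400 seconds; integer division truncates.
write_load() {
    tx_per_day=$1      # transactions per day
    saves_per_tx=$2    # database saves per transaction (quoted as a minimum)
    kb_per_write=$3    # KB written per save (upper end of the quoted size)
    writes_per_day=$((tx_per_day * saves_per_tx))
    writes_per_sec=$((writes_per_day / 86400))
    kb_per_sec=$((writes_per_day * kb_per_write / 86400))
    echo "writes/day=$writes_per_day writes/s=$writes_per_sec KB/s=$kb_per_sec"
}
```

`write_load 1200000 15 2` prints `writes/day=18000000 writes/s=208 KB/s=416`; at the doubled volume expected in four years, `write_load 2400000 15 2` gives 416 writes/s and 833 KB/s. Since 15 saves is a stated minimum, these are lower bounds on the write count.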

We changed our min and max to the values suggested by the vendor, Healthvision.

Thanks!!

Here is the output of the vmstat -v command:
Code:
/>uptime
  9:03am  up 12 days,  13:38,  3 users,  load average:  1.16, 1.07, 0.93
/>vmstat -v
              6291456 memory pages
              6080336 lruable pages
               225542 free pages
                    1 memory pools
              1007176 pinned pages
                 80.0 maxpin percentage
                  3.0 minperm percentage
                 90.0 maxperm percentage
                 48.1 numperm percentage
              2930676 file pages
                  0.0 compressed percentage
                    0 compressed pages
                 48.1 numclient percentage
                 90.0 maxclient percentage
              2930676 client pages
                    0 remote pageouts scheduled
                   13 pending disk I/Os blocked with no pbuf
                    0 paging space I/Os blocked with no psbuf
                 2484 filesystem I/Os blocked with no fsbuf
                    0 client filesystem I/Os blocked with no fsbuf
                28690 external pager filesystem I/Os blocked with no fsbuf
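The last few vmstat -v lines count I/Os that had to wait for a free buffer. Because these counters are cumulative since boot, a single snapshot cannot show whether the shortage happened during the test. A minimal sketch, with hypothetical helper names `blocked_io` and `blocked_io_delta`, that lists the non-zero counters and the increase between two snapshots:

```shell
#!/bin/sh
# Hypothetical helper: list non-zero "blocked with no ..." counters
# from AIX vmstat -v output read on stdin.
blocked_io() {
    awk '/blocked with no/ && $1 > 0 {
        count = $1
        $1 = ""                     # drop the count, keep the label
        sub(/^ +/, "")              # trim the leading space left behind
        printf "%s: %d\n", $0, count
    }'
}

# Hypothetical helper: show which counters grew between two snapshots.
# $1 = earlier vmstat -v output file, $2 = later vmstat -v output file
blocked_io_delta() {
    awk '/blocked with no/ {
        count = $1; $1 = ""; sub(/^ +/, "")
        if (FNR == NR) before[$0] = count            # first file: remember
        else if (count - before[$0] > 0)             # second file: compare
            printf "%s: +%d\n", $0, count - before[$0]
    }' "$1" "$2"
}
```

For example, run `vmstat -v > before.txt` before a test and `vmstat -v > after.txt` after it, then `blocked_io_delta before.txt after.txt`.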



---------- Post updated at 09:09 AM ---------- Previous update was at 09:05 AM ----------

I'm not sure our admins will allow us to put our CPUs in a shared pool with the LPARs uncapped.

I think the concern is that resource-intensive processing on the TEST LPAR might take too many resources away from the PROD LPAR. We do occasionally perform large production transaction re-sends from our TEST LPAR. We have to do this when a destination application didn't process the transactions correctly for an extended period of time, or when the application was down for an extended period and we couldn't let the transactions queue in our engine that long.

However, I will discuss this option with the admins and let you know their feedback.

Thanks again.
# 4
01-26-2010
That sy is higher than us just means the kernel has more work to do. I would guess that's because of the way the software is written. To check in detail what is going on CPU-wise, maybe have a look at tprof:

AIX 5.2 performance tools update, Part 3

I have no experience with tprof myself, but maybe you'll get something useful out of analysing its output.

You could also try enabling/disabling SMT (smtctl -m on|off) and check for different behaviour, depending on whether the application runs lots of processes or is multithreaded (check with svmon -P| grep -p Pid). Whether SMT is working well, i.e. whether dispatching runs smoothly, can be checked with "mpstat -s 1" (see System p education).

Checking how the work is distributed on the different (logical/virtual) CPUs can be done with sar -P ALL 1 9999 for example.

This one might be interesting for you too:
http://www.ibm.com/developerworks/wi...len+CPU+cycles