What happens if CPU utilization is near 100% in AIX 6.1?


 
# 1  
Old 06-06-2013

Hi all,
We have a setup where our application runs on 2 AIX servers (AIX 6.1, 16 CPUs, P5 570 boxes). These boxes work as disaster recovery servers for each other, i.e. if one box fails, the whole load will run on the other box.
Average CPU utilization on each box is between 30% and 40%, with a maximum of around 65% on each box.
The question has been raised whether one box's processing capacity is enough to run the whole load, and what the impact would be if CPU utilization gets close to 100% (assuming memory and all other parameters are not a problem).
What would be the best way to estimate the average and worst-case impact on application performance if the whole load runs on one box?
What we have done so far:
- taken the CPU utilization at one-minute intervals from both servers
- added the two CPU utilization figures arithmetically to arrive at a theoretical CPU utilization, which can exceed 100%
- estimated how long the CPU utilization would be above 100% in a day
- arrived at a performance hit based on the above 2 factors.
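The steps above can be sketched in a few lines of Python. This is only an illustration: the sample values are made up, the function names are hypothetical, and it assumes both boxes have the same CPU capacity and that the samples are aligned minute by minute.

```python
def combined_load(box_a, box_b):
    """Add the per-minute CPU utilization (percent) of two boxes, sample by sample."""
    if len(box_a) != len(box_b):
        raise ValueError("both series must cover the same minutes")
    return [a + b for a, b in zip(box_a, box_b)]


def overload_summary(combined, capacity=100.0):
    """Return (minutes above capacity, worst theoretical utilization, excess work)."""
    over = [c - capacity for c in combined if c > capacity]
    minutes_over = len(over)
    peak = max(combined) if combined else 0.0
    # Excess is in "percent-minutes": work that could not run in the minute it arrived.
    excess = sum(over)
    return minutes_over, peak, excess


# Hypothetical samples, one value per minute (not real measurements).
box_a = [35.0, 40.0, 65.0, 60.0, 30.0]
box_b = [30.0, 38.0, 50.0, 45.0, 33.0]

combined = combined_load(box_a, box_b)
minutes_over, peak, excess = overload_summary(combined)
print(combined)
print(minutes_over, peak, excess)
```

The excess figure is a rough measure of how much work would be pushed into later minutes. It says nothing about how the application itself reacts to the delay.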
We would like to check with the experts here: is this the right approach, and what other factors should be taken into consideration?
Thanks

Moderator's Comments:
Mod Comment edit by bakunin: please stay away from formatting your text. I'd be eternally grateful for not having to clear out half a ton of superfluous SIZE-, FONT- and whatnot-tags again. Thank you.

Last edited by bakunin; 06-06-2013 at 11:03 AM..
# 2  
Old 06-06-2013
So, is this IBM HA configured as a two-node Active-Active cluster?

All that would likely happen from the OS point of view is that you will be CPU bound and processing will slow a little. The end-user symptoms you may see though may be worse. Does your application have a time-out in it? If the processing does not complete quite as quickly, will that be a problem? Is this a time critical application, e.g. financial transactions critical to the sequence or millisecond for real-time trading perhaps?


The worry will be if a time-out occurs although a background query is still processing. What you tend to get then is users re-submitting. Eventually you will be flooded by user requests and the CPU will never go idle for the rest of the day.


You need to answer these questions for yourself to see if you have adequate provision.


Robin
# 3  
Old 06-06-2013
Thanks Robin for your response.
Yes, the 2 boxes are configured as a GOVLAN cluster. Application time-out is not an issue in this scenario. Is there a way to quantify how slow the processing would be?
# 4  
Old 06-07-2013
There is more to running an application than simply counting CPU ticks. I don't know a "GOVLAN" cluster (to be honest: I have never heard of this cluster product), but the reason one uses a cluster is usually not load-sharing:

If you have a 2-way cluster you get additional availability. Even if one system (or some component of one system) breaks the whole still works. Shutting down one system will create the risk of the service (i.e. the application) not being available for some time. Assess this risk in terms of cost: how much will it cost to have the application not available for, say, 24 hours (normal response time for IBM)? Now, if this is too much, how much will the premium service cost to make IBM respond within less time? Calculate all these numbers carefully (or let someone calculate this) and you have something to weigh against the cost of the additional hardware.

CPU utilisation is the least problematic factor performance-wise. Typically performance problems come in one of three types: a system is CPU-bound, memory-bound or I/O-bound. Memory-bound systems start swapping and this is usually a killer. I/O-bound systems slow down dramatically too, because I/O has the lowest bandwidth (compared to memory and CPU) to begin with. CPU-bound systems only get a bit slower and this might not be a problem as long as the application is not time-critical.

A good idea is to set up some long-term monitoring to measure CPU (and some other resources) utilisation statistics. If you have such statistics for several days/weeks/months you can calculate all sorts of trends and better estimate the time when CPU saturation will happen.
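As a rough illustration of the trend idea, here is a minimal Python sketch that fits a straight line to evenly spaced utilisation samples and estimates when it would reach 100%. The weekly figures are invented and the function names are hypothetical. A straight line is an assumption too, since real workloads are often seasonal.

```python
def linear_trend(values):
    """Least-squares slope and intercept for evenly spaced samples (x = 0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(values, threshold=100.0):
    """Periods after the last sample until the trend line reaches threshold, or None."""
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None  # a flat or falling trend never reaches the threshold
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


# Hypothetical weekly peak CPU utilisation in percent (not real measurements).
weekly_peak = [60.0, 62.0, 64.0, 66.0, 68.0]
print(periods_until(weekly_peak))
```

With more history the same fit can be run on averages as well as peaks, which gives a feel for how quickly the gap between the two is closing.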

Last thing: if your CPU utilization is low why don't you take away some CPUs from the LPARs in question and assign these to other systems? You could assign them back any time in case you face CPU saturation at some point in time.

I hope this helps.

bakunin
# 5  
Old 06-07-2013
cpu usage is a machine characteristic - comparable to rpm - and 100% utilization is like being at the end of the red zone. Other than that it is high, it tells us nothing about the performance of the vehicle, the way mpg (miles per gallon) might.

In other words, study how linear your performance is, in application terms, compared to machine terms, and you will have the best approximation of an answer to your question.

hope this helps.
# 6  
Old 06-20-2013
if peak utilization on 1 box has already gone as high as 65%, your backup setup will only be fine while the total utilization of both boxes' loads on 1 server is less than 95%* ... once your total utilization reaches 100%, your users will definitely see performance hits ...

my quick rule of thumb here -- if i am not allowed to test 1 server to host both servers' daily loads and i do not have access to metrics -- would be to see what is the maximum total utilization of each server and add them together ... if below 80%, the "backup" server should be able to last long enough for the downed server to be fully recovered without the users seeing any performance issues ... if above 80%, there is a higher risk that users will see the performance hits long enough to complain that i do not know what i am doing while i am actually doing everything to recover the downed server ...

but just like everything we do, always take into account your computing environment and your users' job functions ... a performance hit on a development server is not as critical as a performance hit on an application server handling billions of dollars worth of financial transactions a day ... if systems are hyper-critical, always have a 3rd box handy and ready to go ...

*the actual threshold may be higher but i try to err on the side of caution ...
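The rule of thumb above is simple enough to write down as a small Python sketch. The function name and the labels are hypothetical; the 80% and 95% thresholds are the ones quoted in this post, and the 65% peaks come from the original question.

```python
def failover_risk(peak_a, peak_b, comfort=80.0, limit=95.0):
    """Classify the summed peak CPU utilization (percent) of two boxes.

    The sum of the two peaks is a worst case: it assumes both boxes
    peak at the same moment, which may not happen in practice.
    """
    total = peak_a + peak_b
    if total < comfort:
        return total, "ok"
    if total < limit:
        return total, "at risk"
    return total, "overloaded"


# Peak values from the original question: about 65% on each box.
print(failover_risk(65.0, 65.0))
```

For the boxes in this thread the summed peak is well above either threshold, which is why the per-minute figures matter more than the single peak value.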
# 7  
Old 06-20-2013
Just Ice - good points.

But on a virtualized system, at least on POWER, user/sys/idle/wait are all relative to the column "pc" and/or "ec" when using shared processors.

On a system (this one idle) with a dedicated processor(s) the values you see for cpu consumption can be used in the "normal" way.
Code:
System configuration: lcpu=1 mem=9216MB

 kthr          memory                         page                       faults           cpu    
------- --------------------- ------------------------------------ ------------------ -----------
  r   b        avm        fre    re    pi    po    fr     sr    cy    in     sy    cs us sy id wa
  6   0     773317    1494583     0     0     0     0      0     0     0    701   195  1  1 98  0
  7   0     773324    1494576     0     0     0     0      0     0     0   1004   186  0  2 98  0
  6   0     773324    1494576     0     0     0     0      0     0     0    629   197  0  2 98  0

However, when using shared processors, vmstat shows the (summed) entitlement consumption (use mpstat or sar for a per-logical-processor breakdown, with a total at the end). If that summed figure is less than 100%, you have at least the rest of your entitlement available for additional processing.
While the number is less than entitlement AND user+sys is near 95% or higher, what this says is that WHEN active, the processor is doing "user or sys" activities; "idle" time is being given back to the hypervisor for other activities.
Code:
$ lparstat 5 2

System configuration: type=Shared mode=Uncapped smt=On lcpu=2 mem=1024MB psize=1 ent=0.20 

%user  %sys  %wait  %idle physc %entc  lbusy   app  vcsw phint
----- ----- ------ ------ ----- ----- ------   --- ----- -----
  0.2   0.8    0.0   98.9  0.00   1.9    1.0  1.00   310     0 
  0.5   0.8    0.0   98.8  0.00   2.1    0.0  1.00   329     0 
$ vmstat -w 5 2

System configuration: lcpu=2 mem=1024MB ent=0.20

 kthr          memory                         page                       faults                 cpu          
------- --------------------- ------------------------------------ ------------------ -----------------------
  r   b        avm        fre    re    pi    po    fr     sr    cy    in     sy    cs us sy id wa    pc    ec
  3   0     241787       3751     0     0     0     0      0     0    16   1123   245  5  3 92  0  0.02  10.2
  2   0     241787       3751     0     0     0     0      0     0     3     36   166  0  1 99  0  0.00   1.8

The same "user+sys" times when above entitlement could be a problem if the "app" number (available processors in the shared pool) is getting very small (my system only has 1 cpu, so it is always small).
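To make the entitlement arithmetic concrete, here is a small Python sketch using the "ent" and "ec" values from the vmstat output above. The function names are hypothetical, and the sketch only covers headroom inside the entitlement. An uncapped LPAR may get more than that if the shared pool has spare capacity.

```python
import re


def parse_entitlement(config_line):
    """Extract the ent= value from a 'System configuration' line, or None if absent."""
    match = re.search(r"\bent=([0-9.]+)", config_line)
    return float(match.group(1)) if match else None


def entitlement_headroom(ent, ec_percent):
    """Return (processors consumed, processors left within the entitlement)."""
    consumed = ent * ec_percent / 100.0
    return consumed, max(0.0, ent - consumed)


# Values taken from the vmstat -w sample above: ent=0.20, first row ec=10.2.
ent = parse_entitlement("System configuration: lcpu=2 mem=1024MB ent=0.20")
consumed, headroom = entitlement_headroom(ent, 10.2)
print(ent, round(consumed, 4), round(headroom, 4))
```

On a dedicated-processor system the configuration line has no ent= field, so parse_entitlement returns None and the us/sy/id/wa columns can be read in the "normal" way.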