Analyzing CPU usage


 
# 1  
10-10-2011

Hi Admins,

I need your help analyzing the CPU usage of our main server. Below I have shared the CPU usage during busy and non-busy hours.

CPU usage is always at its maximum during busy hours, and users constantly complain about slowness. The server is an LPAR configured in uncapped mode.

Partitions configured as uncapped can obtain temporary access to additional processor resources without changing the entitled capacity of this or any other partition in the system.

The CPU is always busy during business hours. I take this to mean that the LPAR fails to obtain temporary access to additional processor resources because no additional processor resources are available.

This would mean we have to add CPUs.

What can be done from the OS end?

Please provide your expert thoughts.

CPU usage during non-busy hrs

Code:
System configuration: lcpu=8 ent=3.60 mode=Uncapped
03:00:17    %usr    %sys    %wio   %idle   physc   %entc
03:00:19      10       5       4      81    0.60    16.6
03:00:21       7       4       6      83    0.46    12.7
03:00:23       9       5       6      81    0.55    15.4
03:00:25       8       4       6      82    0.51    14.3
03:00:27       8       4       6      82    0.50    14.0
03:00:29       8       4       6      82    0.47    13.2
03:00:31       8       4       5      82    0.54    15.0
03:00:33       8       4       5      82    0.53    14.6
03:00:35       7       3       6      83    0.46    12.6
03:00:37       8       4       6      83    0.46    12.8
Average        8       4       6      82    0.51    14.1

System configuration: lcpu=8 ent=3.60 mode=Uncapped
03:00:40    %usr    %sys    %wio   %idle   physc   %entc
03:00:42       7       3       6      83    0.44    12.3
03:00:44       7       4       7      83    0.43    12.0
03:00:46       6       3       7      84    0.36    10.1
03:00:48       6       3       8      83    0.36    10.1
03:00:50       6       3       7      84    0.39    10.9
03:00:52       7       3       8      83    0.39    10.9
03:00:54       7       3       7      83    0.41    11.5
03:00:56       6       3       7      83    0.38    10.6
03:00:58       7       3       6      83    0.44    12.3
03:01:00       7       3       7      83    0.41    11.5
Average        7       3       7      83    0.40    11.2

CPU usage during business hours

Code:
System configuration: lcpu=8 ent=3.60 mode=Uncapped
11:09:22    %usr    %sys    %wio   %idle   physc   %entc
11:09:24      75      20       5       1    3.55    98.6
11:09:26      77      22       1       0    3.59    99.8
11:09:28      82      17       1       0    3.59    99.6
11:09:30      85      14       1       0    3.59    99.8
11:09:32      84      15       1       0    3.60    99.9
11:09:34      84      15       0       0    3.60   100.0
11:09:36      83      17       0       0    3.60   100.0
11:09:38      79      17       3       1    3.59    99.7
11:09:40      79      16       4       1    3.57    99.1
11:09:42      80      17       3       1    3.60    99.9
Average       81      17       2       0    3.59    99.7
 
System configuration: lcpu=8 ent=3.60 mode=Uncapped
11:09:45    %usr    %sys    %wio   %idle   physc   %entc
11:09:47      82      16       2       0    3.60   100.0
11:09:49      63      19      15       3    3.18    88.3
11:09:51      58      22      18       2    3.11    86.5
11:09:53      78      17       4       0    3.56    99.0
11:09:55      87      13       0       0    3.60   100.0
11:09:57      88      12       0       0    3.60   100.0
11:09:59      88      12       0       0    3.60   100.0
11:10:01      87      13       0       0    3.60    99.9
11:10:03      85      14       1       0    3.60    99.9
11:10:05      83      15       2       0    3.59    99.8
Average       80      15       4       1    3.50    97.3
 
System configuration: lcpu=8 ent=3.60 mode=Uncapped
11:10:08    %usr    %sys    %wio   %idle   physc   %entc
11:10:10      83      15       2       0    3.59    99.8
11:10:12      79      16       5       1    3.56    98.8
11:10:14      85      12       2       0    3.59    99.6
11:10:16      85      12       3       0    3.58    99.3
11:10:18      84      13       2       0    3.57    99.2
11:10:20      85      12       3       1    3.58    99.4
11:10:22      85      13       2       0    3.60    99.9
11:10:24      78      15       6       1    3.51    97.6
11:10:26      78      18       3       0    3.58    99.5
11:10:28      75      19       5       1    3.54    98.2
Average       82      15       3       0    3.57    99.1

System configuration: lcpu=8 ent=3.60 mode=Uncapped
11:10:31    %usr    %sys    %wio   %idle   physc   %entc
11:10:33      87      13       0       0    3.60    99.9
11:10:35      85      14       1       0    3.59    99.8
11:10:37      88      12       0       0    3.60    99.9
11:10:39      88      12       0       0    3.60    99.9
11:10:41      84      15       1       0    3.60   100.0
11:10:43      85      13       1       0    3.60    99.9
11:10:45      84      14       1       0    3.60    99.9
11:10:47      82      16       2       0    3.59    99.7
11:10:49      83      17       1       0    3.60   100.0
11:10:51      83      14       3       1    3.58    99.3
Average       85      14       1       0    3.59    99.8

11:41:31    %usr    %sys    %wio   %idle   physc   %entc
11:41:32      85      11       3       1    3.58    99.3
11:41:33      82      10       6       2    3.51    97.5
11:41:34      80      13       5       2    3.54    98.3
11:41:35      83      13       3       1    3.57    99.0
11:41:36      83      12       3       1    3.57    99.2
11:41:37      83      12       4       1    3.54    98.2
11:41:38      84      13       3       1    3.59    99.6
11:41:39      79      17       4       1    3.58    99.3
11:41:40      72      22       5       1    3.54    98.3
11:41:41      57      11      25       7    2.68    74.6
Average       79      13       6       2    3.47    96.4
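
As a cross-check of the samples above, %entc should simply be physc divided by the entitlement. The Python sketch below is illustrative only: the helper name `entc_from_physc` is made up for this example, and the four rows are copied from the 03:00 and 11:09 samples, whose headers show ent=3.60.

```python
# Illustrative cross-check: %entc = physc / ent * 100.
ENT = 3.60  # entitlement from the "System configuration" line

# (time, physc, reported %entc) copied from the sar output above
rows = [
    ("03:00:19", 0.60, 16.6),
    ("11:09:24", 3.55, 98.6),
    ("11:09:49", 3.18, 88.3),
    ("11:09:51", 3.11, 86.5),
]

def entc_from_physc(physc, ent=ENT):
    """Return physical consumption as a percentage of the entitlement."""
    return physc / ent * 100.0

for stamp, physc, reported in rows:
    computed = entc_from_physc(physc)
    # physc is printed with only two decimals, so a small gap is expected
    print(f"{stamp}  physc={physc:.2f}  computed={computed:5.1f}  reported={reported:5.1f}")
```

In every busy-hour sample physc tops out at 3.60, so the partition never goes above its entitlement in this data even though the mode is uncapped.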

What is %wio here? Is it CPU time spent waiting for I/O?

From the man page for sar, %wio:
Quote:
Reports the percentage of time the processor(s) were idle during which the system had outstanding disk/NFS I/O request(s).
But I don't think that is right, because, as I understand it, sar and vmstat add the values of user, sys and wio to get the total CPU usage when calculating CPU idle time.
So is %wio CPU time spent waiting for I/O?
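
Reading the man page wording literally, %wio is a subset of idle time, so only %usr + %sys counts as busy. A small Python sketch of that split, using the last row of the 11:41 sample; the function name is illustrative and not part of any tool.

```python
def split_sar_row(usr, sys_, wio, idle):
    """Split one sar row into busy time and non-busy time.

    Following the man page wording, %wio is idle time during which I/O
    was outstanding, so it is grouped with %idle rather than busy time.
    """
    busy = usr + sys_
    not_busy = wio + idle
    return busy, not_busy

# Row 11:41:41 from the output above: %usr=57 %sys=11 %wio=25 %idle=7
busy, not_busy = split_sar_row(57, 11, 25, 7)
print(f"busy={busy}%  not busy (idle + I/O wait)={not_busy}%")
```

On that same row physc also drops to 2.68 (74.6 %entc), which is at least consistent with the processors being partly idle while I/O was pending.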

Regards
newaix

Moderator's Comments:
Please use code tags.

Last edited by zaxxon; 10-10-2011 at 05:32 AM.. Reason: code tags, see PM
# 2  
10-11-2011
Do you mind posting the output of vmstat -Iwt 2 30 and iostat -Dl as a starter, along with some more information about your system?
What type of frame are you running on, and how many resources does the frame have? What is your LPAR doing when it's busy? Is this an application or a DB box? What type of storage are you using, which AIX version are you running, and so on.
3.6 entitled CPUs for 4 virtual CPUs (if it's p5 or p6) seems pointless to me. Even according to IBM you should give your box at least 5 virtual CPUs; the way you run your box, you have literally no benefit from virtualization whatsoever. And such high wait I/O may point to a memory or I/O issue.
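
The arithmetic behind that remark can be sketched in a few lines of Python. This is a rough illustration, assuming two SMT threads per virtual CPU (as on p5/p6) and using the fact that an uncapped partition can consume at most one physical processor per virtual CPU; `lpar_shape` is a made-up helper name.

```python
def lpar_shape(lcpu, ent, smt_threads=2):
    """Derive virtual CPU count and uncapped headroom from sar's header.

    smt_threads=2 is an assumption (SMT on, p5/p6 style).
    """
    vcpus = lcpu // smt_threads            # logical CPUs / threads per VP
    headroom = round(vcpus - ent, 2)       # most physc can grow above ent
    ent_per_vcpu = round(ent / vcpus, 2)   # entitlement backing each VP
    return vcpus, headroom, ent_per_vcpu

# Header from the posted output: lcpu=8 ent=3.60 mode=Uncapped
vcpus, headroom, ent_per_vcpu = lpar_shape(lcpu=8, ent=3.60)
print(f"virtual CPUs={vcpus}  headroom above ent={headroom}  ent per VP={ent_per_vcpu}")
```

With 4 virtual CPUs the partition can never use more than 4.0 physical processors, so being uncapped buys at most 0.40 of a processor beyond the 3.60 entitlement.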
Regards
zxmaus
# 3  
10-19-2011
Hi zxmaus,

Thanks for the response. I have attached CPU details for the servers with the reported problem. Since it is currently outside business hours, CPU usage is below the threshold. Below is the lparstat output.

Code:
 lparstat
System configuration: type=Shared mode=Uncapped smt=On lcpu=8 mem=20479 psize=4 ent=3.60
%user  %sys  %wait  %idle physc %entc  lbusy  vcsw phint
----- ----- ------ ------ ----- ----- ------ ----- -----
 12.9   5.0   10.2   71.9  0.70  19.5   24.5 8748351376 265273637

Below is the lparstat output for the second server (LPAR):

Code:
   lparstat
System configuration: type=Shared mode=Capped smt=On lcpu=6 mem=18431 psize=4 ent=2.50
%user  %sys  %wait  %idle physc %entc  lbusy  vcsw phint
----- ----- ------ ------ ----- ----- ------ ----- -----
 20.0   2.6    4.1   73.3  0.61  24.6   25.7 8988174564 4328471475

Please let me know if there are any flaws in the configuration.


Regards
newaix

Last edited by zaxxon; 10-19-2011 at 12:33 PM.. Reason: code tags
# 4  
10-19-2011
Hi,

Thank you for the data.

What I see is that you have awful average response times on hdisk85-90: your 50-60 ms for writes is nowhere near acceptable. Your I/O is also very unevenly distributed across disks; maybe a simple volume group reorganization with maximum instead of minimum spreading across disks will bring you some performance improvement.

What I see as well is that you have lots and lots of blocked I/Os due to insufficient filesystem buffers (and your system needs lots of filesystem buffers, as you have really significant reads that want to be buffered). I would probably start by setting some general buffers. Post the vmstat -v and vmstat -s outputs if you like.


As for your system load: from the data you attached to your last post, you seem to have way too many CPUs entitled. As CPUs are usually quite expensive, I would cut that down to maybe 1 CPU, then monitor and see how your system is doing. Unfortunately, that data does not really match the data from your earlier post.

Next, your system is doing a lot of scanning and freeing when busy to make sure that the free list contains enough free memory pages for the next I/O cycle. I/O needs to be cached, and the more I/O you have, the more memory you need to buffer it properly, OR you change the behavior of the filesystems doing the I/O. Mount options like rbrw, noatime and similar can change the memory utilization significantly; so does setting Oracle's filesystemio_options (I think) to setall instead of asynch. If this is AIX 5.3, then you might or might not need some adjustments to the async I/O settings, as the standard values are way too low and need to be adjusted.

Regards
zxmaus
# 5  
10-21-2011
Hi, please find the outputs below.

Code:
vmstat -v
              5242864 memory pages
              4968593 lruable pages
                 5170 free pages
                    2 memory pools
              1515042 pinned pages
                 80.0 maxpin percentage
                  5.0 minperm percentage
                 80.0 maxperm percentage
                 32.1 numperm percentage
              1595045 file pages
                  0.0 compressed percentage
                    0 compressed pages
                 32.1 numclient percentage
                 80.0 maxclient percentage
              1595045 client pages
                    0 remote pageouts scheduled
              2259185 pending disk I/Os blocked with no pbuf
                    0 paging space I/Os blocked with no psbuf
                 2228 filesystem I/Os blocked with no fsbuf
                    0 client filesystem I/Os blocked with no fsbuf
              1671181 external pager filesystem I/Os blocked with no fsbuf
                    0 Virtualized Partition Memory Page Faults
                 0.00 Time resolving virtualized partition memory page faults
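
To pick out the counters that matter for the buffer discussion, the "blocked with no ..." lines can be filtered from the vmstat -v text. A minimal Python sketch, assuming the output has been captured as a string; the function name `blocked_counters` is illustrative.

```python
import re

# Lines copied from the vmstat -v output above
vmstat_v = """\
              2259185 pending disk I/Os blocked with no pbuf
                    0 paging space I/Os blocked with no psbuf
                 2228 filesystem I/Os blocked with no fsbuf
                    0 client filesystem I/Os blocked with no fsbuf
              1671181 external pager filesystem I/Os blocked with no fsbuf
"""

def blocked_counters(text):
    """Return {description: count} for every 'blocked with no ...' line."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.*blocked with no \w+)\s*$", line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters

# Keep only the counters that have ever been incremented, largest first
nonzero = {name: n for name, n in blocked_counters(vmstat_v).items() if n > 0}
for name, n in sorted(nonzero.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{n:>10}  {name}")
```

These counters are cumulative since boot, so a single snapshot only shows that blocking has happened; comparing two runs taken some time apart shows whether the numbers are still growing.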
 
Code:
vmstat -s
         892151434903 total address trans. faults
         172862697532 page ins
          26961484724 page outs
                  615 paging space page ins
                 2829 paging space page outs
                    0 total reclaims
         141771661659 zero filled pages faults
               110845 executable filled pages faults
         738555440654 pages examined by clock
               340522 revolutions of the clock hand
         182805730940 pages freed by the clock
          11616035128 backtracks
                 3590 free frame waits
                    0 extend XPT waits
          19471756799 pending I/O waits
         199747416830 start I/Os
          25117452018 iodones
         668345405909 cpu context switches
          41732913544 device interrupts
          10612619533 software interrupts
          32691178953 decrementer interrupts
             27358468 mpc-sent interrupts
             27358456 mpc-receive interrupts
            266100573 phantom interrupts
                    0 traps
        1982305862214 syscalls

Now I would like your view on the CPU configuration.


Code:
lparstat
System configuration: type=Shared mode=Uncapped smt=On lcpu=8 mem=20479 psize=4 ent=3.60
%user  %sys  %wait  %idle physc %entc  lbusy  vcsw phint
----- ----- ------ ------ ----- ----- ------ ----- -----
 12.9   5.0   10.2   71.9  0.70  19.5   24.5 9369523649 266169530

The server has 8 logical CPUs, 4 physical CPUs and an entitlement of 3.60. The mode is uncapped.

As per your previous reply, this configuration is not good. Please tell me how I can reconfigure it.


Regards
newaix

Last edited by zxmaus; 10-22-2011 at 08:39 AM..
# 6  
10-21-2011
Actually, the server has 4 virtual CPUs, is entitled to use 3.6 and, being a p5, that means 8 threads.
To answer your question thoroughly, I would like to see some vmstat -Iwt 2 10 output from busy timeframes (vmstat is a whole lot better with those options than any other tool I know). From the length of your run queue I would say double the virtual CPUs, which gives you more threads, but it's easier to say that when the data is taken from a really busy system. Also address your I/O issues by proper system tuning: you seem to have totally insufficient filesystem buffering. Being on AIX 5.3, I would suggest

Code:
vmo -p -o minperm%=3
vmo -p -o minfree=960
vmo -p -o maxfree=1088
vmo -p -o lru_file_repage=0
vmo -p -o lru_poll_interval=10
ioo -p -o j2_maxPageReadAhead=128
ioo -p -o maxpgahead=16
ioo -p -o j2_maxRandomWrite=32
ioo -p -o maxrandwrt=32
ioo -p -o j2_nBufferPerPagerDevice=1024
ioo -p -o pv_min_pbuf=1024
ioo -p -o numfsbufs=1024

If you still see growing numbers, then you can go up to 2048 with numfsbufs; we usually do that.
Additionally, for a DB box you should set AIXTHREAD_SCOPE=S in /etc/environment.

Your numbers in the vmstat outputs are huge. How long has your box been up?

And do you mind posting the output of lsattr -El aio0 and iostat -A?

And ... be warned: closing one bottleneck in many cases opens another one. It might turn out that your box needs more memory when the I/O problems are fixed.

Regards
zxmaus
# 7  
10-22-2011
System uptime is 636 days. Getting downtime is also not easy.
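
Since the vmstat -s counters are cumulative since boot, dividing them by the uptime puts the large numbers into perspective. A rough Python sketch using the 636 days and a few counters from the posted output; `per_second` is just an illustrative helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60
uptime_seconds = 636 * SECONDS_PER_DAY  # uptime reported above

# Cumulative counters copied from the vmstat -s output
counters = {
    "page ins": 172862697532,
    "page outs": 26961484724,
    "cpu context switches": 668345405909,
    "syscalls": 1982305862214,
}

def per_second(total, seconds=uptime_seconds):
    """Average rate since boot for a cumulative counter."""
    return total / seconds

for name, total in counters.items():
    print(f"{name:<22} {per_second(total):>12.1f} /s")
```

These are averages over busy and idle hours alike, so they say nothing about peaks; interval output taken during business hours is still needed for that.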

I will provide the outputs on Monday.

Below are the current values:

Code:
minperm = 248428
minperm% = 5
minfree = 960
maxfree = 1088
lru_file_repage = 0
lru_poll_interval = 10
lrubucket = 131072
j2_maxPageReadAhead = 128
j2_maxRandomWrite = 0
j2_maxUsableMaxTransfer = 512
maxpgahead = 16
maxrandwrt = 0
j2_dynamicBufferPreallocation = 32
j2_nBufferPerPagerDevice = 512
pv_min_pbuf = 512
numfsbufs = 294
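
To see which of the suggested settings actually differ from the current ones, the two lists can be compared directly. A small Python sketch with the values copied from the vmo/ioo commands earlier in the thread and from the list above; the dictionary and function names are illustrative.

```python
# Values suggested earlier in the thread (vmo/ioo commands)
suggested = {
    "minperm%": 3, "minfree": 960, "maxfree": 1088,
    "lru_file_repage": 0, "lru_poll_interval": 10,
    "j2_maxPageReadAhead": 128, "maxpgahead": 16,
    "j2_maxRandomWrite": 32, "maxrandwrt": 32,
    "j2_nBufferPerPagerDevice": 1024, "pv_min_pbuf": 1024,
    "numfsbufs": 1024,
}

# Current values as posted above
current = {
    "minperm%": 5, "minfree": 960, "maxfree": 1088,
    "lru_file_repage": 0, "lru_poll_interval": 10,
    "j2_maxPageReadAhead": 128, "maxpgahead": 16,
    "j2_maxRandomWrite": 0, "maxrandwrt": 0,
    "j2_nBufferPerPagerDevice": 512, "pv_min_pbuf": 512,
    "numfsbufs": 294,
}

def differing(current_values, suggested_values):
    """Return {tunable: (current, suggested)} where the two disagree."""
    return {
        name: (current_values.get(name), wanted)
        for name, wanted in suggested_values.items()
        if current_values.get(name) != wanted
    }

changes = differing(current, suggested)
for name, (now, wanted) in changes.items():
    print(f"{name:<26} current={now:<6} suggested={wanted}")
```

Half of the suggested values are already in place; the differences are minperm%, the two random-write settings and the three buffer tunables.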


>>actually the server has 4 virtual cpus, is entitled to use 3.6 and being p5 that means 8 threads >>

Can you please clarify this?

Regards
newaix

Last edited by zxmaus; 10-22-2011 at 08:40 AM..