CPU performance


 
# 1  
Old 12-29-2016
CPU performance

On my Oracle DB server we have 15 cores (POWER8). The vmstat output is below.

Code:
System configuration: lcpu=128 mem=208800MB ent=16.00

   kthr            memory                         page                       faults                 cpu             time
----------- --------------------- ------------------------------------ ------------------ ----------------------- --------
  r   b   p        avm        fre    fi    fo    pi    po    fr     sr    in     sy    cs us sy id wa    pc    ec hr mi se
 31  26   0   21663391      51253 129782  5225     0     0 110978 318023 41193 302797 156113 43 20 33  5 14.76  92.3 11:05:36
 28  39   0   21674139      46016 129213 15721     0     0 134097 188404 42576 319091 172279 42 20 32  5 14.54  90.9 11:05:37
 34  36   0   21680968      46409 130385 13285     0     0 136618 141490 42035 385893 163647 45 20 30  5 14.93  93.3 11:05:38
 34  39   0   21669473      51955 115124 12338     0     0 107550 114801 38514 366075 154055 45 19 31  5 14.94  93.3 11:05:39
  0   0   0   21675046      50088 116082 14413     0     0 119399 359118 40334 429664 171751 43 21 30  6 14.64  91.5 11:05:40
 40  36   0   21660587      51752 137059  9433     0     0 123435 280612 42885 406191 176519 42 21 31  6 14.57  91.1 11:05:41
 40  28   0   21672996      47765 132584  1542     0     0 140214 276680 47654 409385 165033 42 21 31  5 14.79  92.4 11:05:42
 26  24   0   21692747      48527 124613  5004     0     0 144966 404145 45226 399544 163073 41 21 32  5 14.74  92.1 11:05:43
 30  29   0   21686313      45561 130212  3960     0     0 122430 127164 39446 371176 177801 43 21 31  5 14.69  91.8 11:05:44
 32  28   0   21668455      50598 137069  1746     0     0 121488 127432 46515 366503 174261 43 20 32  5 14.71  91.9 11:05:45
 26  33   0   21673035      50625 114717 10553     0     0 118945 380090 43050 303303 147158 42 19 34  5 14.61  91.3 11:05:46
 34  33   0   21695594      48900 115034  8768     0     0 135057 145302 41228 336146 149403 43 19 33  5 14.79  92.4 11:05:47
 25  33   0   21692935      50267 107122  6226     0     0 105233 190084 35381 361517 155287 46 18 31  4 15.07  94.2 11:05:48
 32  33   0   21686530      54135 100484  7210     0     0 98634 415431 35097 388896 162992 45 20 30  5 14.96  93.5 11:05:49
 31  32   0   21691954      47633 92779 13739     0     0 91240 422302 34362 343061 151114 45 20 31  5 14.88  93.0 11:05:50
 32  24   0   21700998      47232 94516 14072     0     0 102629 188748 36481 501056 132911 45 20 30  5 14.92  93.3 11:05:51

As you can see, the run queue and the blocked queue are both high, and the entitled capacity consumed is always above 90%. Between 15% and 25% of the CPU is always idle. So is there a CPU bottleneck on this system, or is it OK? No one is complaining, but I want to know for myself.

Last edited by Don Cragun; 12-29-2016 at 03:46 AM.. Reason: Add CODE tags.
# 2  
Old 12-29-2016
You have no CPU shortage at all, but the server is heavily misconfigured: it is one small step away from swapping to death. The high numbers in "fr" and "sr" are signs that memory is on the brink of being exhausted and the system is already scanning frantically for pages which can be swapped out if needed. The chock-full blocked queue ("b") and the wait% ("wa") in the CPU section come from the system having to wait for I/O. This would be OK if the presented snapshot were from a backup cycle (where only I/O counts and the system is normally bound by that), but if this is the usual state of affairs the system would greatly profit from more I/O capacity (a better network connection, faster disks, etc.; where exactly the I/O bottleneck is doesn't show up in this picture).
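The columns mentioned above can be pulled out of the posted vmstat rows with a few lines of Python. This is only a rough sketch: the helper names (`parse_vmstat`, `memory_pressure`) and the renamed `sy_pct` key are made up for illustration, the three sample rows are copied from the first post, and the 4 KB page size is the unit AIX vmstat uses for its memory columns.

```python
# Three rows copied from the vmstat output in the first post.
SAMPLE = """\
 31  26   0   21663391      51253 129782  5225     0     0 110978 318023 41193 302797 156113 43 20 33  5 14.76  92.3 11:05:36
 28  39   0   21674139      46016 129213 15721     0     0 134097 188404 42576 319091 172279 42 20 32  5 14.54  90.9 11:05:37
 34  36   0   21680968      46409 130385 13285     0     0 136618 141490 42035 385893 163647 45 20 30  5 14.93  93.3 11:05:38
"""

# Column names in the order of the vmstat header; the second "sy" (CPU %)
# is renamed sy_pct here so that dictionary keys stay unique.
COLUMNS = ("r b p avm fre fi fo pi po fr sr in sy cs "
           "us sy_pct id wa pc ec time").split()

PAGE_BYTES = 4096  # vmstat on AIX counts memory in 4 KB pages


def parse_vmstat(text):
    """Return one dict per data row, skipping headers and blank lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) or not fields[0].isdigit():
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def memory_pressure(rows):
    """Summarise the page-stealing columns over all parsed rows."""
    n = len(rows)
    freed = sum(int(r["fr"]) for r in rows)
    scanned = sum(int(r["sr"]) for r in rows)
    free_pages = sum(int(r["fre"]) for r in rows) / n
    return {
        "avg_free_mb": free_pages * PAGE_BYTES / 2**20,
        "paged_out": sum(int(r["po"]) for r in rows),
        "scan_to_free": scanned / freed if freed else float("inf"),
        "avg_wait_pct": sum(int(r["wa"]) for r in rows) / n,
    }


stats = memory_pressure(parse_vmstat(SAMPLE))
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

A scan-to-free ratio above 1 means the page stealer looks at more pages than it manages to free. The "po" column is still 0 in these rows, so paging-space writes had not started in this sample.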

I don't know for sure, but if this is an Oracle system you most probably have the SGA configured too big. Reduce it in size (or add memory, with the same effect) and add I/O capacity, and you can probably take away ~3-4 processors without degrading performance at all; it might even improve.

You might want to read the Performance Tuning Introduction I wrote for an in-depth explanation of what is going on in your system.

I hope this helps.

bakunin

Last edited by bakunin; 12-29-2016 at 11:53 AM..
# 3  
Old 12-30-2016
The SGA is only 40 GB, and still a huge amount of memory is utilised. Also, the total of run and blocked threads is greater than the number of cores, so how can CPUs be freed as you suggested? The disks used are flash disks with 8 Gb FC ports. It's flash storage, so it should not face I/O issues.
The snapshot provided is from regular work, not from backup time.
# 4  
Old 12-30-2016
Quote:
Originally Posted by powerAIX
The SGA is only 40 GB, and still a huge amount of memory is utilised.
I see that. I just can't see by which process, hence the assumption. If you want to know which process(es) is/are responsible use the ps-command.

Quote:
Originally Posted by powerAIX
Also, the total of run and blocked threads is greater than the number of cores, so how can CPUs be freed as you suggested?
The only thing that matters is the number of logical CPUs, which is 128. I guess that if you look closely at these you will find that some of them are unused.
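To put numbers on that comparison, here is a small Python sketch. It assumes nothing beyond the "System configuration" line and the "r" column from the vmstat output in the first post; the names `parse_config` and `RUN_QUEUE` are invented for this example.

```python
import re

# Header line and the "r" (run queue) column, copied from the first post.
CONFIG_LINE = "System configuration: lcpu=128 mem=208800MB ent=16.00"
RUN_QUEUE = [31, 28, 34, 34, 0, 40, 40, 26, 30, 32, 26, 34, 25, 32, 31, 32]


def parse_config(line):
    """Extract the logical CPU count and the entitlement as floats."""
    return {key: float(value)
            for key, value in re.findall(r"(lcpu|ent)=([\d.]+)", line)}


config = parse_config(CONFIG_LINE)
avg_runnable = sum(RUN_QUEUE) / len(RUN_QUEUE)

# Runnable threads per logical CPU (hardware thread) and per entitled core.
per_lcpu = avg_runnable / config["lcpu"]
per_core = avg_runnable / config["ent"]

print(f"average run queue: {avg_runnable:.1f}")
print(f"runnable threads per logical CPU: {per_lcpu:.2f}")
print(f"runnable threads per entitled core: {per_core:.2f}")
```

With roughly 30 runnable threads on average and 128 logical CPUs, there is well under one runnable thread per logical CPU, which is the point being made here. The per-core figure is higher simply because each core presents several hardware threads.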

But since you mentioned that the SGA is only 40GB out of 200GB memory what else is running on this server? And wouldn't it be better to have, whatever it is, running in a separate LPAR so that the DB is not in danger if something goes wrong?

Quote:
Originally Posted by powerAIX
The disks used are flash disks with 8 Gb FC ports. It's flash storage, so it should not face I/O issues.
OK, but I said "disk or network". Where the I/O problem is does not show up in the vmstat output, just the fact that there is one somewhere. You will have to use iostat and netstat to find out where exactly it is.

Understand that we cannot do your work for you and that we see only what you show us. I don't know what your system is doing all day long and I can only base my assumptions on what I see. If you want more output from us, give us more input. Furthermore, everything I have written so far you could have gotten from the link I gave you. So you might want to make use of the help you get.

I hope this helps.

bakunin
# 5  
Old 01-02-2017
In iostat, the maximum value in the avg queue time column is 0.5, and even that is not continuous.
Also, in netstat there are no CRC errors; please tell me what exactly I can check there. When checked with
< ps aux | head -1; ps aux | sort -rnk 6 | more> all are Oracle processes and nothing else is running on the system. That is why I am asking for help.


Moderator's Comments:
Mod Comment Please use CODE tags as required by forum rules!

Last edited by RudiC; 01-02-2017 at 05:51 AM.. Reason: Added CODE tags.
# 6  
Old 01-02-2017
As I have already said: DESCRIBE YOUR SYSTEM!

So far I have seen: 200 GB memory, 16 processors, running Oracle.

What I do not know (the list is not complete):

- which SAN / which multipath software?
- which OS level?
- which software? (that includes specifications, versions, etc.)
- what is the underlying managed system?
- how are your volume groups organised?
- network connections?
- is the system a cluster or not? (If yes, which one? HACMP? ORACLE RAC? else?)

Quote:
Originally Posted by powerAIX
In iostat, the maximum value in the avg queue time column is 0.5, and even that is not continuous.
I am at a loss what you mean. Try iostat 1 and write that to a file for some time (say: one minute or so). You will have to filter out the lines with real devices because multipathing software (if you use such a thing) creates pseudo-hdisk-devices that show up in iostat but are meaningless. For instance, with EMC PowerPath software you have one hdiskXX for every path and a hdiskpowerXX-device which is the real LUN. You have to watch only this.
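Once the `iostat 1` output has been written to a file, the filtering described above could be done roughly as follows. This is a sketch only: the sample text is invented to resemble the default AIX `iostat` disk report, the device names are hypothetical, and the `hdiskpower` pattern applies to the EMC PowerPath case used as the example; with other multipath software the pattern would differ.

```python
import re
from collections import defaultdict

# Invented sample in the shape of two one-second iostat intervals.
SAMPLE = """\
Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk4           12.0     512.0     40.0       512         0
hdisk5           11.0     498.0     38.0       498         0
hdiskpower0      23.0    1010.0     78.0      1010         0
Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk4           30.0    2048.0    120.0      2048         0
hdisk5           29.0    2000.0    118.0      2000         0
hdiskpower0      59.0    4048.0    238.0      4048         0
"""


def peak_busy(text, device_pattern=r"^hdiskpower\d+$"):
    """Return the highest % tm_act seen per device matching the pattern."""
    wanted = re.compile(device_pattern)
    peak = defaultdict(float)
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, blank lines and devices that represent single paths.
        if len(fields) < 2 or not wanted.match(fields[0]):
            continue
        peak[fields[0]] = max(peak[fields[0]], float(fields[1]))
    return dict(peak)


print(peak_busy(SAMPLE))
```

Passing a different `device_pattern` (for example one matching plain `hdisk` names) reuses the same function where no pseudo-devices exist.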

Quote:
Originally Posted by powerAIX
Also, in netstat there are no CRC errors; please tell me what exactly I can check there.

Well, CRC errors are your least concern, I'd say. More realistic scenarios are: misconfigured DNS servers so that name resolution takes long, software running under externally authenticated user accounts (Kerberos, ...) which take a long time to authenticate, and similar things.

Quote:
Originally Posted by powerAIX
When checked with
< ps aux | head -1; ps aux | sort -rnk 6 | more> all are Oracle processes and nothing else is running on the system. That is why I am asking for help.
I am trying to help you. I just can't if you don't give me some info. Performance tuning is like tuning a car: I am trying to explain it to you, but so far I only know that it has so many hp, not even the model! And telling me something like "when I step on the accelerator pedal it makes wwrrrrrmmmm" doesn't help me either.

Your ps-commands won't show you anything because AIX doesn't work like BSD. Instead, do the following:

First, look for shared memory that may be allocated. Use ipcs -m and have a look there. If you want to have it analyzed by us: post it complete.

Next, have a look where all your memory is spent. Try ps -Alo pid,vsz,args (or variations, have a look at the man page of ps for details) to see all the processes with their allocated memory. Note that the memory unit used for vsz is kilobytes (1 KB = 1024 bytes), not bytes; check the ps man page on your system to confirm.
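A listing in `pid vsz args` form can be ranked and converted to megabytes with a short script. The sketch below treats vsz as 1 KB units, which is how the AIX ps documentation describes that field as far as I know; `unit_bytes` is a parameter precisely so this assumption can be changed. The function name `top_consumers` is made up, and the three sample rows are taken from the ps listing that appears further down this thread.

```python
# Rows in "pid vsz args" form, taken from the ps listing posted later on.
SAMPLE = """\
10617204 195412 /usr/sbin/rsct/bin/IBM.HostRMd
 8061550 125000 ora_arc2_fininddb
 9437556 122440 ora_arcc_fininddb
"""


def top_consumers(text, unit_bytes=1024):
    """Return (pid, size_in_mb, command) tuples, largest first.

    unit_bytes is the assumed size of one vsz unit; 1024 means kilobytes.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split(None, 2)  # keep the command line in one piece
        if len(fields) < 3 or not fields[1].isdigit():
            continue  # header or malformed line
        pid, vsz, args = fields
        entries.append((int(pid), int(vsz) * unit_bytes / 2**20, args))
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


ranked = top_consumers(SAMPLE)
for pid, size_mb, args in ranked:
    print(f"{pid:>10} {size_mb:8.1f} MB  {args}")
print(f"total: {sum(size for _, size, _ in ranked):.1f} MB")
```

Summing the column this way gives only a rough picture, because shared memory segments such as the Oracle SGA are not attributed cleanly to individual processes; that is why the ipcs step is listed separately.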

Finally, have a look at how the system is tuned: issue vmo -a and look at the "minperm" and "maxperm" lines. Issue vmstat -vs and look at the lines with "fs I/O with no pbuf" (or something like that, I have no system at hand to look it up) and check whether there are high numbers there.
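The "blocked with no ..." lines can be picked out of the vmstat counter output mechanically. The following is a minimal sketch with invented function names (`parse_counters`, `blocked_io`); the sample lines are the ones posted further down in this thread.

```python
# Counter lines as posted later in this thread.
SAMPLE = """\
131148847278 pending I/O waits
643098287800 start I/Os
185464 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
2228 filesystem I/Os blocked with no fsbuf
73 client filesystem I/Os blocked with no fsbuf
12421 external pager filesystem I/Os blocked with no fsbuf
"""


def parse_counters(text):
    """Map each counter label to its integer value."""
    counters = {}
    for line in text.splitlines():
        count, _, label = line.strip().partition(" ")
        if count.isdigit() and label:
            counters[label.strip()] = int(count)
    return counters


def blocked_io(counters):
    """Keep only the non-zero 'blocked with no ...' counters."""
    return {label: value for label, value in counters.items()
            if "blocked with no" in label and value > 0}


for label, value in blocked_io(parse_counters(SAMPLE)).items():
    print(f"{value:>12}  {label}")
```

These counters are, to my knowledge, cumulative since boot, so a single reading says little on its own. Taking two readings some minutes apart and comparing them shows whether the numbers are still growing.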

So, this is for a start. There might be more such requests, because performance tuning can be complicated. In any case, I suggest you look the commands up in the man pages and try to understand what they do and why they are being used. Because in the end the person most qualified to tune your system is: you! You know the system best and you sit in front of it.

I hope this helps.

bakunin
# 7  
Old 01-03-2017
I really appreciate your help.

The system is a Power 880. It's a VIO client. NPIV is configured.
The network cards are FCoE adapters on the VIOS, sharing one port between only 2 LPARs.
The VIOS level is 2.4.2.20 and the DB oslevel is 7100-03-04.
The software is Oracle 11g.
The storage is assigned via an IBM SVC cluster and the storage itself is an IBM fs800. I have only this much info; for anything else I need to ask the storage admin.
The IBM SDDPCM drivers are installed and each storage data volume has 4 active paths.
The OS volume has 2 different fcs paths.
There is no cluster for this server.

As this is IBM storage with the IBM device drivers installed, there are no pseudo disks.
Sample iostat output is below.
==================================================================================================== ===================
Code:
Disks:                           xfers                                read                                write                                  queue                    time
-------------------- -------------------------------- ------------------------------------ ------------------------------------ -------------------------------------- ---------
                       %tm    bps   tps  bread  bwrtn   rps    avg    min    max time fail   wps    avg    min    max time fail    avg    min    max  avg   avg  serv
                       act                                    serv   serv   serv outs              serv   serv   serv outs        time   time   time  wqsz  sqsz qfull
hdisk11                2.0   8.7M  43.0   8.7M   0.0  182.0   1.5    0.2    5.4     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.1   0.0   0.0  0.0  08:00:01
hdisk5                 0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.0   0.0   0.0  0.0  08:00:01
hdisk8                 0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.0   0.0   0.0  0.0  08:00:01
hdisk15                3.0   2.3M  13.0   2.3M   0.0   45.0   2.1    1.0    8.1     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.0   0.0   0.0  0.0  08:00:01
hdisk19                1.0   2.7M  16.0   2.7M   0.0   48.0   1.4    0.6    3.6     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.0   0.0   0.0  0.0  08:00:01
hdisk9                 2.0   2.9M  14.0   2.9M   0.0   72.0   2.0    0.7    4.9     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.1   0.0   0.0  0.0  08:00:01
hdisk26                0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.0   0.0   0.0  0.0  08:00:01
hdisk17                1.0   2.1M  13.0   2.1M   0.0   39.0   1.7    1.0    6.4     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.0   0.0   0.0  0.0  08:00:01
hdisk13                1.0   2.2M  13.0   2.2M   0.0   53.0   1.4    0.3    3.5     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.0   0.0   0.0  0.0  08:00:01
hdisk28                0.0   4.1K   1.0   4.1K   0.0    2.0   0.6    0.6    0.7     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.0   0.0   0.0  0.0  08:00:01

==================================================================================================== ===========================================

The host entries are managed through the /etc/hosts file only; no DNS server is configured. Authentication is for local accounts only, so there is no external authentication.

The ipcs output is below:

Code:
T        ID     KEY        MODE       OWNER    GROUP
Shared Memory:
m   1048576 0x01002357 --rw-------     root   system
m   2097153 0x6100e09b --rw------- pconsole   system
m   1048578 0xffffffff D-rw------- pconsole   system
m         3   00000000 --rw-r-----   oracle      dba
m         4   00000000 --rw-r-----   oracle      dba
m         5 0x5ad596a4 --rw-r-----   oracle      dba
m         6   00000000 --rw-r-----   oracle      dba
m         7   00000000 --rw-r-----   oracle      dba
m         8 0x0bf59740 --rw-r-----   oracle      dba
m 156237834 0x0d033d30 --rw-rw----     root   system
m 206569483 0xffffffff D-rw-------   oracle      dba
m  71303180 0xffffffff D-rw-------   oracle      dba
m  23068685 0xffffffff D-rw-------   oracle      dba
m  17825806 0xffffffff D-rw-------   oracle      dba
m  17825807 0xffffffff D-rw-------   oracle      dba
m  17825808 0xffffffff D-rw-------   oracle      dba
m 351272981 0xffffffff D-rw-------    oemdb      dba

---------------------------------------------------------------

The top 20 lines of the ps output:
Code:
38143450 281392 /oemdata/agent_software/core/1.10.0/oracle_common/jdk/bin/java -Xmx172M -server
32245454 237240 /oemdata/agent12c/core/12.1.0.4.0/jdk/bin/java -Xmx318M -server 
10617204 195412 /usr/sbin/rsct/bin/IBM.HostRMd
 8061550 125000 ora_arc2_fininddb
 9437556 122440 ora_arcc_fininddb
 9568356 105096 ora_arc7_fininddb
11600338 104520 ora_arc6_fininddb
10879398 99016 ora_arc9_fininddb
18546830 93000 ora_arce_fininddb
 4981288 92680 ora_arc0_fininddb
18022482 91464 ora_arc4_fininddb
 8847704 86728 ora_arcg_fininddb
11075588 86664 ora_arc8_fininddb
 3605070 86216 ora_arcb_fininddb
12845242 86088 ora_arca_fininddb
 7537316 85192 ora_arci_fininddb
 7209628 84552 ora_arc5_fininddb
 4129490 84232 ora_arcj_fininddb
 7864702 80840 ora_arch_fininddb

The minperm and maxperm values are:
Code:
minperm = 1526905
maxperm = 45807403
minperm% = 3
maxpin% = 90

Code:
131148847278 pending I/O waits
643098287800 start I/Os
185464 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
2228 filesystem I/Os blocked with no fsbuf
73 client filesystem I/Os blocked with no fsbuf
12421 external pager filesystem I/Os blocked with no fsbuf

Also, the tuning parameters were provided by IBM.

Last edited by rbatte1; 01-04-2017 at 06:35 AM.. Reason: Scrutinizer changed icode tags everywhere to code tags; rbatte1 joined lines in iostat output for clarity