I am trying to analyze the performance of an AIX system. I think I may have a disk I/O issue, but I am asking for help to validate or invalidate this assumption. I ran the commands below during a period of peak load.
Please help me to find any performance bottlenecks. Thanks in advance for your help! Let me know if any more information is required since I am new to performance tuning on AIX.
Have you checked the output of the lvmstat command? It shows which logical volumes are handling the most I/O requests, so you can see what is using the most resources during the peak period.
There is paging into the rootvg's filesystem while both the run queue and the blocked queue are small and the wait is high at the same time. No paging space is involved. This indicates a problem with the amount of computational memory compared to file cache memory rather than with I/O. Could you post the output of
Additionally, your iostat output shows that hdisk0 and hdisk1 are very busy. They were about 90-100% busy the whole time you took the measurement.
Quote:
tty:      tin    tout    avg-cpu:  % user   % sys   % idle   % iowait
          1.0     4.0                 1.5     6.0      1.5       91.0
It seems you have something on these disks which should be on separate disks. Like Shockneck, I assume that hdisk0 and hdisk1 belong to your rootvg. I'd check what kind of application data is placed in the LVs there and move it to other disks so it does not hinder your rootvg.
Run lspv to check which VGs are on hdisk0 and hdisk1, then run lsvg -l on that VG to see which filesystems are there, and try to move the data to disks that are less busy or to new disks.
Then capture a few seconds of activity with filemon while the problem is occurring:
Stop this with "trcstop" after some 20-30 seconds.
You can check the file with
You should see which LV etc. is the busiest, so you can be sure what is causing the trouble there.
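The lspv / lsvg -l check suggested above could be scripted roughly like this. It is only a sketch: it assumes the usual lspv layout (disk name, PVID, volume group, state), and the helper name vg_of_disk is made up for illustration.

```sh
# vg_of_disk DISK: read lspv-style output on stdin and print the volume
# group of DISK. Assumes the third column holds the VG name.
vg_of_disk() {
    awk -v disk="$1" '$1 == disk { print $3 }'
}

# Only meaningful on the AIX box itself; skipped where lspv is missing.
if command -v lspv >/dev/null 2>&1; then
    for disk in hdisk0 hdisk1; do
        vg=$(lspv | vg_of_disk "$disk")
        echo "$disk belongs to ${vg:-<unknown>}"
        # List the LVs and filesystems of that volume group.
        # lspv reports "None" for a disk that is not part of any VG.
        if [ -n "$vg" ] && [ "$vg" != "None" ]; then
            lsvg -l "$vg"
        fi
    done
fi
```

If both disks turn out to be in the same VG, the lsvg -l listing is simply printed twice.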
Thank you for your replies. The system is only this busy during the nightly processing. Otherwise, it is mostly idle.
I do know that the volume group, logical volume, and filesystem configuration is far from ideal. The entire system is running from local disks. Some of the application filesystems are located in the rootvg. I am just trying to figure out the best course of action to fix these issues.
I will capture the output of the commands above during tomorrow's processing and post the results.
If it's really paging, try creating an additional paging space on hdisk2 and hdisk3 (striped or mirrored, depending on the VG design) to balance the paging space load across 4 disks instead of 1 or 2:
mkps -a -n -s PPsyoulike vgonhdisk2/3 hdisk2
The size depends on the overall memory and the application you use.
First off, I'd like to state that your data is not nearly complete. As this is a welcome occasion to go through a performance tuning procedure, let's cover this in a little depth:
1. Before you begin: the SLA
The first and most vital thing about performance tuning is to get to an SLA, a service level agreement, with the customer. It doesn't matter if "customer" is a real customer or just another department down the hall - there has to be some agreement between systems administration and users about how fast is fast enough. Otherwise it will be one of those endless (and pointless) tuning orgies which leave everybody involved unsatisfied - and yet exhausted - after a lot of work.
Agree with your customer on some measurable fact and declare the tuning successful once this limit is reached. Something like "the system has to do X transactions per second" or "the response time of this program should not exceed 2 seconds" or "the depth of this queue should not exceed X entries" or "the system should process X GB of data per hour", etc.
Don't forget: get it in writing! Users tend to "forget" what they have agreed to, so get a written statement agreed upon by all parties involved. In case you wondered: "measurable" means "countable". "It could be a wee bit faster sometimes" is NOT measurable, it's daydreaming - your job is not to make dreams come true, yes?
2. Get your data
The next step is to get the data - ALL data. First you need the unchanging things: machine specifications, software releases, configurations, customization, etc., etc., etc.
The following is AIX-specific, but you should easily be able to "port" this to other OSes. Also notice that the lists are incomplete in nature: add to them whatever seems to add to the picture if necessity arises:
a) Hardware:
prtconf output
b) Software releases
instfix -i | grep AIX_ML (or TL)
version information of the application program(s) in question
c) customization
ioo -a output
vmo -a output
schedo -a output
no -a output
lsps -a output
crontabs
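To keep these unchanging facts together for later comparison, the commands listed above can be dumped into one directory. This is only a sketch: BASELINE_DIR and the helper run_and_save are made-up names, and some of the commands may need root on your system.

```sh
# Hypothetical target directory; change it to wherever you keep such data.
BASELINE_DIR="${BASELINE_DIR:-/tmp/perf_baseline.$$}"
mkdir -p "$BASELINE_DIR" || exit 1

# run_and_save NAME CMD [ARGS...]: store the output of CMD in NAME.out.
# Commands missing on this machine are noted instead of aborting the run.
run_and_save() {
    name=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$BASELINE_DIR/$name.out" 2>&1 ||
            echo "exit status $?" >>"$BASELINE_DIR/$name.out"
    else
        echo "skipped: $1 not found" >"$BASELINE_DIR/$name.out"
    fi
}

run_and_save prtconf prtconf        # a) hardware
run_and_save instfix instfix -i     # b) grep this for AIX_ML or TL later
run_and_save ioo     ioo -a         # c) tunables and paging space
run_and_save vmo     vmo -a
run_and_save schedo  schedo -a
run_and_save no      no -a
run_and_save lsps    lsps -a
run_and_save crontab crontab -l     # crontab of the current user only
```

Take one such snapshot before you change anything, so you can always tell what the starting point was.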
3. Get your data - again
After this, analyze the machine in light of what the customers tell you. In which regard is it "slow" - bad I/O? slow disks? unresponsive network connections? Write that down and save it for future reference.
Only now get the real performance data. A good start is (again, this is AIX-minded, but could easily be translated to other OS flavours):
vmstat
iostat
netstat/entstat
svmon
lsps
ps
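It helps to capture these tools over the same time window, so the numbers can be lined up afterwards. A rough sketch of that idea follows. The function name capture_samples and its arguments are invented for illustration, and only vmstat, iostat, lsps and ps from the list above are covered.

```sh
# capture_samples INTERVAL COUNT DIR
# Runs vmstat and iostat side by side (COUNT samples, INTERVAL seconds
# apart) and takes one-off lsps/ps snapshots, all written below DIR.
capture_samples() {
    interval=$1
    count=$2
    dir=$3
    mkdir -p "$dir" || return 1

    # Time series: start both in the background so they cover the same window.
    for tool in vmstat iostat; do
        if command -v "$tool" >/dev/null 2>&1; then
            "$tool" "$interval" "$count" >"$dir/$tool.out" 2>&1 &
        else
            echo "skipped: $tool not found" >"$dir/$tool.out"
        fi
    done

    # Snapshots taken once, at the start of the window.
    if command -v lsps >/dev/null 2>&1; then
        lsps -a >"$dir/lsps.out" 2>&1 || true
    fi
    ps -ef >"$dir/ps.out" 2>&1 || true

    wait    # block until the background vmstat and iostat have finished
}
```

For example, capture_samples 5 60 /tmp/perf_run1 would record about five minutes of data in 5-second samples. The directory name is just a placeholder.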
4. The tuning process
Only now the real tuning starts. Note that this is a repetitive process, so be prepared to go over steps 3 and 4 again and again. Take the data gathered in step 3 and analyze it. Create a theory about what is causing which symptom. (Btw.: everything can be a symptom. If the machine is responding notably faster for 10 minutes and then slows down again, you want to know why this happens.) Look out for any repeating pattern in the data. If you find something, try to find an explanation for it. That doesn't necessarily mean you can change it, but it will further your understanding of the system's workings.
Once you have a theory (explanation) of what happens and why, put this theory to the test: apply - CAREFULLY! - selected changes to the system and watch what happens (basically go back to step 3, then compare).
Be sure to make only one change at a time. Otherwise you won't know which change has caused which difference in the data. You can tune only the same way you walk: one step after the other. If you try to make more than one step at the same time, chances are you just jump up and down on one foot, effectively getting nowhere.
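To compare the data from before and after a single change, averaging one column of a saved vmstat or iostat capture is often enough. A minimal sketch, with the assumption that data rows start with a number and header rows do not. The name avg_column is made up, and the column number has to be read off the header of your own output, since the layout differs between AIX levels and LPAR types.

```sh
# avg_column FILE COL: print the average of column COL over all data rows
# of a saved vmstat/iostat capture, or "n/a" if there are no such rows.
avg_column() {
    awk -v col="$2" '
        # Data rows: first field is a plain number and the column exists.
        $1 ~ /^[0-9]+$/ && NF >= col { sum += $col; n++ }
        END { if (n) printf "%.1f\n", sum / n; else print "n/a" }
    ' "$1"
}
```

Run it on the capture taken before the change and on the one taken after, with the same column number, and compare the two results against the limit agreed in the SLA.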
I hope this helps.
bakunin
PS: It has taken me some time to write this and in the meantime you have already gotten very good advice, so i have deleted what i have written about your actual problem. Still i think that talking about the tuning process in general is a good idea which is why i wrote this article. I really do hope it helps.