I am a newly registered user on these UNIX forums.
I am a new system administrator for AIX 6.1. One of our servers performs poorly every time our application (FINACLE) runs many processes/instances. (see below for topas snapshot)
I use nmon or topas to monitor server utilization. CPU Idle% is high, but DISK Busy% is constantly high (when performance is at its worst, DISK Busy% sits at 100% most of the time). I also noticed that the FILE/TTY Readch and Writech values are constantly high. See topas snapshot below:
System Model: IBM,8205-E6B
Machine Serial Number: 0678F8P
Processor Type: PowerPC_POWER7
Processor Implementation Mode: POWER 7
Processor Version: PV_7_Compat
Number Of Processors: 6
Processor Clock Speed: 3720 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: 2 CARD_DB
Memory Size: 43776 MB
Good Memory Size: 43776 MB
Platform Firmware level: AL720_082
Firmware Version: IBM,AL720_082
Console Login: enable
Auto Restart: true
Full Core: false
Network Information
Host Name: CARDDB
IP Address: 10.10.10.100
Sub Netmask: 255.255.255.0
Gateway: 10.10.10.10
Name Server:
Domain Name:
Paging Space Information
Total Paging Space: 12288MB
Percent Used: 1%
Volume Groups Information
==============================================================================
rootvg:
PV_NAME   PV STATE   TOTAL PPs   FREE PPs   FREE DISTRIBUTION
hdisk0    active     546         458        109..48..83..109..109
hdisk1    active     546         390        29..60..83..109..109
==============================================================================
oravg:
PV_NAME   PV STATE   TOTAL PPs   FREE PPs   FREE DISTRIBUTION
hdisk8    active     4228        68         00..00..00..00..68
==============================================================================
Every time this happens, we try killing the processes that consume the most CPU, but DISK Busy% stays high. Rebooting the server restores performance, but we can't do that during production. Any suggestions on how to optimize this? Is it our architecture (only one hard disk for our data)? Is there a bottleneck here? What can we do to optimize the server? Should we make any upgrades, for example adding physical memory?
Thank you very much. I hope you can help since I am not a UNIX expert.
Killing processes to free resources is not a good idea. You might shoot something you still need.
Yes, from the look of it you have a severe bottleneck with your 1 hdisk. Is this hdisk a physical disk or a LUN from SAN storage?
Do you use asynchronous I/O (AIO) and have it tuned? Oracle will most probably benefit from it as well as getting additional disks.
nmon/topas has a page that displays AIO stats. I think it was Shift + A, but I'm not sure; it's easy to try out anyway.
You could post the output of
Code:
iostat -A 2 10
# and
vmstat -wt 2 10
# and
lsattr -El aio0
(Run the first two commands while there is traffic on your box.) Please use code tags when posting the output, thanks.
Also post Oracle's filesystemio_options setting, the Oracle version, and something about your disk layout: are your filesystems set up with minimum or maximum distribution, what block size, and so on.
The output of the mount command will help. Mounting your Oracle filesystems with the noatime option and, if you have a dedicated dump device, mounting it with rbrw will definitely help too.
If you don't want to use SETALL in filesystemio_options, then consider mounting the filesystems that contain Oracle data and redo logs with cio. Also tell us how many volume groups you have, how many disks are in each, and similar details.
In many cases a hot disk is easily avoided by changing your filesystems from minimum to maximum distribution and reorganizing the volume group.
In addition to what zaxxon already asked for, I would be interested in the vmstat -v and vmstat -s outputs.
Please gather all data while the system is busy and slow, not during an idle period, or the data won't help.
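To make it easier to grab everything in one go while the box is slow, something like the following might help. It is only a rough sketch: the function names and the output directory layout are made up for this example, and it runs just the commands already requested in this thread.

```shell
#!/bin/sh
# Sketch only: function names and the output directory layout are hypothetical.

# Run one command and store its stdout and stderr in <dir>/<name>.out
collect() {
    dir=$1; name=$2; shift 2
    {
        echo "### $* ($(date))"
        "$@"
    } > "$dir/$name.out" 2>&1
}

# Gather the outputs requested in this thread; call it while the system is slow.
collect_all() {
    dir=${1:-/tmp/perfdata.$(date +%Y%m%d_%H%M%S)}
    mkdir -p "$dir" || return 1
    collect "$dir" iostat_A  iostat -A 2 10
    collect "$dir" vmstat_wt vmstat -wt 2 10
    collect "$dir" vmstat_v  vmstat -v
    collect "$dir" vmstat_s  vmstat -s
    collect "$dir" mount     mount
    echo "$dir"     # print where the files ended up
}
```

After sourcing the file, run collect_all (optionally with a target directory) during the slow period and post the resulting .out files in code tags.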
Thanks Zaxxon & zxmaus,
I didn't know where to begin before I opened this thread.
For iostat -A 2 10, vmstat -wt 2 10, vmstat -v and vmstat -s, I will post a snapshot of these once the issue occurs again.
For lsattr -El aio0, I didn't get anything, so I tried lsattr -El sys0 (I hope that will do).
--> See attachment - lsattr sys0.jpg
For "Do you use asynchronous I/O (AIO) and have it tuned?"
--> I have no idea, since I am new here and joined in the middle of the application's rollout to production. I wish I had a clue. I have no knowledge of the history of the servers here.
However, I checked the I/O stats in nmon and here they are:
-->
Code:
Total AIO processes= 72 Actually in use= 0 CPU used= 1.1%
All time peak= 90 Recent peak= 7 Peak= 3.4%
If physical disk or LUN from SAN
--> I'm not entirely sure whether it's a LUN from SAN, but here's what I gathered:
from prtconf/lsdev:
Code:
hdisk8 Available 05-00-00 SAS RAID 10 Disk Array
from lspv hdisk8
PHYSICAL VOLUME:    hdisk8                   VOLUME GROUP:     oravg
PV IDENTIFIER:      00f678f86bb5b458         VG IDENTIFIER     00f678f800004c00000001326bb5b750
PV STATE:           active
STALE PARTITIONS:   0                        ALLOCATABLE:      yes
PP SIZE:            256 megabyte(s)          LOGICAL VOLUMES:  8
TOTAL PPs:          4228 (1082368 megabytes) VG DESCRIPTORS:   2
FREE PPs:           68 (17408 megabytes)     HOT SPARE:        no
USED PPs:           4160 (1064960 megabytes) MAX REQUEST:      256 kilobytes
FREE DISTRIBUTION:  00..00..00..00..68
USED DISTRIBUTION:  846..846..845..845..778
MIRROR POOL:        None
for filesystemio_options:
--> I have no idea where to find this. Is it a command to run, or is it set in a configuration file?
for "...min or max distribution, blocksize ...
output of mount command will help and definitely mounting your oracle filesystems with noatime option and if you have a dedicated dump device with rbrw..."
--> I am totally lost on the min/max tuning. No idea about this yet.
Again, thanks very much for the help. It's greatly appreciated.
If the attachment is not viewable, here's the lsattr -El sys0 output:
Code:
SW_dist_intr false Enable SW distribution of interrupts True
autorestart true Automatically REBOOT OS after a crash True
boottype disk N/A False
capacity_inc 0.01 Processor capacity increment False
capped true Partition is capped False
conslogin enable System Console Login False
cpuguard enable CPU Guard True
dedicated false Partition is dedicated False
enhanced_RBAC true Enhanced RBAC Mode True
ent_capacity 6.00 Entitled processor capacity False
frequency 6400000000 System Bus Frequency False
fullcore false Enable full CORE dump True
fwversion IBM,AL720_082 Firmware version and revision levels False
ghostdev 0 Recreate devices in ODM on system change True
id_to_partition 0X80000B9662900002 Partition ID False
id_to_system 0X80000B9662900000 System ID False
iostat false Continuously maintain DISK I/O history True
keylock normal State of system keylock at boot time False
log_pg_dealloc true Log predictive memory page deallocation events True
max_capacity 12.00 Maximum potential processor capacity False
max_logname 9 Maximum login name length at boot time True
maxbuf 20 Maximum number of pages in block I/O BUFFER CACHE True
maxmbuf 0 Maximum Kbytes of real memory allowed for MBUFS True
maxpout 8193 HIGH water mark for pending write I/Os per file True
maxuproc 2048 Maximum number of PROCESSES allowed per user True
min_capacity 3.00 Minimum potential processor capacity False
minpout 4096 LOW water mark for pending write I/Os per file True
modelname IBM,8205-E6B Machine name False
ncargs 256 ARG/ENV list size in 4K byte blocks True
nfs4_acl_compat secure NFS4 ACL Compatibility Mode True
pre430core false Use pre-430 style CORE dump True
pre520tune disable Pre-520 tuning compatibility mode True
realmem 44826624 Amount of usable physical memory in Kbytes False
rtasversion 1 Open Firmware RTAS version False
sed_config select Stack Execution Disable (SED) Mode True
systemid IBM,020678F8P Hardware system identifier False
variable_weight 0 Variable processor capacity weight False
From the data you have provided so far, you have one RAID 10 set built from SAS (so internal) disks, 1 TB in total and presented to the system as a single disk, serving 6 DBs and everything else running on the system except root. This is asking for problems, because all your storage is accessed through a single serial path.
Even worse, all your filesystems share the same logfile. If I assume correctly that your filesystems are not mounted with the noatime option, then every single read (including something as simple as ls) and every single write on 8 different filesystems competes for access to that logfile, which makes the logfile the hotspot of the entire system.
I am still waiting for the vmstat outputs, but I bet your system has only the default filesystem tuning and is running out of buffers most of the time.
Can you post the output of lvmo -a -v oravg to confirm?
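Once you have the vmstat -v output, the lines to look at for the buffer question are the "blocked with no ...buf" counters. As a rough sketch (the function name is made up, and it assumes each counter line starts with the number followed by the label, as vmstat -v normally prints it), this filters out the non-zero ones:

```shell
#!/bin/sh
# Sketch only: reads `vmstat -v` text on stdin and prints the
# "blocked with no ...buf" counters that are above zero.
# The function name is hypothetical.
blocked_io_report() {
    awk '/blocked with no/ && $1 + 0 > 0 {
             count = $1
             $1 = ""                      # drop the counter, keep the label
             sub(/^ +/, "")
             printf "%-55s %s\n", $0, count
             found = 1
         }
         END { if (!found) print "no blocked I/O counters above zero" }'
}
```

Usage would be vmstat -v | blocked_io_report. Keep in mind that these counters accumulate since boot, so it is the growth between two samples taken during the slow period that matters, not the absolute number.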
Regarding AIO, don't worry: on AIX 6.1 you can see the settings with the ioo -a | grep aio command, and AIX turns AIO on automatically when Oracle or any other application wants to use it.
filesystemio_options is a parameter set within Oracle (ask your DBA). It can be set to none (the default, I think, in your Oracle version), asynch, or setall. With setall, Oracle decides to use cio together with async I/O, but you can no longer access open database files from outside the database other than with rman, which might be a problem if you don't do rman backups.
Please run a plain mount on the box so we can see whether you are using any mount options on the filesystems.
So far:
- consider giving each of your oravg filesystems its very own logfile
- consider another storage solution and a different filesystem layout if possible, since 6 DBs in the same filesystem, even if this filesystem has its own logfile, are still not a great idea. If that is not possible, then your disk will naturally stay busy, since you only have one.
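To check the logfile and noatime points yourself, the mount output can be summarized with a small filter. This is a minimal sketch, not a finished tool: the function name is made up, and it assumes the usual AIX mount layout where the options are the last column and the mount point is the column right before the vfs type "jfs2".

```shell
#!/bin/sh
# Sketch only: reads AIX `mount` output on stdin. Assumes the options column
# is the last field and the mount point is the field right before "jfs2".
# The function name is hypothetical.
jfs2_mount_report() {
    awk '{
             for (i = 2; i <= NF; i++) if ($i == "jfs2") break
             if (i > NF) next             # not a jfs2 line (header, nfs, ...)
             mp = $(i - 1); opts = $NF; log_dev = "none"
             n = split(opts, o, ",")
             for (k = 1; k <= n; k++)
                 if (o[k] ~ /^log=/) log_dev = substr(o[k], 5)
             users[log_dev]++             # count filesystems per log device
             if (("," opts ",") !~ /,noatime,/) print "no noatime: " mp
         }
         END {
             for (d in users)
                 if (d != "none" && d != "INLINE" && users[d] > 1)
                     print "shared log " d ": " users[d] " filesystems"
         }'
}
```

Usage would be mount | jfs2_mount_report. Every filesystem listed under "no noatime" is a candidate for the noatime option, and any "shared log" line shows a log device that several filesystems are competing for.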