Help in monitoring performance problem in Linux


 
# 8  
Old 02-11-2010
Hi,

OK, I was finally able to enable disk statistics in sar (I had to add -d to the /usr/lib64/sa/sa1 script).

Now I've created a new PDF with kSar containing all the statistics I can get.
The problem is that I don't know what to look for in order to understand the problem.

As you can see, I don't have a CPU problem; I have a very high load average and a high runq-sz.

So I've written down all the statistics from a specific minute I chose. It's not the exact time our machine hung, but perhaps we can see the problem at a regular time.

Code:
CPU monitor: (0.0 - 45.0) 
   cpu used: 40% (i/o wait)
context: (0-24,000)
   cswch/s: 18,000
i/o: 
  transfers/s (0-5000) : 3000
  block read/write (0-80,000) : 50,000
  read/write/s (0 - 5000) : 3000
memory:
  memused: 8G (100%)
  memfree: (0-770M): 380m 
memory misc:
   buffers: (0-760m) : 290m
   cached: (0-7.5G): 6.5G
swap:
   swapfree: 8G
   swapused: 40kb
 
load:
  runq-sz (0-4): 2
  plist-sz(0-350): 320
  load average (0-10) : 1min-7.5, 5min-6, 15min-3
 
page:
  frmpg/s((-)400 - 400) : 200
  bufpg/s: ((-)100 - 100): -25
  campg/s ((-)400 - 400): -150 
paging: 
   pgpgin/pgpgout  /s (0-22,500) : in- 2500, out-10,000
   fault/majflt / s (0-50,000) : 2500
 
processes:
  proc/s (0-57.00): 1

disks:
  sda:
  tps/s (0-500): 280
  read/write /s (0-35,000): write- 6000, read- 27,500
  avgrq-sz (0-200): 100
  avgqu-sz (0-75): 25
  await (0-200): 100
  svctm (0-3): 2.5
  util% (0-100): 90%

sdb:
  tps/s (0-300): 200
  read/write /s (0-32,500): write- 15000, read- 3000
  avgrq-sz (0-400): 150
  avgqu-sz (0-7.5): 1.5
  await (0-150): 1
  svctm (0-10): 2
  util% (0-50): 20

md/0:
  tps/s (0-5,000): 2000
  read/write /s (0-40,000): write- 15000, read- 5,000
  avgrq-sz (0-150): 10
  avgqu-sz (0-75): 20
  await (0-300): 1
  svctm (0-20): 1
  util% (0-75): 75

Tony, do you see any problem?
At first I thought I just needed to add memory, because it looks like the machine uses all of it, but that doesn't make sense since there is no paging; I guess Linux allocates all the memory without really using it.
Perhaps it's a disk problem?

Any help will be appreciated!
Thanks a lot!

Last edited by levic; 02-11-2010 at 08:11 AM..
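On the memory question: Linux keeps otherwise idle RAM as buffers and page cache, so "100% used" with almost no swap in use is normal. A minimal sketch, assuming /proc/meminfo-style input with MemTotal, MemFree, Buffers and Cached lines (the function name app_mem_kb is made up for illustration), that subtracts the reclaimable parts:

```shell
# app_mem_kb: read /proc/meminfo-style text on stdin and print the memory
# (in kB) that is in use once buffers and page cache are excluded.
# Hypothetical helper; it only looks at four fields and ignores the rest.
app_mem_kb() {
    awk '
        $1 == "MemTotal:" { total   = $2 }
        $1 == "MemFree:"  { free    = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached  = $2 }
        END { print total - free - buffers - cached }
    '
}

# Usage on a live system:
#   app_mem_kb < /proc/meminfo
```

With figures like those posted (roughly 8G total, 380M free, 290M buffers, 6.5G cached) this leaves well under 1G actually held by processes, which would fit with the absence of paging.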
# 9  
Old 02-11-2010
I am puzzled by 100% of memory being in use but very little swap.

cpu used: 40% (i/o wait) is not good. Try running iostat, for example:
Code:
iostat -x -n -p ALL

and see what devices are seeing the most I/O, look at queue sizes and await.
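To make the busiest devices stand out, the extended report can be filtered and sorted. A rough sketch, assuming the older sysstat column names shown in this thread (await, avgqu-sz, %util); newer versions split await into separate read and write columns, so the names would need adjusting. The function name busiest_by_await is hypothetical.

```shell
# busiest_by_await: read `iostat -x` output on stdin and print
# "device await avgqu-sz %util" for active devices, highest await first.
# Column positions are looked up from the "Device:" header line, because
# the layout differs between sysstat versions.
busiest_by_await() {
    awk '
        $1 ~ /^Device/ {
            for (i = 1; i <= NF; i++) col[$i] = i
            in_table = 1
            next
        }
        NF == 0 { in_table = 0 }           # a blank line ends the device table
        in_table && $(col["%util"]) > 0 {
            print $1, $(col["await"]), $(col["avgqu-sz"]), $(col["%util"])
        }
    ' | sort -k2,2nr
}

# Usage on a live system:
#   iostat -x | busiest_by_await
```

Note that iostat run without an interval reports averages since boot, so a sample taken while the machine is actually under load is more telling.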
# 10  
Old 02-12-2010
Hi,

This is the output of iostat -n -x ALL:
Code:
Linux 2.6.23.1-49.fc8 (bastille)        02/12/2010

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
ram0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
ram1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
ram2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
ram3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
ram4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
ram5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
ram6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
ram7              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
ram8              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
ram9              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
ram10             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
ram11             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
ram12             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
ram13             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
ram14             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
ram15             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda              15.86   205.62   70.86   28.00  4845.48  1870.64    67.93     2.50   25.28   2.08  20.61
sdb               0.88   157.19   23.47   13.94  1826.80  1370.70    85.45     0.43   11.54   1.54   5.76
sr0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-0              0.00     0.00   43.23  263.15  3160.62  2105.20    17.19     3.25   10.58   0.39  11.91

Filesystem:              rBlk_nor/s   wBlk_nor/s   rBlk_dir/s   wBlk_dir/s   rBlk_svr/s   wBlk_svr/s
concorde:/export/home/lab         0.03         0.12         0.00         0.00         0.08         0.13
concorde:/export/home/build         0.03         0.12         0.00         0.00         0.08         0.13

sda and sdb both contain LVM partitions.
I have a RAID 5 in my machine with 5 disks.
Perhaps my disks are slow? They are 500G SATA drives at 7200 rpm.
How can it be that the machine uses all the memory and no swap? It always uses it, not only when the machine is overloaded.

Thanks,
I have no idea how to figure it out.

Last edited by Scott; 02-12-2010 at 05:09 AM.. Reason: Code tags, please...
# 11  
Old 02-13-2010
The interesting lines are:
Code:
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda              15.86   205.62   70.86   28.00  4845.48  1870.64    67.93     2.50   25.28   2.08  20.61
sdb               0.88   157.19   23.47   13.94  1826.80  1370.70    85.45     0.43   11.54   1.54   5.76
dm-0              0.00     0.00   43.23  263.15  3160.62  2105.20    17.19     3.25   10.58   0.39  11.91

Filesystem:              rBlk_nor/s   wBlk_nor/s   rBlk_dir/s   wBlk_dir/s   rBlk_svr/s   wBlk_svr/s
concorde:/export/home/lab      0.03         0.12         0.00         0.00         0.08         0.13
concorde:/export/home/build    0.03         0.12         0.00         0.00         0.08         0.13

The busiest device is dm-0 with 263.15 writes per second, but the device with the largest average request size (avgrq-sz) is sdb. sda has the highest utilisation and await, and a fairly high average queue size (avgqu-sz).
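A quick sanity check on these figures, as a sketch: by Little's law the average queue size should be close to the request rate times the average wait, and utilisation close to the request rate times the service time. The function name iostat_check is hypothetical.

```shell
# iostat_check: given r/s, w/s, await (ms) and svctm (ms), print the
# queue size and %util implied by those figures:
#   avgqu-sz ~= (r/s + w/s) * await / 1000
#   %util    ~= (r/s + w/s) * svctm / 1000 * 100
iostat_check() {
    awk -v r="$1" -v w="$2" -v await="$3" -v svctm="$4" 'BEGIN {
        iops = r + w
        printf "avgqu-sz=%.2f util=%.2f\n", iops * await / 1000, iops * svctm / 10
    }'
}

# sda from the output above: r/s=70.86 w/s=28.00 await=25.28 svctm=2.08
iostat_check 70.86 28.00 25.28 2.08   # prints avgqu-sz=2.50 util=20.56
```

The reported values for sda are 2.50 and 20.61, so the columns are consistent with each other: requests queue up because they arrive faster than the disk serves them, not because of a reporting oddity.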

Run:
Code:
# mount | grep sdb
# mount | grep sda

to find out what /dev/sdb and /dev/sda are mounted as.

The NFS mounts are evidently very quiet, but running iostat after your machine has done a lot of real work may show something different.

RAID 5 is good for securing your data, but for best performance you need striping; the best compromise is a stripe mirrored against another stripe.

Last edited by TonyFullerMalv; 02-13-2010 at 10:26 AM..
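To put rough numbers on the RAID trade-off, here is a back-of-the-envelope sketch using the commonly quoted write penalties: 4 back-end I/Os per small random write for RAID 5 (read data, read parity, write data, write parity) and 2 for mirrored stripes. The per-disk IOPS figure and the read/write mix are assumptions for illustration, not measurements from this machine, and raid_iops is a made-up name.

```shell
# raid_iops DISKS DISK_IOPS WRITE_FRACTION PENALTY
# Rough front-end random IOPS an array can sustain:
#   DISKS * DISK_IOPS / ((1 - WRITE_FRACTION) + WRITE_FRACTION * PENALTY)
raid_iops() {
    awk -v n="$1" -v iops="$2" -v wf="$3" -v p="$4" 'BEGIN {
        printf "%.0f\n", n * iops / ((1 - wf) + wf * p)
    }'
}

# Assumed: ~75 random IOPS per 7200 rpm SATA disk, workload 50% writes.
raid_iops 5 75 0.5 4   # 5-disk RAID 5          -> prints 150
raid_iops 4 75 0.5 2   # 4-disk mirrored stripe -> prints 200
```

Under these assumptions four disks in a mirrored stripe outperform five in RAID 5 for a write-heavy random workload; the gap narrows as the share of reads grows.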
# 12  
Old 02-13-2010
Hi,

First of all, I would like to thank you for all your help! I really appreciate it, so thanks a lot!

sda is my root filesystem. sdb is a disk which holds an LVM partition, so I can't see it in the output of mount.
I have a rootvg with 3 filesystems, which resides on both sdb and an additional partition of sda.
I guess md-0 is the LVM as a whole (not sure).

Anyway, we can see the problem is in our disks; this is why I have high I/O wait and load.
In order to change the RAID level, or even add more drives to the RAID 5, I need to break the RAID, and currently that is the worst option because it's a production machine.
What else can I do in order to increase performance?
Adding more CPU is not an option, because I will still get the I/O wait: the processors will handle the requests much faster and then still wait (even more) for the disks.
I have 8G of RAM. Perhaps I need to add more RAM for caching? I'm not sure how to do that.

I'm not exactly sure I can say loud and clear that the disks are the problem.

thanks.
# 13  
Old 02-13-2010
To be fair, RAID 5 is going to be better than a single disc or just a mirrored pair of discs.
Performance tuning is not an exact art. The figures can look terrible, but if the machine is performing the task you want it to do okay then they are not a problem. If the machine is not performing the required task, then you start looking at the figures to see where the bottleneck is; once you deal with it, another bottleneck may also need dealing with.
Suggestions:
Copy the contents of your current disks onto another set of discs in a stripe, then reconstruct your RAID 5 set of discs into a stripe and mirror it against the first stripe?
Copy your current RAID 5 configuration onto 10K RPM discs?
Ensure you are using the fastest interface possible for your platform, i.e. SCSI or SATA tend to be faster than IDE?

Last edited by TonyFullerMalv; 02-13-2010 at 05:30 PM..
# 14  
Old 02-14-2010
Quote:
Originally Posted by TonyFullerMalv
To be fair, RAID 5 is going to be better than a single disc or just a mirrored pair of discs.
Possibly.

In this case, unless I am mistaken, there is a two-disk RAID 5, which would mean that:

1. It is always running as degraded.
2. Every write has at least a double overhead.

Also there is a massive IO imbalance between the two disks, possibly due to different workloads hitting the same disk and causing a lot of head shuttle.

No matter how you look at this it appears that you don't have enough disks for what you are doing.

SATA drives are pretty good for sequential I/O so if the workloads are cleanly separated there is a chance that they could be fine. However with a random workload faster disks are definitely preferable.
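The point about random workloads can be illustrated with a simple estimate: a spindle's random IOPS is roughly 1000 divided by the sum of average seek time and average rotational latency in milliseconds. This is only a sketch; the seek times are assumed round figures rather than datasheet values, and disk_random_iops is a hypothetical name.

```shell
# disk_random_iops RPM SEEK_MS
# Estimate random IOPS of one spindle as 1000 / (seek + rotational latency),
# where average rotational latency is half a revolution: 60000 / RPM / 2 ms.
disk_random_iops() {
    awk -v rpm="$1" -v seek="$2" 'BEGIN {
        rot = 60000 / rpm / 2
        printf "%.0f\n", 1000 / (seek + rot)
    }'
}

# Seek times below are assumed, not taken from a datasheet.
disk_random_iops 7200 8.5    # 7200 rpm SATA -> prints 79
disk_random_iops 10000 4.5   # 10K rpm       -> prints 133
```

Sequential transfers avoid most of the seek and rotational cost, which is why the same 7200 rpm drives can look fine when the workloads are cleanly separated.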