How to find blocked process in vmstat?


 
# 1  
Old 02-27-2016
Hi,

How can I find out which processes are blocked?

The b column in vmstat shows high values at times (approximately 30 min):

Code:
bash-3.2# vmstat 1 10
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr m1 m1 m1 m2   in   sy   cs us sy id
 0 21 0 92098552 26672488 773 62 6517 61 65 0 1 9 7 7 0 17276 280262 17159 12 4 83
 0 252 0 95775176 31041824 42 88 0 0 0 0 0  0  0  0  0 6424 588827 6825 87 4 9
 1 251 0 95774904 31041776 18 30 0 0 0 0 0  0  0  0  0 7672 583542 8216 86 4 9
 0 252 0 95774904 31041776 17 17 0 0 0 0 0  0  0  0  0 7114 554322 7635 87 4 9
 0 251 0 95774904 31041776 17 17 0 0 0 0 0  0  0  0  0 5659 476353 6102 86 4 10
 1 250 0 95774904 31041776 18 18 0 0 0 0 0  0  0  0  0 5812 538128 6325 86 4 10
 0 252 0 95774904 31041776 17 17 0 0 0 0 0  0  0  0  0 6220 506853 6523 86 4 10
 0 252 0 95774904 31041776 17 17 0 0 0 0 0  0  0  0  0 6992 551186 7507 87 4 9
 0 251 0 95774904 31041776 17 17 0 0 0 0 0  0  0  0  0 5842 593352 6284 86 4 10
 1 251 0 95774904 31041776 17 17 0 0 0 0 0  0  0  0  0 6709 524933 7224 86 4 10

During that time I collected iostat -xpnzM 1 output:

Code:
                 extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    2.0  671.4    0.0    1.3 219.9 32.0  326.6   47.6 100 100 c5t60060E801013B1F0058B3AEF0000000Cd0s0
    0.0    1.0    0.0    0.0  0.0  0.0    0.0    0.5   0   0 c5t60060E801013B1F0058B3AEF0000006Cd0s0
    2.0  671.3    0.0    1.3 219.9 32.0  326.6   47.6 100 100 c5t60060E801013B1F0058B3AEF0000000Cd0
    2.0  671.3    0.0    1.3  0.0 251.9    0.0  374.2   0 100 md/rptsystemdb/d2000
    2.0  671.3    0.0    1.3  0.0 251.9    0.0  374.2   0 100 md/rptsystemdb/d2001
    2.0  671.3    0.0    1.3  0.0 251.7    0.0  373.9   0 100 md/rptsystemdb/d2003
    0.0    0.0    0.0    0.0  0.0  0.2    0.0    0.0   0  25 md/rptsystemdb/d2002
    0.0    1.0    0.0    0.0  0.0  0.0    0.0    0.5   0   0 md/rptsystemdb/d3000
    0.0    1.0    0.0    0.0  0.0  0.0    0.0    0.5   0   0 md/rptsystemdb/d3001
    0.0    1.0    0.0    0.0  0.0  0.0    0.0    0.5   0   0 md/rptsystemdb/d3005
    0.0   46.1    0.0    0.1  0.0 23.2    0.0  503.7   0 100 md/s3rpt-d/d7131
    0.0   46.1    0.0    0.1  0.0 23.2    0.0  503.7   0 100 md/s3rpt-d/d7132
    0.0   46.1    0.0    0.1  0.0 23.2    0.0  503.7   0 100 md/s3rpt-d/d7138
    0.0    2.0    0.0    0.0  0.0  0.0    0.0    1.0   0   0 md/s3rpt-l/d2999
    0.0    2.0    0.0    0.0  0.0  0.0    0.0    1.0   0   0 md/s3rpt-l/d2000
    0.0    2.0    0.0    0.0  0.0  0.0    0.0    1.0   0   0 md/s3rpt-l/d2005
    0.0    2.0    0.0    0.0  0.0  0.0    0.0    1.0   0   0 c5t60060E801013B1F0058B3AEF000004FEd0s0
    0.0   46.0    0.0    0.1  2.6 20.5   57.3  444.6  26 100 c5t60060E801013B1F0058B3AEF000004F8d0s0
    0.0    2.0    0.0    0.0  0.0  0.0    0.0    1.0   0   0 c5t60060E801013B1F0058B3AEF000004FEd0
    0.0   46.0    0.0    0.1  2.6 20.5   57.2  444.5  26 100 c5t60060E801013B1F0058B3AEF000004F8d0
    0.0    1.0    0.0    0.0  0.0  0.0    0.0    0.5   0   0 c5t60060E801013B1F0058B3AEF0000006Cd0


Last edited by Don Cragun; 02-27-2016 at 10:44 AM.. Reason: Add CODE and ICODE tags.
# 2  
Old 02-27-2016
With this kind of performance data, all I can do is guess. Some of the values in the asvc_t column are far out of bounds. Hopefully the disks with the high numbers are not NFS.

Judging by their read and write rates, several of your disks are nowhere near busy enough to incur asvc_t times like that. It looks like a database is locking resources, or something else is locking, or NFS is causing problems.
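To pick out the devices that fit that description, a small filter over saved iostat output can help. This is only a sketch: the function name slow_devices and both thresholds are made up for illustration, and it assumes the 11-column layout shown in the iostat output above. On Solaris the default awk may not accept -v, so set AWK=nawk (or /usr/xpg4/bin/awk) before calling it.

```shell
# Sketch: list devices that handle few operations per second yet still
# show a high average service time. Thresholds are illustrative assumptions.
# Expects rows in this column order on stdin:
#   r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
slow_devices() {
    max_iops=$1   # report only devices at or below this many ops/s
    min_asvc=$2   # ... whose asvc_t (ms) is at or above this value
    "${AWK:-awk}" -v iops="$max_iops" -v asvc="$min_asvc" '
        # Skip headers: a data row has 11 fields and a numeric first field.
        NF == 11 && $1 ~ /^[0-9.]+$/ {
            ops = $1 + $2
            if (ops <= iops && $8 >= asvc)
                printf "%s ops/s=%.1f asvc_t=%.1f\n", $11, ops, $8
        }'
}

# Three rows copied from the iostat output posted above.
slow_devices 100 100 <<'EOF'
    2.0  671.4    0.0    1.3 219.9 32.0  326.6   47.6 100 100 c5t60060E801013B1F0058B3AEF0000000Cd0s0
    0.0   46.1    0.0    0.1  0.0 23.2    0.0  503.7   0 100 md/s3rpt-d/d7131
    0.0   46.0    0.0    0.1  2.6 20.5   57.3  444.6  26 100 c5t60060E801013B1F0058B3AEF000004F8d0s0
EOF
```

With these sample rows, only the two devices doing about 46 writes per second are reported. The first row is skipped because it is genuinely busy at more than 670 operations per second.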

1. Is this a DB server?
2. Are you running large reports at times when interactive processes are also busy updating?
3. Can you run the
Code:
 lockfs

command and post what it shows for those disks? I am assuming UFS. It will report locks.

I think that implementing this: DTrace Topics Locks - Siwiki might be hard to do, judging by the way you asked your question, so I am going with lockfs. If the link looks like a piece of cake to you, by all means go for it.
# 3  
Old 02-28-2016
It looks like you're running Solaris Volume Manager with two RAID-1 mirrors that you're heavily oversubscribing by hitting them with an extreme number of very small random write operations. What are the actual physical disks you're writing to? They look like Fibre Channel, given the name format.
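A quick sanity check of the "very small writes" reading, using only the first iostat row posted above (2.0 r/s, 671.4 w/s, 1.3 Mw/s, actv 32.0, asvc_t 47.6 ms). The helper name io_profile is made up for illustration, and iostat rounds its columns, so treat the results as approximate. The second line applies Little's law (requests in flight is roughly arrival rate times time in service), which is a general queueing identity, not anything specific to iostat. On Solaris, set AWK=nawk if the default awk rejects -v.

```shell
# Sketch: derive the average I/O size and the implied number of
# outstanding requests from a single iostat row.
io_profile() {
    # args: r/s w/s Mr/s Mw/s actv asvc_t(ms)
    "${AWK:-awk}" -v r="$1" -v w="$2" -v mr="$3" -v mw="$4" \
                  -v actv="$5" -v asvc="$6" 'BEGIN {
        iops = r + w
        # MB/s divided by operations/s gives MB per operation; show it in KB.
        printf "avg I/O size: %.1f KB\n", (mr + mw) * 1024 / iops
        # Little'"'"'s law: in-flight requests = rate * service time (ms -> s).
        printf "iops x asvc_t: %.1f (reported actv: %.1f)\n", iops * asvc / 1000, actv
    }'
}

# Values taken from the c5t...000Cd0s0 row in the iostat output above.
io_profile 2.0 671.4 0.0 1.3 32.0 47.6
```

That works out to about 2 KB per operation, with roughly 32 requests in flight, which matches the actv figure iostat reports for the same device.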

Look under /usr/demo/dtrace. Depending on the Solaris version you're running, you may have a whoio.d DTrace script there. I've duplicated it here:

Code:
/*
 * Copyright 2005 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 *
 * This D script is used as an example in the Solaris Dynamic Tracing Guide
 * wiki in the "io Provider" Chapter.
 *
 * The full text of the this chapter may be found here:
 *
 *   https://wikis.oracle.com/display/DTrace/io+Provider
 *
 * On machines that have DTrace installed, this script is available as
 * whoio.d in /usr/demo/dtrace, a directory that contains all D scripts
 * used in the Solaris Dynamic Tracing Guide.  A table of the scripts and their
 * corresponding chapters may be found here:
 *
 *   file:///usr/demo/dtrace/index.html
 */

#pragma D option quiet

io:::start
{
    @[args[1]->dev_statname, execname, pid] = sum(args[0]->b_bcount);
}

END
{
    printf("%10s %20s %10s %15s\n", "DEVICE", "APP", "PID", "BYTES");
    printa("%10s %20s %10d %15@d\n", @);
}

Copy that to a file, such as whoio.d, and run it as root:

Code:
dtrace -s whoio.d

Let it run a while, then hit CTRL-C to see what processes are hitting which device with IO requests. You'll have to translate names such as "sd4" to actual devices.
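If the output is long, it can be rolled up per process once it has been saved to a file. The sketch below assumes the four columns printed by the script above (DEVICE, APP, PID, BYTES) and that the APP name contains no spaces. The function name bytes_per_process and the sample lines are hypothetical, not real output from this system.

```shell
# Sketch: total the BYTES column of saved whoio.d output per process,
# largest first. Assumes four whitespace-separated columns per row.
bytes_per_process() {
    "${AWK:-awk}" '
        # Skip the header: a data row ends in a purely numeric BYTES field.
        NF == 4 && $4 ~ /^[0-9]+$/ { total[$2 " " $3] += $4 }
        END { for (k in total) printf "%.0f %s\n", total[k], k }
    ' | sort -rn
}

# Hypothetical sample lines; real output will differ.
bytes_per_process <<'EOF'
    DEVICE                  APP        PID           BYTES
       sd4           dataserver       1234           81920
       sd5           dataserver       1234           40960
       sd4              fsflush          3            8192
EOF
```

Each output line shows the total bytes, then the process name and PID, so the heaviest I/O issuer ends up at the top.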
# 4  
Old 02-29-2016
Hi jim mcnamara,

The devices are not NFS mounted.

1. is this a db server?

Yes. It's Sybase server.

2. are you running large reports during the times when interactive processes are also busy updating?

This needs to be checked.

3. Can you try the lockfs

Should I run with any options?

@ achenle

I will run dtrace and post an update the next time I see the issue.