Wait process holding CPU


 
# 1  
02-26-2013
[SOLVED] Wait process holding CPU

Hi all,

I have a performance issue:

Code:
[srvbd1]root]/]>ps vg | head -1 ; ps vg | grep -w wait
    PID    TTY STAT  TIME PGIN  SIZE   RSS   LIM  TSIZ   TRS %CPU %MEM COMMAND
   8196      - A    4448:23    0   384   384    xx     0     0 12.8  0.0 wait
  53274      - A    4179:28    0   384   384    xx     0     0 12.1  0.0 wait
  57372      - A    4436:05    0   384   384    xx     0     0 12.8  0.0 wait
  61470      - A    4173:05    0   384   384    xx     0     0 12.0  0.0 wait
[srvbd1]root]/]>ps -ef | grep 8196| grep -v grep
[srvbd1]root]/]>

There are four "wait" processes, and together they occupy about 50% of the CPU, as shown by ps aux:

Code:
[srvbd1]root]/]>ps aux | head -1; ps aux | sort -rn +2 | head -5
USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
root      57372 12.8  0.0  384  384      - A      Feb 20 4437:22 wait
root       8196 12.8  0.0  384  384      - A      Feb 20 4449:41 wait
root      53274 12.1  0.0  384  384      - A      Feb 20 4180:41 wait
root      61470 12.0  0.0  384  384      - A      Feb 20 4174:17 wait
fin102   299090  0.2  0.0 1992 1976      - A    09:19:01  0:42 /u02/F10204/UBS/
[srvbd1]root]/]>

Please help me kill these wait processes, as they are not real processes. Server performance is very poor; even logging in takes an extremely long time. Help would be greatly appreciated.

# 2  
02-26-2013
I see no "performance issue", just some "ps" output. To assess the performance situation of your system it would be necessary to see the output of:

Code:
vmstat -v
vmstat -tw 1
svmon -G
iostat 5
no -a

and, depending on the configuration of your system ("lscfg"), probably some others.

Anyway, killing the processes is easy. You see the columns labeled PID in your output:

Code:
kill -15 <pid>

then wait a few seconds and issue another "ps". If <pid> isn't gone:

Code:
kill -9 <pid>

I still have serious doubts that this will help your situation at all, and I fear it might make your situation even worse, but there you go. My recommendation is not to do it, but you are free to do as you please.
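
Put together as a small sketch (8196 is merely an example PID taken from your output above):

Code:
# politely ask the process to terminate first
kill -15 8196
sleep 5
# escalate only if it is still around
if ps -p 8196 > /dev/null 2>&1 ; then
    kill -9 8196
fi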

I hope this helps.

bakunin
# 3  
02-26-2013
Wait process holding CPU

Hi Bakunin,

Thanks for your reply. Let me explain the issue as it stands right now. The server is essentially empty, but any application I start, such as WAS or our enterprise application, is extremely slow and can take hours. Even a PuTTY login takes a few minutes. We analyzed the system and these wait processes looked like the only bottleneck. But since they are kernel processes, I am not able to kill them.

Here are the requested details. Please review them and let me know if you can find any reason for the server's behaviour.



Code:
[srvbd1]root]/]>proctree 8196
[srvbd1]root]/]>        kill -15 8196
kill: 8196: 0403-003 The specified process does not exist.
[srvbd1]root]/]>ps -fk | grep wait
    root   8196      0   0   Feb 20      - 4479:28 wait
    root  53274      0   0   Feb 20      - 4208:33 wait
    root  57372      0   0   Feb 20      - 4466:54 wait
    root  61470      0   0   Feb 20      - 4201:55 wait
[srvbd1]root]/]>vmstat -v
              2035712 memory pages
              1957145 lruable pages
              1052819 free pages
                    1 memory pools
               384893 pinned pages
                 80.0 maxpin percentage
                 20.0 minperm percentage
                 80.0 maxperm percentage
                 13.3 numperm percentage
               260427 file pages
                  0.0 compressed percentage
                    0 compressed pages
                 13.2 numclient percentage
                 80.0 maxclient percentage
               260187 client pages
                    0 remote pageouts scheduled
                    0 pending disk I/Os blocked with no pbuf
                    0 paging space I/Os blocked with no psbuf
                 2228 filesystem I/Os blocked with no fsbuf
                 1019 client filesystem I/Os blocked with no fsbuf
                    0 external pager filesystem I/Os blocked with no fsbuf
                    0 Virtualized Partition Memory Page Faults
                 0.00 Time resolving virtualized partition memory page faults
[srvbd1]root]/]>vmstat -tw 1

System configuration: lcpu=4 mem=7952MB

 kthr          memory                         page                       faults           cpu       time
------- --------------------- ------------------------------------ ------------------ ----------- --------
  r   b        avm        fre    re    pi    po    fr     sr    cy    in     sy    cs us sy id wa hr mi se
  0   0     702600    1052811     0     0     0     0      0     0     2   6268  7339  0  1 99  0 11:52:31
  0   0     702602    1052809     0     0     0     0      0     0     4   5902  7045  0  1 99  0 11:52:32
  0   0     702602    1052809     0     0     0     0      0     0     5   5991  6883  0  1 99  0 11:52:33
  0   0     702602    1052809     0     0     0     0      0     0     4   5913  6100  0  1 99  0 11:52:34
[srvbd1]root]/]>
[srvbd1]root]/]>
[srvbd1]root]/]>svmon -G
               size      inuse       free        pin    virtual
memory      2035712     982932    1052780     384894     702631
pg space    2097152       2404

               work       pers       clnt      other
pin          314839          0          0      70055
in use       702631        240     280061

PageSize   PoolSize      inuse       pgsp        pin    virtual
s   4 KB          -     935236       2404     361214     654935
m  64 KB          -       2981          0       1480       2981
[srvbd1]root]/]>iostat 5

System configuration: lcpu=4 drives=3 paths=2 vdisks=0

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait
          0.0         11.6                0.3   0.7   98.9      0.2

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk0           2.0       3.2       0.4          0        16
hdisk1           2.0       6.4       0.8          0        32
cd0              0.0       0.0       0.0          0         0

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait
          0.0         77.6                0.3   1.5   97.9      0.3

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk0           0.2      11.0       2.4          0        56
hdisk1           0.2       7.9       1.2          0        40
cd0              0.0       0.0       0.0          0         0
[srvbd1]root]/]>
[srvbd1]root]/]>no -a
                 arpqsize = 12
               arpt_killc = 20
              arptab_bsiz = 7
                arptab_nb = 149
                bcastping = 0
      clean_partial_conns = 1
                 delayack = 0
            delayackports = {}
         dgd_packets_lost = 3
            dgd_ping_time = 5
           dgd_retry_time = 5
       directed_broadcast = 0
         extendednetstats = 0
                 fasttimo = 200
        icmp6_errmsg_rate = 10
          icmpaddressmask = 0
ie5_old_multicast_mapping = 0
                   ifsize = 256
          inet_stack_size = 16
               ip6_defttl = 64
                ip6_prune = 1
            ip6forwarding = 0
       ip6srcrouteforward = 1
       ip_ifdelete_notify = 0
                 ip_nfrag = 200
             ipforwarding = 0
                ipfragttl = 2
        ipignoreredirects = 0
                ipqmaxlen = 100
          ipsendredirects = 1
        ipsrcrouteforward = 1
           ipsrcrouterecv = 0
           ipsrcroutesend = 1
          llsleep_timeout = 3
                  lo_perf = 1
                lowthresh = 90
                 main_if6 = 0
               main_site6 = 0
                 maxnip6q = 20
                   maxttl = 255
                medthresh = 95
               mpr_policy = 1
              multi_homed = 1
                nbc_limit = 1017856
            nbc_max_cache = 131072
            nbc_min_cache = 1
         nbc_ofile_hashsz = 12841
                 nbc_pseg = 0
           nbc_pseg_limit = 2035712
           ndd_event_name = {all}
        ndd_event_tracing = 0
            ndp_mmaxtries = 3
            ndp_umaxtries = 3
                 ndpqsize = 50
                ndpt_down = 3
                ndpt_keep = 120
               ndpt_probe = 5
           ndpt_reachable = 30
             ndpt_retrans = 1
             net_buf_size = {all}
             net_buf_type = {all}
        net_malloc_police = 0
           nonlocsrcroute = 0
                 nstrpush = 8
              passive_dgd = 0
         pmtu_default_age = 10
              pmtu_expire = 10
 pmtu_rediscover_interval = 30
              psebufcalls = 20
                 psecache = 1
             pseintrstack = 24576
                psetimers = 20
           rfc1122addrchk = 0
                  rfc1323 = 1
                  rfc2414 = 1
             route_expire = 1
          routerevalidate = 0
                 rto_high = 64
               rto_length = 13
                rto_limit = 7
                  rto_low = 1
                     sack = 0
                   sb_max = 1048576
       send_file_duration = 300
              site6_index = 0
               sockthresh = 85
                  sodebug = 0
              sodebug_env = 0
                somaxconn = 1024
                 strctlsz = 1024
                 strmsgsz = 0
                strthresh = 85
               strturncnt = 15
          subnetsarelocal = 1
       tcp_bad_port_limit = 0
                  tcp_ecn = 0
       tcp_ephemeral_high = 65535
        tcp_ephemeral_low = 32768
             tcp_finwait2 = 1200
           tcp_icmpsecure = 0
          tcp_init_window = 0
    tcp_inpcb_hashtab_siz = 24499
              tcp_keepcnt = 8
             tcp_keepidle = 14400
             tcp_keepinit = 150
            tcp_keepintvl = 150
     tcp_limited_transmit = 1
              tcp_low_rto = 0
             tcp_maxburst = 0
              tcp_mssdflt = 1460
          tcp_nagle_limit = 65535
        tcp_nagleoverride = 0
               tcp_ndebug = 100
              tcp_newreno = 1
           tcp_nodelayack = 0
        tcp_pmtu_discover = 1
            tcp_recvspace = 16384
            tcp_sendspace = 262144
            tcp_tcpsecure = 0
             tcp_timewait = 1
                  tcp_ttl = 60
           tcprexmtthresh = 3
                  thewall = 4071424
         timer_wheel_tick = 0
       udp_bad_port_limit = 0
       udp_ephemeral_high = 65535
        udp_ephemeral_low = 32768
    udp_inpcb_hashtab_siz = 24499
        udp_pmtu_discover = 1
            udp_recvspace = 42080
            udp_sendspace = 9216
                  udp_ttl = 30
                 udpcksum = 1
                 use_isno = 1
           use_sndbufpool = 1
[srvbd1]root]/]>lscfg
INSTALLED RESOURCE LIST

The following resources are installed on the machine.
+/- = Added or deleted from Resource List.
*   = Diagnostic support not available.

  Model Architecture: chrp
  Model Implementation: Multiple Processor, PCI bus

+ sys0                                             System Object
+ sysplanar0                                       System Planar
* vio0                                             Virtual I/O Bus
* vsa0             U789F.001.AAA8080-P1-T3         LPAR Virtual Serial Adapter
* vty0             U789F.001.AAA8080-P1-T3-L0      Asynchronous Terminal
* pci2             U789F.001.AAA8080-P1            PCI Bus
* pci1             U789F.001.AAA8080-P1            PCI Bus
+ fcs0             U789F.001.AAA8080-P1-C13-C1-T1  FC Adapter
* fscsi0           U789F.001.AAA8080-P1-C13-C1-T1  FC SCSI I/O Controller Protocol Device
* fcnet0           U789F.001.AAA8080-P1-C13-C1-T1  Fibre Channel Network Protocol Device
+ fcs1             U789F.001.AAA8080-P1-C13-C1-T2  FC Adapter
* fscsi1           U789F.001.AAA8080-P1-C13-C1-T2  FC SCSI I/O Controller Protocol Device
* fcnet1           U789F.001.AAA8080-P1-C13-C1-T2  Fibre Channel Network Protocol Device
* pci0             U789F.001.AAA8080-P1            PCI Bus
* pci3             U789F.001.AAA8080-P1            PCI Bus
+ ent0             U789F.001.AAA8080-P1-T1         2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
+ ent1             U789F.001.AAA8080-P1-T2         2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
* pci4             U789F.001.AAA8080-P1            PCI Bus
+ usbhc0           U789F.001.AAA8080-P1            USB Host Controller (33103500)
+ usbhc1           U789F.001.AAA8080-P1            USB Host Controller (33103500)
* pci5             U789F.001.AAA8080-P1            PCI Bus
* ide0             U789F.001.AAA8080-P1-T10        ATA/IDE Controller Device
+ cd0              U789F.001.AAA8080-P1-D3         IDE DVD-RAM Drive
* pci6             U789F.001.AAA8080-P1            PCI Bus
+ sisscsia0        U789F.001.AAA8080-P1            PCI-X Dual Channel Ultra320 SCSI Adapter
+ scsi0            U789F.001.AAA8080-P1-T5         PCI-X Dual Channel Ultra320 SCSI Adapter bus
+ scsi1            U789F.001.AAA8080-P1-T9         PCI-X Dual Channel Ultra320 SCSI Adapter bus
+ hdisk0           U789F.001.AAA8080-P1-T9-L5-L0   16 Bit LVD SCSI Disk Drive (73400 MB)
+ hdisk1           U789F.001.AAA8080-P1-T9-L8-L0   16 Bit LVD SCSI Disk Drive (73400 MB)
+ ses0             U789F.001.AAA8080-P1-T9-L15-L0  SCSI Enclosure Services Device
+ L2cache0                                         L2 Cache
+ mem0                                             Memory
+ proc0                                            Processor
+ proc2                                            Processor
[srvbd1]root]/]>kill -9 8196
kill: 8196: 0403-003 The specified process does not exist.
[srvbd1]root]/]>

# 4  
02-26-2013
These are kernel wait processes. They are absolutely normal and come with the OS, one per logical CPU. As one can see you have 2 processors, and I assume you have SMT activated with 2 logical CPUs per virtual or physical CPU.

As Bakunin said, you should really not kill them. They are definitely not your problem. They are just waiting for work and help calculating your idle percentage. Leave them alone!
IBM CPU Utilization for the wait KPROC - United States
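
If you want to verify the one-wait-per-logical-CPU relation on your own box, something like the following should do - smtctl and bindprocessor are standard AIX tools, though their output format varies with OS and hardware level:

Code:
# SMT status and the list of logical CPUs
smtctl
bindprocessor -q
# count the wait kprocs - should equal the number of logical CPUs (4 here)
ps -fk | grep -cw wait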

Either this box is very weak resource-wise, the application is badly programmed, or there is some other kind of performance problem - there can be problems with name resolution and the like, for example.

Start up the application and run something like vmstat -w 2 20 while it performs slowly, to get a first impression of your system (see the snippet below).
Also check the logs of your application, if it writes any.
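
For example, 20 samples at 2-second intervals; watch the r/b (run/blocked queues), pi/po (paging) and us/sy/id/wa (CPU split) columns while the application is slow:

Code:
vmstat -w 2 20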
# 5  
02-26-2013
Quote:
Originally Posted by gopeezere
Thanks for your reply. Let me explain the issue as it stands right now. The server is essentially empty, but any application I start, such as WAS or our enterprise application, is extremely slow and can take hours. Even a PuTTY login takes a few minutes.
OK, this might as well be a problem with the server as it might be a problem with some third-party system. A possible cause could be the name server (have a look at /etc/resolv.conf); maybe the server runs into a timeout every time it tries to query an IP address. Try the following: select a server in your network. Make sure its IP address is not in the local /etc/hosts. Do a "ping <IP-address>" and note the time it takes to respond. Now try a "ping <hostname>" for the same server. If there is a noticeable difference in how long it takes "ping" to start, the name server is the culprit.
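
A minimal sketch of that test - 192.0.2.10 and somehost are placeholders; substitute a real address and name from your network:

Code:
# by IP address - no name resolution involved
time ping -c 2 192.0.2.10
# by hostname - forces a name server lookup first
time ping -c 2 somehost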

Quote:
But since they are kernel processes, I am not able to kill them.
Actually you are, but they are immediately restarted.


Quote:
Here are the requested details. Please review them and let me know if you can find any reason for the server's behaviour.
OK, I had a quick look at your output and IMHO the system was doing absolutely nothing when you took the snapshots; it had probably rebooted just before. If you look at the "vmstat" output and notice the large number of "free" memory pages, there are only two possible explanations: either the system does absolutely nothing, so that the kernel doesn't even know what to put into the file cache - which is unlikely given your modest memory size of ~8GB. The other option is that the system has just restarted and there has not been enough I/O yet to fill the file cache with anything that makes sense. (The last possible explanation - a rather hilarious "maxperm"-, "minperm"-, etc. setting - is ruled out by the output of "vmstat -v".)

You might want to tune your maxperm- and minperm-settings to more sensible values. What these values might be depends on the application, but 95% and 3% are good starting points. Right now you have:

Code:
[srvbd1]root]/]>vmstat -v
[...]
                 20.0 minperm percentage
                 80.0 maxperm percentage
                 80.0 maxclient percentage
[...]
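
If you decide to change them, vmo is the usual tool. The following is only a sketch - on newer AIX levels some of these are restricted tunables, so verify the names and limits against your OS level first; -p makes the change survive a reboot:

Code:
# suggested starting points: minperm 3%, maxperm/maxclient 95%
vmo -p -o minperm%=3 -o maxperm%=95 -o maxclient%=95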

Code:
[srvbd1]root]/]>svmon -G
               size      inuse       free        pin    virtual
memory      2035712     982932    1052780     384894     702631
pg space    2097152       2404

This display is in memory pages (4KB each); 2 million pages are roughly 8GB. Of these 2 million pages about 700k are in use, the rest is simply doing nothing. If this is everything your system ever does, you could reduce its memory to ~4GB and everything would be fine.
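
The arithmetic is easy to check in the shell - 2035712 pages of 4KB each come out to exactly the 7952MB the vmstat header reported earlier:

Code:
echo $(( 2035712 * 4 / 1024 ))    # pages * 4KB, expressed in MB -> 7952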

Code:
[srvbd1]root]/]>iostat 5

System configuration: lcpu=4 drives=3 paths=2 vdisks=0

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait
          0.0         11.6                0.3   0.7   98.9      0.2

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk0           2.0       3.2       0.4          0        16
hdisk1           2.0       6.4       0.8          0        32
cd0              0.0       0.0       0.0          0         0

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait
          0.0         77.6                0.3   1.5   97.9      0.3

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk0           0.2      11.0       2.4          0        56
hdisk1           0.2       7.9       1.2          0        40
cd0              0.0       0.0       0.0          0         0

These disks are doing absolutely nothing. The little activity residue is the system itself idling away. It is the computer equivalent of twiddling one's thumbs.

Code:
[srvbd1]root]/]>no -a

Looks like everything is at defaults here. Once the system actually does something there might be a reason to optimize a bit, but for now just leave it alone.

I wonder what you want with the many adapters - you have no disks (save for the two system disks) right now.

Summary:

It seems that the system has just been built and some of the hardware isn't even connected (like disks). The system is definitely not the problem when a "putty" needs "several minutes" to connect. I'd look at the network (routers, firewalls, VLANs, etc.) and network-related services (DNS, NIS, maybe Kerberos or LDAP, etc.) to see if the culprit is there. My first guess would be the name server, then the other components I named.

I hope this helps.

bakunin
# 6  
02-27-2013
Wait process holding CPU

Thanks for your detailed analysis, Bakunin and zaxxon.

As you observed, yes, the system was doing nothing at that point in time; it was completely idle. I was either trying to log in to sqlplus from another session, which took about 2 minutes, or doing other very routine things like bringing up a small service.

I am going to try all the suggestions given and will let you know.

Thanks a ton for your help.
# 7  
03-01-2013
Another way to test if delays are caused by name server lookups is to edit /etc/netsvc.conf. Add or edit a line so that it says:

Code:
hosts=local4

FYI, my normal setting is hosts=local4,bind4, as I am not using any IPv6.
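
A quick way to check the current setting and to test the effect without editing the file is the NSORDER environment variable, which (if set) overrides /etc/netsvc.conf for the current session; somehost is a placeholder:

Code:
# show the current setting, if any
grep hosts /etc/netsvc.conf
# per-session override for a non-invasive test
export NSORDER=local4
host somehost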