Truss output interpretation


 
Thread Tools Search this Thread
Top Forums UNIX for Advanced & Expert Users Truss output interpretation
# 1  
Old 10-17-2014
Truss output interpretation

hi, anyone can help on this piece of truss output?

Code:
8094:    0.7028 write(4, 0x0043BE90, 236)                       = 236
8094:      T S H  \0\0\0EC020101\0\0\0\0\0\0\0\0\0 "02\0\0 303\0\0 I D
8094:     \f %\0\0\0\0 2\0F67F\0\0\0\0 @06FFC99A ; 
8094:      L              D6\0 303 
8094:                                                             01\0\0\0
8094:      j\0\0\0\0FF\0FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF\0\0\0\0\0\0\n\0\0\0
8094:     02\0\0\09E1F\0\001\0\0\001\0\0\0 M Q C C 0 7 0 5 0 0 0 3\0\0\0\0
8094:     \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
8094:     \0\0\0\0\0\0\0\0\0\0\0\0
8094:    0.7029 read(4, 0x0043BE90, 32766)                      Err#11 EAGAIN
8094:   pollsys(0xFFFFFD7FFFDEE080, 1, 0xFFFFFD7FFFDEDF10, 0x00000000) (sleeping...)
8094:   20.7132 pollsys(0xFFFFFD7FFFDEE080, 1, 0xFFFFFD7FFFDEDF10, 0x00000000) = 0
8094:   pollsys(0xFFFFFD7FFFDEE080, 1, 0xFFFFFD7FFFDEDF10, 0x00000000) (sleeping...)
8094:   25.0182 pollsys(0xFFFFFD7FFFDEE080, 1, 0xFFFFFD7FFFDEDF10, 0x00000000) = 1
8094:   25.0188 read(4, 0x0043BE90, 32766)                      = 236
8094:      T S H  \0\0\0EC020102\0\0\0\0\0\0\0\0\0 "02\0\0 303\0\0 I D
8094:     \f %\090\0\0 2\0F67F\0\0\0\0 @\0FFC99A ; 
8094:      L              D6\0 303
8094:                                                             \n\0\0\0
8094:     \0\0\0\0\0FF\0FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF\0\0\0\0\0\0\n\0\0\0
8094:     02\0\0\0 k *\0\08902\0\0D301\0\0 M Q M M 0 7 0 5 0 0 0 3 X P F X
8094:      X X X X _ 2 0 1 4 - 0 9- 3 1 _ 2 0 . 0 9 . 4 5
8094:


First, there is an EAGAIN error, goes into sleeping (poll) and then after about 20 odd seconds, it begins to read. What are the possible causes ?
thanks
# 2  
Old 10-17-2014
Code:
From the read man page:

     When attempting to read from a regular file  with  mandatory
     file/record locking set (see chmod(2)), and there is a write
     lock owned by another process on the segment of the file  to
     be read:

         o    If O_NDELAY or O_NONBLOCK is set, read() returns -1
              and sets errno to EAGAIN.

pollsys then waits until the file can be read. The only possible cause is described above. This is POSIX compliant semantics for the read call. pollsys is a Solaris-ism (a system call entry point) for poll() and select(), I believe. Don Cragun would know definitely.
# 3  
Old 10-18-2014
Quote:
Originally Posted by jim mcnamara
Code:
         o    If O_NDELAY or O_NONBLOCK is set, read() returns -1
              and sets errno to EAGAIN.

pollsys then waits until the file can be read. The only possible cause is described above. This is POSIX compliant semantics for the read call. pollsys is a Solaris-ism (a system call entry point) for poll() and select(), I believe. Don Cragun would know definitely.
hi, thanks for the help.
This pertains to a networking process rather than file i/o process. One of my networking application is trying to connect to a server, and this happens. Below is truss output of what happens prior to the wait..

Code:
8094:    0.6895 open("/etc/netconfig", O_RDONLY|O_LARGEFILE)    = 4
8094:    0.6896 time()                                          = 1413538243
8094:    0.6901 rctlsys(0, "process.max-file-descriptor", 0x00000000, 0x00428020, 0, 0) = 0
8094:    0.6902 read(4, 0x00421524, 1024)                       = 1024
8094:      # p r a g m a   i d e n t\t " @ ( # ) n e t c o n f i g\t 1 . 1
8094:      6\t 9 9 / 1 0 / 2 5   S M I "\n #\n #   T h e   " N e t w o r k
.....

8094:      / t i c o t s           s t r a d d r . s o\n
8094:    0.6904 read(4, 0x00421524, 1024)                       = 0
8094:    0.6905 lseek(4, 0, SEEK_SET)                           = 0
8094:    0.6905 read(4, 0x00421524, 1024)                       = 1024
8094:      # p r a g m a   i d e n t\t " @ ( # ) n e t c o n f i g\t 1 . 1
8094:      6\t 9 9 / 1 0 / 2 5   S M I "\n #\n #   T h e   " N e t w o r k
8094:        C o n f i g u r a t i o n "   F i l e .\n #\n #   E a c h   e
8094:      n t r y   i s   o f   t h e   f o r m :\n #\n #               <
8094:      n e t w o r k _ i d >   < s e m a n t i c s >   < f l a g s ......

8094:      / t i c o t s           s t r a d d r . s o\n
8094:    0.6907 read(4, 0x00421524, 1024)                       = 0
8094:    0.6908 close(4)                                        = 0
8094:    0.6909 open("/dev/udp", O_RDONLY)                      = 4
8094:    0.6910 ioctl(4, SIOCGLIFNUM, 0xFFFFFD7FFFDEDC90)       = 0
8094:    0.6911 close(4)                                        = 0
8094:    0.6911 brk(0x0043AAB0)                                 = 0
8094:    0.6912 brk(0x0043EAB0)                                 = 0
8094:    0.6915 so_socket(PF_INET, SOCK_STREAM, IPPROTO_IP, "", SOV_DEFAULT) = 4
8094:    0.6916 ioctl(4, FIONBIO, 0xFFFFFD7FFFDEDCB0)           = 0
8094:    0.6924 connect(4, 0xFFFFFD7FFFDEE310, 16, SOV_DEFAULT) = 0
8094:    0.6925 getsockname(4, 0xFFFFFD7FFFDEE340, 0xFFFFFD7FFFDEE114, SOV_DEFAULT) = 0
8094:    0.6927 ioctl(4, FIONBIO, 0xFFFFFD7FFFDEE10C)           = 0
8094:    0.6928 setsockopt(4, tcp, TCP_NODELAY, 0xFFFFFD7FFFDEE10C, 4, SOV_DEFAULT) = 0
8094:    0.7024 brk(0x0043EAB0)                                 = 0
8094:    0.7025 brk(0x00446AB0)                                 = 0
8094:    0.7027 time()                                          = 1413538243
8094:    0.7028 write(4, 0x0043BE90, 236)                       = 236
8094:      T S H  \0\0\0EC020101\0\0\0\0\0\0\0\0\0 "02\0\0 303\0\0 I D
8094:     \f %\0\0\0\0 2\0F67F\0\0\0\0 @06FFC99A ; 
8094:      L              D6\0 303 X X X X X X
8094:                                                             01\0\0\0
8094:      j\0\0\0\0FF\0FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF\0\0\0\0\0\0\n\0\0\0
8094:     02\0\0\09E1F\0\001\0\0\001\0\0\0 M Q C C 0 7 0 5 0 0 0 3\0\0\0\0
8094:     \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
8094:     \0\0\0\0\0\0\0\0\0\0\0\0
8094:    0.7029 read(4, 0x0043BE90, 32766)                      Err#11 EAGAIN

Is there any network tcp parameters that I can fine tune with regards to EAGAIN errors? My guess is network is slow (may be due to intermediate firewalls/routers etc), hence not able to read in time? Not really sure how to go on from here. Wondering also if performing a traceroute would do any good. any tips on troubleshooting this kind of errors greatly appreciated.
thanks
# 4  
Old 10-18-2014
OK. So you're sending a request to a server on a socket and immediately after sending the request, you try to read a response from the server with NDELAY set in the socket options. There is no server and no network that can respond that fast to a request.

There are a couple of obvious things you could try:
  1. Drop the TCP_NDELAY socket option so the read() will wait for data instead of returning immediately if no data is present.
  2. Drop the 1st read() and start with the poll() or pollsys() to wait for data to be present before attempting the read().
I haven't tried to evaluate the arguments to pollsys() to see if your program is waiting for data on a group of file descriptors or just waiting for data on fd #4. If it is just waiting for data on fd #4, I would start by trying #1; but if your program can continue processing if data is available on another file descriptor as well, choose option 2.

With what you have shown us there is no way for us to guess why it is taking more than 24 seconds for the server to respond to your request.
# 5  
Old 10-18-2014
Quote:
Originally Posted by Don Cragun
OK. So you're sending a request to a server on a socket and immediately after sending the request, you try to read a response from the server with NDELAY set in the socket options. There is no server and no network that can respond that fast to a request.

There are a couple of obvious things you could try:
  1. Drop the TCP_NDELAY socket option so the read() will wait for data instead of returning immediately if no data is present.
  2. Drop the 1st read() and start with the poll() or pollsys() to wait for data to be present before attempting the read().
I haven't tried to evaluate the arguments to pollsys() to see if your program is waiting for data on a group of file descriptors or just waiting for data on fd #4. If it is just waiting for data on fd #4, I would start by trying #1; but if your program can continue processing if data is available on another file descriptor as well, choose option 2.

With what you have shown us there is no way for us to guess why it is taking more than 24 seconds for the server to respond to your request.

hi, thanks for reply
I have setup my own client and server on two test Solaris VMs, and my client app is able to connect to server and process data just fine. The difference is that, the test network has only one routing path and there are no intermediate routers/firewalls etc in between my client and server.

However, at my workplace, the environment is different. my production client and server VMs have multiple interfaces and so have many different routes. I am beginning to suspect it could be routing mis configuration that result in non optimal routes, or intermediate firewalls/switches causing the delay. So far, all i could think of is comparing the tcp and ip settings ( using ndd ) to that of production and see what can be fine tuned.

The client and server apps are proprietary, so I have no way to change any code.
thanks
# 6  
Old 10-18-2014
This may sound obvious, but consider traceroute. A 20+ second delay almost surely indicates a grossly overloaded system, somewhere in the path from A to B. You cannot do much about it until you locate it. Ignore my suggestion if you are routing over lots of DSL lines, in which case you are stuck.

We had remote printers on the other side of DSL lines from their print servers. Printing was painfully slow. Created local remote print servers, the problem went away.
# 7  
Old 10-18-2014
Quote:
Originally Posted by jim mcnamara
This may sound obvious, but consider traceroute. A 20+ second delay almost surely indicates a grossly overloaded system, somewhere in the path from A to B. You cannot do much about it until you locate it. Ignore my suggestion if you are routing over lots of DSL lines, in which case you are stuck.

We had remote printers on the other side of DSL lines from their print servers. Printing was painfully slow. Created local remote print servers, the problem went away.
hi jim, thanks. yes i have requested traceroute restriction to be lifted so we can troubleshoot this problem. I have no say over network security. the productions VMs are running in a cloud platform hence I don't think there are DSL lines involved.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

unable to understand the output of TRUSS command

Hi, I am trying to set ulimit for soft stack unlimited, but this is not taking effect, after tracing the ulimit -a unlimited command, the below output was generated, which i am unable to comprehend. Could any one help me with this? prcbap1-r10prod: truss -d ulimit -s unlimited Tue Dec 30... (2 Replies)
Discussion started by: NasirAbbasi
2 Replies

2. UNIX for Advanced & Expert Users

Truss output

Hi, I want to trace a background java program which runs in infinite loop. I have used truss command for this. But the program terminated after some hours with below truss output: Received signal #1, SIGHUP, in lwp_cond_wait() /1: siginfo: SIG#0 Please let me know what... (5 Replies)
Discussion started by: Hara Prasad
5 Replies

3. UNIX for Dummies Questions & Answers

Truss output

Hi, I want to trace a background java program which runs in infinite loop. I have used truss command for this. But the program terminated after some hours with below truss output: Received signal #1, SIGHUP, in lwp_cond_wait() /1: siginfo: SIG#0 Please let me know what... (0 Replies)
Discussion started by: Hara Prasad
0 Replies

4. AIX

Truss output

Hello, I'm using AIX 5.3.12.5 and trying to understand truss output. I'm running a job with real time of 16 minutes but only 4 minutes of CPU time. I'm trying to understand what the process is doing. I'm getting a lot of kread, kpread, kwrite, kpwrite... localhost:~ x$ grep... (1 Reply)
Discussion started by: Kovacs
1 Replies

5. AIX

lspath output interpretation

On my VIo I see the following for my disks: $ lspath | grep hdisk6 Enabled hdisk6 fscsi0 200600a0b82193f7,4000000000000 Enabled hdisk6 fscsi0 200700a0b82193f7,4000000000000 Enabled hdisk6 fscsi2 200600a0b82193f8,4000000000000 Failed hdisk6 fscsi2 200700a0b82193f8,4000000000000 $ lspath |... (8 Replies)
Discussion started by: petervg
8 Replies

6. Shell Programming and Scripting

need help with truss !

i have to gather some info about a process and redirect it to a1.txt file. For this i m using truss command truss -po a1.txt $PID_Detail where $PID_Detail= 1482944 3362976 -------------------------------------------------------------------------- Below the script: #!/bin/ksh for i... (6 Replies)
Discussion started by: ali560045
6 Replies

7. Shell Programming and Scripting

truss output

Hi I have to get redirect the truss ouput to file. I am doing truss -p 12121 >> output.txt But it still displays on the screen adn output.txt is empty Can some help how to do this? Thanks in advance Ammu (1 Reply)
Discussion started by: ammu
1 Replies

8. Shell Programming and Scripting

how to use truss?

Hi all, while trying to debug and figure out why a lofiadm command was not working on my script, i came across a cmd called "truss" all i know about it is that it executes the specified command and produces a trace of the system calls it performs, the signals it receives, and the machine faults... (5 Replies)
Discussion started by: wrapster
5 Replies

9. UNIX for Advanced & Expert Users

truss output

Hi, We keep getting hanging Oracle process for our ETL. The dba's asked me to do o truss. All I see is pages and pages of the following,:cool: pollsys(0xFFFFFFFF7FFF38C8, 1, 0xFFFFFFFF7FFF3800, 0x00000000) = 0 pollsys(0xFFFFFFFF7FFF38C8, 1, 0xFFFFFFFF7FFF3800, 0x00000000) = 0... (0 Replies)
Discussion started by: happyadm
0 Replies

10. UNIX for Advanced & Expert Users

truss

I'm a DBA so no abuse please! I've for 5 Oracle Forms processes that are spinning and am trying to find out if they're doing anything: Running HPUX 11.11 CPU TTY PID USERNAME PRI NI SIZE RES STATE TIME %WCPU %CPU COMMAND 9 ? 2735 oracle 241 20 24228K 16668K run 2607:29 84.92... (2 Replies)
Discussion started by: fraze
2 Replies
Login or Register to Ask a Question