Short-lived Processes Don't Appear in 'top' or 'ps'


 
# 1  
Old 10-14-2008

We are running a field-specific middle-tier application server on HP-UX. We've recently been experiencing performance problems with it and with the database back end (Oracle on a separate HP-UX box). We resolved a few issues on the DB server (tuning some kernel parameters to free up RAM that was extremely overutilized by the vxfs buffer cache) and it seems to be able to handle the load again. But as soon as that was resolved, the problems that we saw on the middle tier came back.

Currently we're involved in a finger-pointing battle with the company that makes the application server, HP and Oracle. Personally I believe the fault lies with the middle tier. We had someone from HP come in on a time-and-materials basis to analyze our DB and middle-tier systems, and he said things look good in terms of the OS. Further investigation of the performance data indicated that the third-heaviest consumer of CPU and RAM was a short script that the application server launches hundreds to thousands of times per minute. That process seems to be intended to set some environment variables for its child processes and nothing more. This seems like gross inefficiency to us. But we need to be able to figure out which process(es) spawn this script's process.

I found 'UNIX95=1 ps -Hef', which shows a rough process tree. (There isn't a port of 'pstree' from Linux, is there?) But we've discovered that the script processes never show up in our 'ps' or 'top' output. However, the performance data gathered by HP's scripts (and Glance, I think) seemed to keep track of those processes. My supervisor believes the problem is that 'ps' and 'top' only get a snapshot of current activity and the script process is too quick to be captured. I'm not sure whether that's true, but it seems unlikely.

So my questions are:

1. Is there a way to control how short a period of time 'ps' can see?
2. Is it possible that 'ps' and 'top' can't display processes that are "too short"?
# 2  
Old 10-14-2008
Is it possible to modify the shell script instead? add some small instrumentation feature to it like $PPID? So you can track it in a log file?
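
A minimal sketch of that kind of instrumentation, assuming the wrapper is a POSIX shell script you are allowed to edit. The names 'LOGFILE' and 'log_invocation' and the default log path are made up for illustration, and the parent lookup assumes your 'ps' accepts the XPG4 '-p' and '-o args=' options when UNIX95 is set:

```shell
#!/bin/sh
# Hypothetical instrumentation to paste near the top of the wrapper script.
# LOGFILE and log_invocation are illustrative names, not part of the application.
LOGFILE="${LOGFILE:-/tmp/get_wrapper.log}"

log_invocation() {
    # Best-effort lookup of the parent's command line; falls back to "unknown"
    # if ps is unavailable or the parent has already exited.
    parent_cmd=$(UNIX95=1 ps -p "$PPID" -o args= 2>/dev/null) || parent_cmd=unknown
    [ -n "$parent_cmd" ] || parent_cmd=unknown
    # $$ is this shell's PID, $PPID is the PID of whatever launched it.
    printf '%s pid=%s ppid=%s parent=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$$" "$PPID" "$parent_cmd" >> "$LOGFILE"
}

log_invocation
# The wrapper would then carry on as before and run the real executable.
```

Each call appends one line, so counting lines per ppid in the log shows which parent launches the script most often. The extra 'ps' per invocation adds load of its own; if the script really runs thousands of times per minute, it may be better to drop that lookup and log only $$ and $PPID.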

Look into doing something with pstat if you need a "fast" ps command. You'll have to run it with slightly elevated priority.

Last edited by jim mcnamara; 10-14-2008 at 12:41 PM..
# 3  
Old 10-14-2008
Quote:
Is it possible to modify the shell script instead? add some small instrumentation feature to it like $PPID? So you can track it in a log file?
If the shell script is editable, isn't it possible to log the pid, the ppid and, if needed, how long the process lives?
# 4  
Old 10-14-2008
Upon further inspection it appears that we're dealing with a PA-RISC executable and not a script. There is a wrapper script that calls the executable, and it looks like the executable is what we want to find in the 'ps' output. They both have names that start with 'get'. But 'ps -ef | grep get' never finds any processes that match.

Is 'pstat' a command on its own? All I found in the man pages were the HP-UX library functions. Or are you saying we might need to build our own 'ps' command?

# 5  
Old 10-14-2008
I have to leave now (kids waiting...)
So I'll drop in a note I made in 2003 for you to read, in case it helps your understanding:
Bill Hassell:
Unix memory usage is a very complex process. As you mentioned, shared memory is difficult to assign to a given process, and considering the number of different ways a process may be started (cron, rpc, network client/server tasks, threads), accurately assigning memory to a single user is virtually impossible. For a given user ID, you can get a rough idea (which is likely all that you need) by using ps:



UNIX95= ps -u joan -o vsz,ruser,pid,args |sort -rn



This shows all processes owned by the real user joan, showing the virtual size in Kbytes in descending order.


Mike Stroyan:
The pstat_procvm function can give you all the information you need to do that. The attached program uses a reference count of the number of processes that map each region. It recognizes unique regions by a combination of their vm.pst_space and vm.pst_vaddr.



It divides the credited size of a region by the reference count. If three processes share a memory segment then they each get billed for one third of its size. You can run the program with either user ids or user names to look for.



The ps command is really naive about process size. It reports only the total size of text, data, and stack. It completely misses mmap, shared memory and shared libraries.

His program: (pstat_64.c)

Code:
#define _PSTAT64
#include <sys/param.h>
#include <sys/pstat.h>
#include <sys/unistd.h>
#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>
#include <pwd.h>

typedef struct shared_segment_struct {
    long long pst_space;
    long long pst_vaddr;
    int refs;
    struct shared_segment_struct *next;
} segment;

static segment *shared_segs = NULL;

void pstatvm(uid_t uid, int pid)
{
    struct pst_vm_status pst;
    int idx, count;
    long long shared_vm = 0;
    long long shared_ram = 0;
    long long private_vm = 0;
    long long private_ram = 0;

    idx=0;
    count = pstat_getprocvm(&pst, sizeof(pst), (size_t)pid, idx);
    while (count > 0) {
        switch ((long)pst.pst_type) {
            case PS_IO: break;
                /* Don't count IO space.  It really is not RAM or swap. */
            default: 
                if (pst.pst_flags & PS_SHARED) {
                    segment *s;
                    int refs = 1;
                    for (s=shared_segs; s; s=s->next) {
                        if (s->pst_space == pst.pst_space
                        && s->pst_vaddr == pst.pst_vaddr) {
                            refs = s->refs;
                            break;
                        }
                    }
                    shared_vm += (long long) pst.pst_length;
                    shared_ram += (long long)pst.pst_phys_pages/refs;
                } else {
                    private_vm += (long long) pst.pst_length;
                    private_ram += (long long)pst.pst_phys_pages;
                }
                break;
        }

        idx++;
        count = pstat_getprocvm(&pst, sizeof(pst), (size_t)pid, idx);
    }
    printf("%6d ", uid);
    printf("%6d ", pid);
    printf("%11lldK ", shared_vm*4);
    printf("%11lldK ", shared_ram*4);
    printf("%11lldK ", private_vm*4);
    printf("%11lldK ", private_ram*4);
    printf("%11.1fM\n", (shared_ram+private_ram)/256.0);
}

void pstatvm_uid(uid_t uid)
{
#define BURST ((size_t)10)

    struct pst_status pst[BURST];
    int i, count;
    int idx = 0; /* index within the context */

    /* loop until count == 0, will occur when all have been returned */
    while ((count = pstat_getproc(pst, sizeof(pst[0]), BURST, idx)) > 0)
    {
        /* got count (max of BURST) this time.  process them */
        for (i = 0; i < count; i++) {
            if (pst[i].pst_pid==0) continue; /* Can't getprocvm real pid 0 */
            if (pst[i].pst_uid==uid) {
                pstatvm(uid, pst[i].pst_pid);
            }
        }

        /*
         * now go back and do it again, using the next index after
         * the current 'burst'
         */
        idx = pst[count-1].pst_idx + 1;
    }

    if (count == -1)
        perror("pstat_getproc()");

#undef BURST
}

void pstat_refcount_all(void)
{
#define BURST ((size_t)10)

    struct pst_status pst[BURST];
    int i, count;
    int idx = 0; /* index within the context */

    /* loop until count == 0, will occur when all have been returned */
    while ((count = pstat_getproc(pst, sizeof(pst[0]), BURST, idx)) > 0)
    {
        /* got count (max of BURST) this time.  process them */
        for (i = 0; i < count; i++) {
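            struct pst_vm_status vm;
            int reg;
            if (pst[i].pst_pid==0) continue; /* Can't getprocvm real pid 0 */
            reg=0;
            /* Walk every region once; stop on 0 (no more regions) or -1 (error). */
            while (pstat_getprocvm(&vm, sizeof(vm), (size_t)pst[i].pst_pid, reg) > 0) {
                segment *s;
                /* Bump the reference count if this region is already known. */
                for (s=shared_segs; s; s=s->next) {
                    if (s->pst_space == vm.pst_space
                    && s->pst_vaddr == vm.pst_vaddr) {
                        s->refs += 1;
                        break;
                    }
                }
                /* First sighting: add the region to the list with one reference. */
                if (!s) {
                    s = (segment *) malloc(sizeof(segment));
                    if (!s) {
                        perror("malloc()");
                        exit(1);
                    }
                    s->pst_space = vm.pst_space;
                    s->pst_vaddr = vm.pst_vaddr;
                    s->refs = 1;
                    s->next = shared_segs;
                    shared_segs = s;
                }
                reg++;
            }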
        }
        idx = pst[count-1].pst_idx + 1;
    }

    if (count == -1)
        perror("pstat_getproc()");

#undef BURST
}

int main(int argc, char *argv[])
{
    int i;
    pstat_refcount_all();
    printf("   uid    pid    shared_vm   shared_ram   private_vm  private_ram      Res_Mem\n");
    if (argc > 1) {
        for (i=1; i<argc; i++) {
            struct passwd *p = getpwnam(argv[i]);
            if (p) {
                pstatvm_uid(p->pw_uid);
            } else {
                pstatvm_uid(atoi(argv[i]));
            }
        }
    } else {
        pstatvm_uid(getuid());
    }
    return 0;
}

# 6  
Old 10-14-2008
To answer your questions: yes, it is possible for short-lived processes to never show up. In fact, it is very unlikely that a short-lived process will show up in ps or top. Those programs read the process table with techniques that are almost as fast as a memory-to-memory data move. Once they have this snapshot, they prepare a report. top repeats this every n seconds, but top has a further limitation: it does not show all processes, just the "top" ones. You have a better shot with ps. A short-lived process can be gone in well under a tenth of a second. Let's say that yours lasts exactly one tenth of a second. That means ps must capture its process table snapshot sometime during that tenth of a second, which is very hard to arrange reliably. A very clever wrapper program that runs both ps and your program nearly simultaneously might be able to do it.

But if the program shows up in glance but not in ps/top, I tend to suspect something else. It could be that the program is destroying its command line. You might try 'ps -el' and see if it shows up.
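
One rough way to probe the snapshot timing described above is to take many snapshots back to back and keep whatever matches. This is only a sketch: 'sample_ps' is a made-up helper name, the pattern 'get' comes from the executable names mentioned earlier in the thread, and it assumes a 'ps' that accepts '-e -o pid,ppid,args' when UNIX95 is set.

```shell
#!/bin/sh
# sample_ps PATTERN COUNT
# Take COUNT back-to-back ps snapshots and print every line whose text
# matches PATTERN. Hypothetical helper, not an HP-UX tool.
sample_ps() {
    pattern=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        # Each pass is one snapshot; a process is seen only if it is alive
        # at the moment ps reads the process table.
        UNIX95=1 ps -e -o pid,ppid,args | grep -e "$pattern" | grep -v grep
        i=$((i + 1))
    done
    return 0
}

# Example: sample_ps get 1000 | sort -u
```

Each 'ps' run takes noticeably longer than a tenth of a second on a busy box, so the samples are still far apart compared with the lifetime of the process. With hundreds of launches per minute, though, a long enough run has a reasonable chance of catching at least one, and the ppid column then points at the parent. If nothing ever matches, that would fit the suspicion that the process is showing up under a different name or command line.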