Help with counting processes, bizarre behavior


# 1  

I have a ksh script (dtksh Version M-12/28/93d on Solaris 10) that is run daily by cron and sometimes hangs forever. Before starting a new run, I need to detect whether an old copy is still hung, and if so, send an email and exit the script. Here is part of the code:
Code:
#!/usr/dt/bin/dtksh
PROGNAME=$(basename $0)
req_ver=93d
if [[  ! ${.sh.version} || (${.sh.version##*/} < $req_ver) ]] ; then 
    print "$PROGNAME requires ksh $req_ver or higher" ; exit 1 ; 
fi

count=$(pgrep -z global $PROGNAME | wc -l | awk '{print $1}')
if [[  $count -gt 1 ]] ; then 
   mailx -s "Houston we have a problem" ...
   exit
fi

Here is the problem, and I am really stumped:
Even when this script is the only instance, $count still equals 2.

Debugging shows that:
pgrep emits one PID
wc -l returns "       1 " (seven spaces, the digit 1, a single space; I need to trim the spaces before the test, hence the awk command)
and then awk returns 2

??? !!! ???
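For what it's worth, the padding that `wc -l` adds does not necessarily need an extra awk (or tr) stage to remove it. A minimal POSIX `sh` sketch, which should also run under ksh93 (not tried with dtksh M-12/28/93d): once the raw value is substituted inside `$(( ))`, the arithmetic parser simply skips the surrounding blanks. The helper name `trim_count` is made up for illustration. In ksh93, `[[ $count -gt 1 ]]` should likewise evaluate its operands arithmetically, so the original test may cope with the padded value as-is.

```shell
#!/bin/sh
# Hypothetical helper: normalise a blank-padded count such as the
# "       1 " printed by Solaris "wc -l". $1 is substituted as text
# first; the arithmetic parser then ignores the blanks around the digits.
trim_count() {
    echo $(( $1 ))
}

# Usage sketch with the two-stage pipeline from the thread (no awk/tr):
#   raw=$(pgrep -z global "$PROGNAME" | wc -l)
#   count=$(trim_count "$raw")
```

Trimming after the pipeline has finished keeps the pipeline itself at two stages, which is the variant that reportedly gave the expected count.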

I see the same thing when using these versions of the command:
Code:
count=$(pgrep -z global $PROGNAME | wc -l | tr -d '[:blank:]')
count=$(ps -z global | grep $PROGNAME | grep -v grep | wc -l | awk '{print $1}')

When I run these checks at the shell against existing processes, the result is always correct. I have checked and double-checked, and it seems that the digit 1 surrounded by spaces is being piped into either tr or awk, and both return the digit 2 with no spaces.

Any help would be really appreciated.

Last edited by vbe; 12-14-2010 at 02:59 PM.. Reason: code tags please
# 2  
I guess the following should give the count of running $PROGNAME instances, shouldn't it?
Code:
count=$(pgrep -z global $PROGNAME | wc -l)

OR
Code:
count=$(ps -z global | grep $PROGNAME | grep -v grep | wc -l)

# 3  
I would expect that when the script runs:
Code:
count=$(pgrep -z global $PROGNAME | wc -l | awk '{print $1}')

The script will create a child process (that's what the round brackets do) so that it can run pgrep, wc, awk, etc. That child process is created with the same name as the parent process, i.e. $PROGNAME. Therefore you will always have more than one process...
# 4  
Both of these set count to 1, not 2, even though they run in a subshell:
Code:
count=$(pgrep -z global $PROGNAME | wc -l)
count=$(ps -z global | grep $PROGNAME | grep -v grep | wc -l)

It is only when I add the pipe to tr or awk that I get '2'.
Code:
count=$(pgrep -z global $PROGNAME | wc -l | tr -d '[:blank:]')
count=$(ps -z global | grep $PROGNAME | grep -v grep | wc -l | awk '{print $1}')

---------- Post updated at 05:49 PM ---------- Previous update was at 01:46 PM ----------

I learned something, but I don't understand it. Any explanation would be really appreciated.
I took the existing code and removed the parens, getting rid of the subshell issue, if any. Then I added a tee to each part of the pipeline, like in each of these:
Code:
pgrep -l -x -z global $PROGNAME | tee n1 | wc -l | tee n2 | awk '{print $1}'
ps -z global -e -f | grep $PROGNAME | grep -v grep | tee n3 | wc -l | tee n4 | awk '{print $1}'

Looking at the files n1, n2, n3, and n4 shows the following:

1) When I removed the first line "#!/usr/dt/bin/dtksh" and executed the script by name from the command line, it actually ran, and none of the ps commands matched anything.

2) When I removed the first line "#!/usr/dt/bin/dtksh" and sourced the script from the command line like so: ". /mnt/cplog/scripts/check_portals", it ran and matched two instances of /usr/dt/bin/dtksh.

3) When I restored the first line "#!/usr/dt/bin/dtksh" and ran the script normally, it matched two copies of itself, one the parent of the other:
root 9143 9127 0 23:41:00 pts/1 0:00 /usr/dt/bin/dtksh /mnt/cplog/scripts/check_portals
root 9127 9105 0 23:40:59 pts/1 0:00 /usr/dt/bin/dtksh /mnt/cplog/scripts/check_portals

This was without the subshell-creating expression "$(...)". Why does running a script create these two instances?

I thought it was perhaps because I executed the script from the dtksh shell, and then the script forked and exec'd another dtksh shell to run in, but when I tested that theory by calling the script from tcsh instead of dtksh, I got the same thing. Any ideas?

I thought I understood ksh pretty well, but now I am confused.

Last edited by Scott; 12-14-2010 at 04:52 PM.. Reason: Please use code tags
# 5  
ksh generally invokes subshells for pipes too; this is why /var/run/*.pid files are so popular for this sort of thing.
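A minimal sketch of the pid-file approach, in plain POSIX `sh` (it should also work in ksh93). Assumptions: the `PIDFILE` path and both function names are hypothetical, and `kill -0` is used only to ask whether the recorded PID still exists; it sends no signal. A hung run keeps both the file and its process alive, so it is detected, while a run that died leaves a stale PID that `kill -0` rejects. Two caveats: the check and the write are not atomic (usually acceptable for a once-a-day cron job), and a stale file whose PID has since been reused by an unrelated process would be reported as a live run.

```shell
#!/bin/sh
# Hypothetical pid-file location; a real cron job would more likely use
# something like /var/run/check_portals.pid.
PIDFILE=${PIDFILE:-/tmp/check_portals.pid}

# Succeed only if $PIDFILE names a process that still exists.
previous_run_alive() {
    [ -s "$PIDFILE" ] || return 1
    oldpid=
    read -r oldpid < "$PIDFILE"
    case $oldpid in
        ''|*[!0-9]*) return 1 ;;   # empty or garbage: treat as stale
    esac
    # kill -0 delivers no signal; it only reports whether the PID exists
    # (and whether we may signal it, which is not an issue for root).
    kill -0 "$oldpid" 2>/dev/null
}

# Record this run's PID and remove the file again when the script exits.
# POSIX-style name() functions share traps with the caller, so the EXIT
# trap applies to the whole script.
claim_pidfile() {
    echo "$$" > "$PIDFILE"
    trap 'rm -f "$PIDFILE"' EXIT
}

# Usage sketch:
#   if previous_run_alive; then
#       echo "previous run $oldpid still active" >&2   # or the mailx call
#       exit 1
#   fi
#   claim_pidfile
```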

If you really want to grep the ps output, perhaps add grep -v $$ to filter out lines containing the script's own PID.
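`grep -v $$` drops every line that contains the script's PID as text. With `ps -f` output, that also drops the script's own forked children, because their PPID column is `$$`. However, it can also discard an unrelated line where the digits merely appear (PID 123 also matches 1234). A slightly stricter sketch compares the PID and PPID columns exactly. It also skips `$PPID`, in case cron starts the script through an `sh -c` wrapper whose command line contains the script name. It reads `ps -e -o pid= -o ppid= -o args=` output from a scratch file rather than from a pipe, so `ps` is the only child of the script while it scans. The function name and file path are hypothetical, and on Solaris the original's `-z global` zone filter would need to be added back.

```shell
#!/bin/sh
# Hypothetical helper: read "pid ppid args" lines (the output of
# "ps -e -o pid= -o ppid= -o args=") on stdin and print how many command
# lines contain the string $1, ignoring
#   - this shell itself ($$),
#   - its parent ($PPID), e.g. a cron "sh -c" wrapper,
#   - its direct children (PPID == $$), which may still be forked copies
#     of this script that have not exec'd their command yet.
count_other_instances() {
    PAT=$1 SELF=$$ PARENT=$PPID awk '
        {
            # Keep only the command line: strip the pid and ppid columns.
            args = $0
            sub(/^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]*/, "", args)
            if ($1 == ENVIRON["SELF"] || $1 == ENVIRON["PARENT"] || $2 == ENVIRON["SELF"])
                next
            if (index(args, ENVIRON["PAT"]))
                n++
        }
        END { print n + 0 }'
}

# Usage sketch (scratch-file path is hypothetical):
#   tmp=/tmp/$PROGNAME.ps.$$
#   ps -e -o pid= -o ppid= -o args= > "$tmp"   # ps runs alone, no pipeline
#   count=$(count_other_instances "$PROGNAME" < "$tmp")
#   rm -f "$tmp"
#   if [ "$count" -gt 0 ]; then echo "another copy is running"; fi
```

Because the filtering happens after `ps` has exited, no other pipeline stage of this script exists at the moment the process table is read.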


10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Bizzare behavior on redirect of stdout

Oracle Linux 5.6 64-bit (derivative of RHEL) Dear Ann Landers, This is about as bizarre as anything I've ever seen. I have a little test script I've been working with. When I redirect stdout to a file, no file. Make a copy of the script to another name. Execute it and redirect stdout, and... (4 Replies)
Discussion started by: edstevens
4 Replies

2. Shell Programming and Scripting

[Solved] Error in script while counting processes

Hi all, Below is a script I'm writing and giving me error: #!/usr/bin/sh if ; then echo "Success!" else echo "Failure!" fi Normally if I do ps -ef|grep dw.sap|wc -l it gives me output of 18. So my script checks if it's greater than 17 it echoes success else failure ... (5 Replies)
Discussion started by: frum
5 Replies

3. UNIX for Dummies Questions & Answers

Launchd-owned processes unexpected behavior

Ok, so I have been struggling with this for a few days and I think I need an explanation of a few things before I go any further. I'm not sure it's possible to do what I'm trying, so before I pull my hair out, here is what I'm doing: I have written a program in LiveCode that sits on our... (2 Replies)
Discussion started by: nextyoyoma
2 Replies

4. Programming

Bizzare optimization problem

I have a C program that, for the life of me, I can't see any possible stack corruption in but for some reason corrupts a local variable to 0 when not referenced. My compiler is gcc 4.3.4. But my question's not really a code question, it's a compiler question. The glitch is weirdly specific: ... (3 Replies)
Discussion started by: Corona688
3 Replies

5. UNIX for Advanced & Expert Users

Bizzare TCP/IP problem

Hi all. I have a really really weird problem that I've been working on for days. The problem manifested as users cannot connect to our web servers via SSH when they're using our wireless network. Here's where it gets weird: - Clients from anywhere other than the wireless subnet can... (4 Replies)
Discussion started by: pileofrogs
4 Replies

6. IP Networking

Bizzare network attack?

A server I host is having very rare glitches where a file the user downloads will have incorrect contents. This almost never happens when I am looking, I caught it once and only once -- a user messaged me saying his antivirus had given him a warning about an image file downloaded from his... (2 Replies)
Discussion started by: Corona688
2 Replies

7. Programming

very bizzare file writing problem

I'm trying to write a function which opens a file pointer and writes one of the function parameters into the file, however for some reason Im getting a core dump error. The code is as below void WriteToFile(char *file_name, char *data) { FILE *fptr; /*disk_name_size is a... (10 Replies)
Discussion started by: JamesGoh
10 Replies

8. UNIX for Advanced & Expert Users

Monitoring Processes - Killing hung processes

Is there a way to monitor certain processes and if they hang too long to kill them, but certain scripts which are expected to take a long time to let them go? Thank you Richard (4 Replies)
Discussion started by: ukndoit
4 Replies

9. Shell Programming and Scripting

Counting Processes

I have a simple script that I want to notify me whenever there are anything other than one instance of a particular process running. I've always used the script: DPID_DW=$(ps -ef | grep | wc -l) if then echo "The data warehouse manager for DB is down" elif then ... (4 Replies)
Discussion started by: heprox
4 Replies

10. UNIX for Advanced & Expert Users

Bizzare (while statement)

I'm trying to use the while statement to increment a positive number, with a leading "0". when I pass it through, it seems to come out with a negative value, and all the increments remain negative. This is what I have: i=010986294184 j=010986988888 while ; do echo $i i=(($i + 1)) done... (8 Replies)
Discussion started by: Khoomfire
8 Replies
