AIX - "prevent script from running twice" issue


 
# 1  
10-04-2010

I know about the standard "ps ..|grep .. | grep -v grep" solution, but...
this is a different issue, one I have encountered at several companies I have worked for.
And I see it only on AIX: not on HP-UX, not on Solaris, not on Linux.

A Korn shell script is scheduled to run in the background (via cron, via Tivoli Scheduler, or similar).
It contains logic, using the same "ps ..|grep .. | grep -v grep" pipeline, to make sure only one instance is running at a time. Most of the time it works just fine.
But once in a while the script decides that another instance is already running and exits.

While researching this, I found that when it happens, the "ps" command really does return two entries for the script, with different PIDs, for a short period of time. That is just long enough for the real process to fail, while the "ghost" process does nothing.

It is random enough to be difficult to re-create at will, but frequent enough to be really annoying, and it makes life miserable when an important job fails.
# 2  
10-04-2010
What about the old pid file method?
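A minimal sketch of the PID file method, assuming a hypothetical path /tmp/myjob.pid and hypothetical function names. Unlike a bare existence test, it treats a file whose recorded PID no longer belongs to a live process (checked with kill -0) as stale, so a crashed run does not block the next one. It sticks to POSIX syntax, so it runs under ksh as well as other POSIX shells.

```ksh
# Hypothetical PID file location; adjust for your job.
PIDFILE=${PIDFILE:-/tmp/myjob.pid}

# Succeed if $PIDFILE names a process that still exists.
pidfile_is_live() {
    [ -f "$PIDFILE" ] || return 1                 # no file: nobody holds it
    read -r oldpid < "$PIDFILE" || return 1       # empty or unreadable file
    [ -n "$oldpid" ] && kill -0 "$oldpid" 2>/dev/null
}

# Claim the PID file for this shell unless a live process already holds it.
acquire_pidfile() {
    if pidfile_is_live; then
        echo "ERROR: already running as PID $oldpid" >&2
        return 1
    fi
    echo "$$" > "$PIDFILE"                        # missing or stale: take it over
}
```

Typical use is `acquire_pidfile || exit 1` near the top of the script and `rm -f "$PIDFILE"` at the end. Treat the liveness test as a heuristic: PIDs get reused, kill -0 also fails for a process owned by another user, and the test and the write are still two separate steps.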
# 3  
10-04-2010
When a script calls out to run binaries, it will "fork" and then "exec" the binary. During that time frame there will be two processes with the same name. How long the window lasts, and whether it shows up in a "ps" listing, depends on the shell and on the O/S. I would suggest using the AIX System Resource Controller to handle starting the script, recording its PID, and stopping it (man mkssys): all of that functionality is provided for you.

I hope this helps.
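A small experiment (a sketch, not part of the original post) that makes the fork visible. A background subshell is a forked copy of the script that never calls exec, so ps should report the same command line for it as for the parent. The short-lived fork that precedes every external command looks exactly like this to ps until its exec completes.

```ksh
# "( ... ) &" forks a copy of this shell. The copy runs "sleep" as its own
# child and is never replaced by exec, because ":" (a builtin) comes last.
( sleep 3; : ) &
child=$!

# ps prints each process's argument list; a fork inherits the parent's.
parent_args=$(ps -o args= -p $$)
child_args=$(ps -o args= -p "$child")

echo "parent $$: $parent_args"
echo "child  $child: $child_args"

kill "$child" 2>/dev/null       # the orphaned sleep exits on its own
```

Saved as a script and run with ksh, both lines should show the same command line, which is exactly what a "ps | grep scriptname" check sees during the fork/exec window.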
# 4  
10-04-2010
Thanks, that explains what I suspected. Regarding mkssys: it looks like I would need to be a sysadmin to make changes at that level.
I am a regular user, so is there an easier workaround for "mere mortals"?

# 5  
11-24-2010
If it is pretty transient, couldn't you store the PID from your "ps ..|grep .. | grep -v grep" and re-confirm that the /proc/$PID directory still exists before taking action?
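A sketch of that re-check, with hypothetical function names. It collects candidate PIDs once, waits a second so any fork/exec window can close, and then accepts a candidate only if /proc/$PID still exists and its command line still contains the script name. PIDs of this shell and of its direct children (such as the subshell behind $(...)) are skipped, and the answer comes back as an exit status.

```ksh
# Succeed only if process $1 still exists and its command line still
# contains the fixed string $2.
still_running() {
    [ -d "/proc/$1" ] || return 1                  # PID has gone away
    ps -o args= -p "$1" | grep -F -q -- "$2"       # still the same program?
}

# Succeed if another process whose command line contains $1 survives
# a one-second re-check.
other_instance_running() {
    name=$1
    candidates=$(ps -e -o pid= -o ppid= -o args= | grep -F -- "$name" |
                 grep -v grep |
                 awk -v me="$$" '$1 != me && $2 != me { print $1 }')
    [ -n "$candidates" ] || return 1
    sleep 1                                        # let a fork/exec window close
    for pid in $candidates; do
        still_running "$pid" "$name" && return 0
    done
    return 1
}
```

Usage would be something like `if other_instance_running myjob.ksh; then exit 1; fi`, where myjob.ksh stands in for your script's name.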
# 6  
11-25-2010
I have also seen this behavior on AIX and a bit on HP-UX. You'll notice the huge amount of debugging code I've added to try and figure out what is going on. Consider adding a loop and checking several times.

I use something like:

Code:
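###################################################################################
# Is this job already running?
# TRUE, FALSE, VERBOSE, DEBUG, PROGRAM_NAME, STARTDIR, print_status_message and
# error_exit are set up elsewhere in the full script (not shown here).
###################################################################################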
[[ $VERBOSE -ge $TRUE ]] && print_status_message "STEP: Checking to see if this job is already running..." STEP

RUNNING=$FALSE
LOOP_COUNT=1
MAX_LOOP_COUNT=5

[[ $DEBUG -ge $TRUE ]] && print_status_message "DEBUG: PROGRAM_NAME=$PROGRAM_NAME" DEBUG
[[ $DEBUG -ge $TRUE ]] && print_status_message "DEBUG: STARTDIR=$STARTDIR" DEBUG
[[ $DEBUG -ge $TRUE ]] && print_status_message "\nDEBUG: my PID is: $$" DEBUG
[[ $DEBUG -ge $TRUE ]] && print_status_message "DEBUG: count of running processes..." DEBUG
[[ $DEBUG -ge $TRUE ]] && ps -ef | egrep -v "run_job|timeout|grep|$$" | grep "$PROGRAM_NAME" | grep -c "$STARTDIR"

sleep 1                                         # this seems to be necessary

while [[ $(ps -ef | egrep -v "run_job|timeout|grep|$$" | grep "$PROGRAM_NAME" | grep -c "$STARTDIR") -gt 0 && $LOOP_COUNT -le $MAX_LOOP_COUNT ]]
do
   RUNNING=$TRUE

   print_status_message "INFORMATION: it looks like $PROGRAM_NAME is already running against $STARTDIR. Attempt $LOOP_COUNT of $MAX_LOOP_COUNT." INFORMATION

   [[ $DEBUG -ge $TRUE ]] && print_status_message "\nDEBUG: output from ps is:" DEBUG
   [[ $DEBUG -ge $TRUE ]] && ps -ef | egrep -v "run_job|timeout|grep|$$" | grep "$PROGRAM_NAME" | grep "$STARTDIR"

   [[ $DEBUG -ge 2 ]] && print_status_message "\nDEBUG: output from ps is:" DEBUG
   [[ $DEBUG -ge 2 ]] && ps -ef | egrep -v "run_job|timeout|grep|$$" | grep "$PROGRAM_NAME"

   [[ $DEBUG -ge 2 ]] && print_status_message "\nDEBUG: output from ps (ps -ef | egrep -v run_job|timeout|grep|$$) is:"  DEBUG
   [[ $DEBUG -ge 2 ]] && ps -ef | egrep -v "run_job|timeout|grep|$$"

   [[ $DEBUG -ge 3 ]] && print_status_message "\nDEBUG: output from ps is:"  DEBUG
   [[ $DEBUG -ge 3 ]] && ps -ef | grep "$PROGRAM_NAME"

   ((LOOP_COUNT += 1))

   sleep $(($RANDOM % 21 + 1))                                          # random number between 1 and 21

done


# LOOP_COUNT only exceeds MAX_LOOP_COUNT if every try found another instance
if [[ $RUNNING -eq $TRUE && $LOOP_COUNT -gt $MAX_LOOP_COUNT ]]
then
   print_status_message "\nERROR: giving up after $MAX_LOOP_COUNT tries." ERROR
   RETCODE=102
   error_exit

fi
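The same retry idea without the logging helpers, as a self-contained sketch (function names are hypothetical, and it uses a fixed delay instead of a random one). Another instance counts as running only if it is seen on every one of the checks; a single clean check means the earlier hits were transient. Processes whose PID or parent PID equals this shell's are ignored, which covers the script itself and the subshell created by $(...).

```ksh
# Print how many processes, other than this shell and its direct children,
# have a command line containing the fixed string $1.
count_others() {
    ps -e -o pid= -o ppid= -o args= | grep -F -- "$1" | grep -v grep |
        awk -v me="$$" '$1 != me && $2 != me { n++ } END { print n + 0 }'
}

# Succeed only if another instance is seen on every one of $2 checks,
# sleeping $3 seconds between checks. (Variables are global: plain POSIX
# functions have no "local".)
running_on_every_check() {
    name=$1 tries=$2 delay=$3 i=1
    while [ "$i" -le "$tries" ]; do
        [ "$(count_others "$name")" -gt 0 ] || return 1   # one clean check wins
        i=$((i + 1))
        [ "$i" -le "$tries" ] && sleep "$delay"
    done
    return 0
}
```

For example, `if running_on_every_check myjob.ksh 5 10; then echo "giving up" >&2; exit 102; fi`, with myjob.ksh standing in for the script name.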

# 7  
11-26-2010
To be honest, I don't understand the problem. The PID file method (or, to show that I can use the terminology like any real computer scientist, a semaphore) is a standard method of interprocess communication:

Code:
#! /bin/ksh

typeset -i iProcNr="$$"                           # our own process nr
typeset    fPIDFile="/some/procfile.pid"

if [ -f "$fPIDFile" ] ; then
     print -u2 "ERROR: script already running, exiting."
     exit 1
else
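     print - "$iProcNr" > "$fPIDFile"
     trap 'rm -f "$fPIDFile"' EXIT               # remove the PID file however the script ends
     trap 'exit 1' HUP INT TERM                  # route these signals through the EXIT trap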
fi

# .... here goes the rest of your code ....

rm -f "$fPIDFile"
exit 0

This should work under any circumstances, regardless of whether the system is fast or not.

I hope this helps.

bakunin
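One caveat with a test-then-write PID file: the check for the file and the creation of the file are two separate steps, so two instances started at the same moment can both pass the test. A common alternative is to use a directory as the lock, because mkdir either creates it or fails in a single step. A sketch, assuming a hypothetical lock path under /tmp and a hypothetical acquire_lock function:

```ksh
# Hypothetical lock location; mkdir creates it or fails atomically.
LOCKDIR=${TMPDIR:-/tmp}/myjob.lock

# Succeed if this shell now owns the lock directory $1.
acquire_lock() {
    mkdir "$1" 2>/dev/null || return 1
    echo "$$" > "$1/pid"              # record the owner, for humans
}

if acquire_lock "$LOCKDIR"; then
    # Traps are set at top level rather than inside the function: in ksh an
    # EXIT trap set inside a function declared with the "function" keyword
    # fires when that function returns.
    trap 'rm -rf "$LOCKDIR"' EXIT
    trap 'exit 1' HUP INT TERM
else
    echo "ERROR: $LOCKDIR exists, script already running, exiting." >&2
    exit 1
fi

# ... here goes the rest of your code ...
```

If the script is killed with kill -9, no trap runs and the directory is left behind; it then has to be removed by hand (or checked against the PID recorded inside it).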