Background process, return code and pid.


 
# 1  
Old 08-12-2011

Hey all,

Okay, this one is tricky and I'm not sure there is a nice way to do it, or indeed any way to do it. The main issue revolves around timing out a hung ssh. I am doing this by creating a wrapper script for the ssh with the following requirements.

My requirements are:
  1. Definable timeout period.
  2. If the timeout period completes and ssh is still running, then kill it.
  3. Provide a return code as if the ssh had not been run from a wrapper script.
  4. Multiple instances of this wrapper script can run at the same time.
By point 3 I mean, say the original ssh was
Code:
ssh $REMOTE_INT_IP ls $ms_billing_dir/$reg_file

I want my wrapped ssh (currently called safe_ssh) to return the return code of the ls, not just whether or not the ssh completed.

Sounds simple, right? However, what I have found is that I can either kill the ssh or get a meaningful return code, but trying to do both is nigh impossible (at least with my level of scripting).

The status so far
I currently have a script that will time out the ssh but can only return whether I killed the ssh or whether it exited of its own accord.

Code:
###############################################################################
# safe_ssh                                                                    #
#                                                                             #
# Wrapper script for ssh providing a timeout function in the situation that   #
# the ssh hangs.                                                              #
#                                                                             #
# $1    - Timeout duration in seconds.                                        #
# $2+   - ssh parameters.                                                     #
###############################################################################
safe_ssh()
{
  #############################################################################
  # Check that the first parameter is an integer.                             #
  #############################################################################
  if [[ ! -z $(echo $1 | sed 's/[0-9]//g') ]]
  then
    echo "Usage error safe_ssh"
    echo "The first parameter must be an integer representing the timeout "
    echo "duration in seconds."
    exit 1
  fi

  #############################################################################
  # Set up a sleep thread that will run in the background and simply sleep    #
  # for the requisite number of seconds.                                      #
  #############################################################################
  sleep $1 &
  sleep_pid=$!

  #############################################################################
  # Set up the SSH thread to run the command. This will also run in the       #
  # background.                                                               #
  #############################################################################
  shift
  ssh "$@" &
  ssh_pid=$!

  #############################################################################
  # Loop until either process has completed, polling with ps -p.  If one has  #
  # completed, check whether the other is still running and if so terminate   #
  # it.                                                                       #
  #############################################################################
  while :
  do
    ps -p $ssh_pid > /dev/null 2>&1
    if [ 0 = $? ]
    then
      #########################################################################
      # The ssh command is still running, check if sleep has exited and if so #
      # kill the ssh command.                                                 #
      #########################################################################
      ps -p $sleep_pid > /dev/null 2>&1
      if [ 1 = $? ]
      then
        kill -15 $ssh_pid
        exit 1
      fi
    else
      #########################################################################
      # ssh has exited.  If the sleep thread is still running, kill it.       #
      #########################################################################
      ps -p $sleep_pid > /dev/null 2>&1
      if [ 0 = $? ]
      then
        kill -15 $sleep_pid
      fi
      exit 0
    fi
  done
}

The problem
I can get the return code of the ssh if I echo it into a temporary file from the background process and then read that file in the main process. For example something like:

Code:
(ssh "$@"; exit_code=$?; echo $exit_code > /tmp/ssh_exit_code) &
ssh_pid=$!

However,
  • In this case, $ssh_pid is no longer the PID of the ssh itself but that of the whole background subshell, meaning that I can no longer cleanly kill the ssh as I don't know its PID.
  • I need the file to have a unique file name in case there are multiple instances of the script running, so that I can read the correct file from the main script. For this I was thinking of including ssh_pid in the file name.
I thought about echoing the ssh PID into the temp file as well, but this will not work: the steps of the script that add data to the temp file are not executed until the ssh has completed, and of course it won't have completed if it has hung, which is exactly the situation in which we want to kill it.

I hope this vaguely makes sense. Sorry it is a bit convoluted. If you need any clarifications please do ask.

Thanks a lot
Robyn

---------- Post updated at 05:38 PM ---------- Previous update was at 04:41 PM ----------

Okay,

I think I have come up with an idea and it is as follows.

  • set a sleep thread running in the background (when this sleep thread completes, it reads a temporary file for the ssh_pid and kills the ssh)
  • set up the ssh thread in the background
  • echo the PID of the background ssh to the temporary file
  • wait on the ssh
  • once the wait is complete, kill the sleep thread if it exists.
The temporary file will be named with the parent PID so all child processes can determine what its name is.

This way:
If the ssh finishes without hanging, the wait will provide the background process return code.
If the ssh hangs there will have been plenty of time to write its PID to the temporary file and hence the sleep thread can kill it when it exits.
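
The plan above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested replacement for safe_ssh: the function name run_with_timeout is hypothetical, it takes an arbitrary command so it can be tried without a remote host (pass ssh and its arguments as the command), and it assumes a POSIX-style shell. It also deviates from the plan in one respect: the watchdog is started after the command, so it inherits the PID in a variable and no temporary file is needed.

```shell
# Hypothetical helper: run a command with a timeout and keep its exit code.
# $1  - timeout in seconds
# $2+ - the command to run (for example: ssh host some_command)
run_with_timeout() {
  timeout_secs=$1
  shift

  # Start the real command in the background; $! is its PID.
  "$@" &
  cmd_pid=$!

  # Watchdog: sleep, then send SIGTERM to the command if it is still there.
  # It is started after the command, so it inherits $cmd_pid directly.
  (
    sleep "$timeout_secs"
    kill -15 "$cmd_pid"
  ) >/dev/null 2>&1 &
  watchdog_pid=$!

  # wait returns the command's own exit status, or a value above 128 if
  # the command was terminated by a signal.
  wait "$cmd_pid"
  rc=$?

  # The command is gone: stop the watchdog if it is still sleeping. Its
  # inner sleep may linger until it expires, which is harmless.
  kill -15 "$watchdog_pid" 2>/dev/null
  wait "$watchdog_pid" 2>/dev/null

  return "$rc"
}
```

Called as run_with_timeout 30 followed by the ssh command line, this would return the status of the remote command if it finishes in time. If the command is killed, the status is above 128 (143 in bash for SIGTERM; other shells may report a different value).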

I think I can do most of this but wanted to run the idea past you in case there is some obvious flaw I haven't spotted.

ALSO: How do I get the PID of the running process? That is, say I call my script safe_ssh: how do I get the PID of safe_ssh from within safe_ssh? I assume it must be straightforward but I do not currently know.

Thanks a lot
Robyn
# 2  
Old 08-12-2011
Q: ALSO: How do I get the PID of the running process? That is, say I call my script safe_ssh: how do I get the PID of safe_ssh from within safe_ssh? I assume it must be straightforward but I do not currently know.

A: $$
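
To illustrate, here is a small sketch (the variable names are only for illustration). $$ keeps the PID of the script even inside a subshell, which is what makes a file name built from it usable by both the parent and its background children. $! is the PID of the most recently started background command.

```shell
# $$ is the PID of the shell running this script.
parent_pid=$$

# Inside a subshell, $$ still expands to the parent's PID.
subshell_view=$( echo $$ )

# $! is the PID of the most recent background command.
sleep 1 &
bg_pid=$!

echo "script PID:            $parent_pid"
echo "PID seen in subshell:  $subshell_view"
echo "background sleep PID:  $bg_pid"

wait "$bg_pid"
```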

If I were writing this I wouldn't try to get the return code from ssh. It is quite hard. Why not redirect all the output from ssh to a log file, then examine the log file for errors? If there are errors, set the return code to non-zero.

---------- Post updated at 12:10 PM ---------- Previous update was at 11:44 AM ----------

Consider these examples:

Code:
safe_ssh 5 wpgux001_sw sleep 20

job should not run for more than 5 seconds
command is: sleep 20

Code:
SSH appears to be hung.
kill -15 934072
Output is:
This is a private computer facility.  Access to the facility must be
specifically authorized.  If you are not authorized, your continued
access and further inquiry expose you to criminal and/or civil
proceedings.

RET_CODE: 255

Code:
safe_ssh 25 wpgux001_sw sleep 20

Output is:
This is a private computer facility.  Access to the facility must be
specifically authorized.  If you are not authorized, your continued
access and further inquiry expose you to criminal and/or civil
proceedings.

RET_CODE: 0

Code:
safe_ssh 25 wpgux00a_sw sleep 20

host wpgux00a_sw does not exist.

Code:
Output is:
ssh: Could not resolve hostname wpgux00a_sw: Hostname and service name not provided or found
RET_CODE: 1

Consider this code:

Code:
safe_ssh () {

  SLEEP_WAIT=$1
  shift

  #############################################################################
  # Check that the first parameter is an integer.                             #
  #############################################################################
  if [[ ! -z $(echo $SLEEP_WAIT | sed 's/[0-9]//g') ]]
  then
    echo "Usage error safe_ssh"
    echo "The first parameter must be an integer representing the timeout "
    echo "duration in seconds."
    exit 1
  fi

  #############################################################################
  # Set up the SSH thread to run the command. This will also run in the       #
  # background.                                                               #
  #############################################################################
  ssh "$@" 1>/tmp/ssh.$$ 2>&1 &
  ssh_pid=$!

  # sleep
  sleep $SLEEP_WAIT

  #############################################################################
  # check if ssh is still running, if it is, kill it
  #############################################################################
  if (( $(ps -ef | egrep -v "ps|grep" | grep -cw $ssh_pid) > 0 ))
  then
     echo "SSH appears to be hung."
     echo "kill -15 $ssh_pid"
     kill -15 $ssh_pid
     RET_CODE=255

  else
     RET_CODE=$(egrep -ci "error|fail|ssh:" /tmp/ssh.$$)

  fi

  echo "Output is:"
  cat /tmp/ssh.$$

}

Notice the change in logic. Sleep is not in the background. I don't kill the sleep. Simply sleep and then check if ssh is still running.
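
One caveat on the check itself: grep -cw on the output of ps -ef also counts lines where the same number shows up in another column, such as PPID. A possible alternative, sketched here under the assumption that the PID belongs to a process owned by the current user, is kill -0, which sends no signal and only reports whether the process exists. The helper name is_running is hypothetical.

```shell
# Hypothetical helper: succeeds if the process with PID $1 still exists.
is_running() {
  kill -0 "$1" 2>/dev/null
}

sleep 1 &
bg_pid=$!

if is_running "$bg_pid"; then
  state_before="running"
else
  state_before="gone"
fi

wait "$bg_pid"        # reap the child so its PID is released

if is_running "$bg_pid"; then
  state_after="running"
else
  state_after="gone"
fi

echo "before wait: $state_before, after wait: $state_after"
```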
# 3  
Old 08-12-2011
This may be overkill. Why not just use the timeout utility? It seems to do exactly what you ask.

Code:
timeout 300 ssh username@host ...

The 300 is a duration in seconds.
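
With GNU coreutils timeout, the exit status tells the two cases apart: it is 124 if the command timed out, and otherwise it is the command's own status. A small sketch, assuming GNU timeout is installed (sleep and sh -c stand in for the real ssh call):

```shell
# Command outlives the limit: GNU timeout kills it and exits with 124.
rc_hung=0
timeout 1 sleep 5 || rc_hung=$?

# Command finishes in time: its own exit status is passed through.
rc_done=0
timeout 5 sh -c 'exit 3' || rc_done=$?

echo "hung command:     $rc_hung"
echo "finished command: $rc_done"
```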
# 4  
Old 08-12-2011
Cool, but I don't see that command on AIX or HP-UX.
# 5  
Old 08-12-2011
Or in the remote .profile or .bashrc use TMOUT=n where n is the number of idle seconds before the process gets killed. You probably should set TMOUT as readonly, which is shell dependent.
# 6  
Old 08-13-2011
It was my belief that the timeout only works if the ssh has properly connected, not if it has got stuck somehow. If I am wrong please do correct me.

@jim mcnamara thanks for the suggestion, but this fix will go out to multiple different systems, not just ours, so it needs to live in our code, and I don't think changing the .profile file is a possibility.