Help needed on restart-from-point-of-failure in Parallel Processing


 
# 1  
Old 04-06-2017

Hi Gurus,
Good morning...
OS Info:
Linux 2.6.32-431.17.1.el6.x86_64 #1 SMP Fri Apr 11 17:27:00 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

I have a script that reads multiple parameters from a properties file one by one and runs a job for each of them in the background (for parallel processing). For example:
Code:
$ cat properties.file
Account
Customer
Address
Phone

part of customCombinedScripts.sh
Code:
dpPIDs=""
failCnt=0    # initialise so the -gt test below never sees an empty value
while IFS= read -r fileLines; do
    /path/to/product/script/data_process "${fileLines}" || exit -1 &
    dpPIDs+=" $!"
done < properties.file
## logic to check background job status (success or failed)
for chkPIDs in $dpPIDs; do
    if ! wait $chkPIDs; then
        failCnt=`expr $failCnt + 1`
    fi
done
if [ $failCnt -gt 0 ]; then
    echo "[`date`][FATAL] There are errors in Data Processing, total number of DP jobs failure is $failCnt. Check log."
    exit -1
else
    echo "[`date`][SUCCESS] Data Processing for received files has been completed..."
fi

Now my requirement is: suppose "Address" fails. How can I restart the script so that it processes only the failed parameters (i.e. Address)?

I have already implemented the restart-from-point-of-failure concept in another script of mine, which uses sequential processing, with help from the article "Resume from last failed command" posted by Corona688.

How can I implement the same concept with parallel processing?

Kindly help / provide ideas.

Cheers,
Saptarshi
# 2  
Old 04-06-2017
As a quick first pass, I'd suggest that your line ending with || exit -1 & will actually put only the exit -1 into the background if the data_process call fails (returns non-zero), and doesn't cause the data_process call itself to run in the background.

Have I got that wrong?

You might want this:-
Code:
(/path/to/product/script/data_process "${fileLines}" || exit -1) &

It might be simpler to have a directory called 'Running' that you create a marker file in when you start a job and only remove when you successfully exit. That way, you could write something to read the directory contents for a restart.
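The marker-file idea could be sketched roughly as below. This is only an illustration under assumptions: RUN_DIR, run_job and the stand-in data_process function (which fails for "Address") are made-up names, and temporary files stand in for the real 'Running' directory and properties.file.

```shell
# Sketch only: RUN_DIR, run_job and the stand-in data_process are hypothetical.
RUN_DIR=$(mktemp -d)            # stands in for the 'Running' directory
props=$(mktemp)                 # stands in for properties.file
printf '%s\n' Account Customer Address Phone > "$props"

# Stand-in for /path/to/product/script/data_process: fails for "Address".
data_process() { [ "$1" != "Address" ]; }

run_job() {
    touch "$RUN_DIR/$1"                        # marker created when the job starts
    data_process "$1" && rm -f "$RUN_DIR/$1"   # marker removed only on success
}

while IFS= read -r param; do
    run_job "$param" &
done < "$props"
wait                            # wait for every background job

# Whatever is still in the directory did not finish successfully.
leftover=$(ls "$RUN_DIR")
echo "to restart: $leftover"
rm -rf "$RUN_DIR" "$props"
```

On a restart, the script would read the marker names from the directory instead of the properties file. A marker also survives if the job is killed, which a status check alone would miss.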

Would that logic help?



Robin
# 3  
Old 04-07-2017
Thank you rbattle1,
Quote:
You might want this:-

Code:
(/path/to/product/script/data_process "${fileLines}" || exit -1) &
That makes sense, and I have corrected the script (though it was running fine earlier, I don't know how).
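A possible explanation for why the earlier form ran fine: as far as I understand the POSIX shell grammar, a trailing & applies to the whole AND-OR list, so data_process || exit -1 & should already run both parts in one background subshell. The small check below is only a sketch, with sleep and false standing in for data_process:

```shell
# Sketch only: checks how '&' binds in 'cmd || exit N &'.
start=$(date +%s)
sleep 2 || exit 1 &                  # same shape as the original line
elapsed=$(( $(date +%s) - start ))   # stays near 0 if sleep was backgrounded
echo "returned to the script after ${elapsed}s"
wait $!

status=0
false || exit 3 &                    # failing command and exit share one subshell
wait $! || status=$?
echo "background list exited with $status; the script itself is still running"
```

If the script gets control back immediately and wait reports the exit value of the background list, then both forms behave the same way, and the parentheses mainly make the grouping explicit.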

Currently I'm trying to get the command that was run, using the sample test script below:
Code:
$ cat bgPIDTest.ksh
# Some function that takes a long time to process
longprocess() {
        # Sleep up to 4 seconds
        #sleep $((RANDOM % 15))
        sleepTime=$((RANDOM % 5))
        # Randomly exit with 0 or 1
        exitCode=$((RANDOM % 2))
        echo "sleeping for: $sleepTime with exit code: $exitCode "
        sleep $sleepTime
        exit $exitCode
}
pids=""
failCnt=0
# Run six concurrent processes
        ( longprocess ) &
        # store PID of process
        pids+=" $!"
        echo PID $pids
        ( longprocess ) &
        # store PID of process
        pids+=" $!"
        echo PID $pids
        ( longprocess ) &
        # store PID of process
        pids+=" $!"
        echo PID $pids
        ( longprocess ) &
        # store PID of process
        pids+=" $!"
        echo PID $pids
        ( longprocess ) &
        # store PID of process
        pids+=" $!"
        echo PID $pids
        ( longprocess ) &
        # store PID of process
        pids+=" $!"
        echo PID $pids
 
# Wait for all processes to finish, will take max 4s
echo "initial failCnt is $failCnt"
for p in $pids; do
        #if wait $p; then
        if ! wait $p; then
            cmdJobNM=`ps -p $p -o command=`
            failCnt=`expr $failCnt + 1`
            echo "failed command is --> $cmdJobNM, PID: $p"
        fi
done
echo "total failCnt is $failCnt"
if [ $failCnt -gt 0 ]; then
    exit -1
fi

Output:
Code:
$ ksh bgPIDTest.ksh
PID 15858
sleeping for: 2 with exit code: 1
PID 15858 15859
sleeping for: 4 with exit code: 0
PID 15858 15859 15860
sleeping for: 0 with exit code: 0
PID 15858 15859 15860 15861
sleeping for: 3 with exit code: 0
PID 15858 15859 15860 15861 15862
sleeping for: 3 with exit code: 1
PID 15858 15859 15860 15861 15862 15863
initial failCnt is 0
sleeping for: 1 with exit code: 0
failed command is --> , PID: 15858
failed command is --> , PID: 15862
total failCnt is 2

I'm still not getting the command. I wanted it so that I could extract the passed argument with awk and write it to a file. The logic on restart would then be:
if this new file exists
    use this new file
else
    use the old config file

Once the run completes successfully, I'll remove the error file (if any). Please suggest whether this is feasible.
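The restart logic described above might look something like the sketch below. All names here (work_dir, failed.list, run_all, fix.flag) are hypothetical, the data_process function is only a stand-in that fails for "Address" until a flag file exists, and it assumes each failing job appends a single short line to the list.

```shell
# Sketch only: work_dir, failed.list, run_all and the stand-in data_process
# are hypothetical names, not taken from the real scripts.
work_dir=$(mktemp -d)
config="$work_dir/properties.file"
failed="$work_dir/failed.list"
printf '%s\n' Account Customer Address Phone > "$config"

# Stand-in job: "Address" fails until the file fix.flag exists.
data_process() { [ "$1" != "Address" ] || [ -e "$work_dir/fix.flag" ]; }

run_all() {
    # Restart case: use the failure list left by the previous run, if any.
    if [ -s "$failed" ]; then input=$failed; else input=$config; fi
    cp "$input" "$work_dir/todo"
    : > "$failed.new"
    while IFS= read -r param; do
        # Each job appends its own parameter on failure (one short line each).
        ( data_process "$param" || echo "$param" >> "$failed.new" ) &
    done < "$work_dir/todo"
    wait
    if [ -s "$failed.new" ]; then
        mv "$failed.new" "$failed"      # keep the list for the next restart
        return 1
    fi
    rm -f "$failed" "$failed.new"       # clean run: remove the error file
    return 0
}

first=0;  run_all || first=$?
first_failed=$(cat "$failed")
touch "$work_dir/fix.flag"              # pretend the Address problem was fixed
second=0; run_all || second=$?
echo "first run: $first ($first_failed), second run: $second"
```

Because each job records its own parameter, nothing has to be recovered from ps after the job has exited.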

Cheers,
Saps.
# 4  
Old 04-10-2017
The ps utility only returns information about currently active processes; not those that have exited and been reaped.

If you would tell us what shell and what version of that shell you're using, we could make suggestions about ways to store information about the commands you are running in the background along with the PIDs of those commands.
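One way to keep that information, assuming a shell with associative arrays (bash 4 or later, or ksh93), is to record each parameter under its PID at launch time and look it up after wait. This is a minimal sketch, not a tested fit for the real script: jobName, failedParams and the stand-in data_process function are hypothetical names.

```shell
# Sketch only; assumes bash 4+ or ksh93 (associative arrays).
# jobName, failedParams and the stand-in data_process are hypothetical names.
typeset -A jobName          # PID -> parameter passed to the job
failCnt=0
failedParams=""

# Stand-in for the real job: fails for "Address" only.
data_process() { [ "$1" != "Address" ]; }

for param in Account Customer Address Phone; do
    ( data_process "$param" ) &
    jobName[$!]=$param      # remember what this PID was running
done

for pid in "${!jobName[@]}"; do
    if ! wait "$pid"; then
        failCnt=$((failCnt + 1))
        failedParams="$failedParams ${jobName[$pid]}"
        echo "failed parameter is --> ${jobName[$pid]}, PID: $pid"
    fi
done
echo "total failCnt is $failCnt"
```

The failed parameters collected this way could then be written to a file for the next run, without relying on ps for processes that have already been reaped.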
# 5  
Old 04-12-2017
Hi Don,

Please find the below info as requested:

Sh version:
Code:
$ sh --version
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)

Ksh version:
Code:
$ ksh --version
  version         sh (AT&T Research) 93u+ 2012-08-01

I gave both of them as my main script is ksh and all product scripts are sh.

Please let me know if you need any further information.

Cheers,
Saps.