Restarting jobs after failures


 
# 1  
Old 05-08-2008
Hi,

I have a script which has a for loop in it. In that for loop are names of files that are going to be processed. For example,

for file in file1 file2 file3 file4
do
(something)
done

Let's say that this script gets executed and it fails at 'file2'. Is there a way that I can actually tell this script where to restart from (file3), to avoid going into this script and modifying it? Any suggestions would be appreciated.

As always, thanks!
# 2  
Old 05-08-2008
Creating a step file

Create a step file before processing each file, i.e. the file run.step should contain the name of the file that was last processed.

When you re-run, check the step file for the last file processed and continue from there.
# 3  
Old 05-12-2008
Thanks for replying.

Can you please provide an example so that I understand what you mean?

Thanks in advance!
# 4  
Old 05-12-2008
I think abcd122 just meant to keep track of which files you have finished, so you have a way of knowing which files still need processing. That's a start, but doesn't really solve the problem, as such.
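
For illustration, that kind of bookkeeping might look roughly like the sketch below. The names run.done, process_one and run_all are made up for the example, and process_one is only a stand-in for the real work. It records a file name only after that file succeeded, so a re-run skips whatever is already listed.

```shell
#!/bin/sh
# Sketch only: run.done, process_one and run_all are hypothetical names.
DONE_FILE=${DONE_FILE:-run.done}

# Stand-in for the real work; replace with the actual command.
process_one() {
  echo "processing $1"
}

run_all() {
  touch "$DONE_FILE"
  for file in "$@"; do
    # -x matches the whole line, -F treats the name as a fixed string
    if grep -qxF -- "$file" "$DONE_FILE"; then
      echo "skipping $file (already done)"
      continue
    fi
    if process_one "$file"; then
      echo "$file" >>"$DONE_FILE"   # record successes only
    else
      echo "$file failed, stopping" >&2
      return 1
    fi
  done
  rm -f "$DONE_FILE"   # everything finished: the next run starts fresh
}

run_all file1 file2 file3 file4
```

Removing the list once every file has succeeded is a design choice that suits a job which runs repeatedly; if a failed run should never be resumed automatically, delete run.done by hand instead.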

I usually run GNU make for stuff that might take long and might bomb out; it will delete the output from a botched run (if you specify .DELETE_ON_ERROR) and can keep track of what's been done and what still needs to be done. (Still missing from the picture is some sort of concurrency control, to be able to see that a particular pending result is already running on another host.) It takes a bit of getting used to but it has paid itself back handsomely a number of times.

That's a more radical refactoring than you were thinking of, I'm sure, but I'm offering it nevertheless; if interruptions are a scenario you need to take seriously, there's no way you can code a for loop which simply "knows" when it's done.

But of course, if the presence of a file is enough, then the really simple thing will work:

Code:
for f in file1 file2 file3 file4; do
  if [ -e "$f.out" ]
  then
    echo "$f.out already exists -- not rerunning"
  elif long and winding and painstaking command to calculate ruptures in space time fabric <"$f" >"$f.tmp"
  then
    # only commit when it really finished, notice this is also conditional on exit code
    mv "$f.tmp" "$f.out"
  else
    # $? still holds the exit code of the failed command in the elif condition
    echo "oh dear, $f failed (exit code $?), leaving output in $f.tmp" >&2
  fi
done

# 5  
Old 05-13-2008
Thank you for your post. I can tell you are really experienced with this stuff. Going back and reading my initial post I realized that I did not do a good job at explaining what my goal is. At least, I don't think so. So, let me explain myself in a better fashion.

I have three .ctl files:

A.ctl
B.ctl
C.ctl

In all of these .ctl files, I have, among other things, something called "BATCH". It looks like this: BATCH = 123-133. For each of these files I am grepping the "BATCH" value, which is where my for loop comes into play.

grep 'BATCH' /somewhere/in/a/dir/${filename}.ctl | read a1 a2 BATCH

for filename in A B C
do
(executing another script which will use the "BATCH" to load some data)
done

This script will be run by the cron scheduler. So let's say that for A.ctl everything went smoothly, but when it came to B.ctl we got an error. We would like to re-run starting at B.ctl, letting the for loop continue with B and C, without going into the script and deleting A from it.

Is there a way I can pass the $filename parameter to this script from the command line, so that it starts the for loop from where I want it to?

THANKS!
# 6  
Old 05-13-2008
Frankly, I think you have all the pieces you need, you just need to do it.
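
For instance, one way to put those pieces together is to take an optional start name on the command line and skip everything before it. This is only a sketch under assumptions: load_batch and run_from are hypothetical names, and load_batch stands in for whatever script actually loads the data for a given .ctl file.

```shell
#!/bin/sh
# Sketch only: load_batch is a hypothetical stand-in for the real loader script.
load_batch() {
  echo "loading batch for $1.ctl"
}

# Usage: run_from [START]   e.g. "run_from B" restarts at B.ctl
run_from() {
  start=${1:-}
  skipping=${start:+yes}      # skip only if a start name was given
  for filename in A B C; do
    if [ -n "$skipping" ]; then
      if [ "$filename" = "$start" ]; then
        skipping=             # reached the restart point
      else
        echo "skipping $filename.ctl"
        continue
      fi
    fi
    load_batch "$filename" || { echo "$filename.ctl failed" >&2; return 1; }
  done
  # A start name that never matched means nothing ran; treat that as an error.
  if [ -n "$skipping" ]; then
    echo "unknown start name: $start" >&2
    return 2
  fi
}

run_from "$@"
```

Run with no argument (as cron would), it processes A, B and C; run by hand with B as the argument, it skips A and carries on with B and C, so the script itself never needs editing.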
# 7  
Old 05-13-2008
It is easier said than done, my friend.

I just need to gather my thoughts. I definitely have great examples to follow.
 