Keep up constant number of parallel processes


 
# 1  
Old 02-14-2013

Hi guys,
I am struggling to adapt my script to improve its performance.
I created a ksh script to process a lot of files in parallel.
I would like to know how to keep a constant number of processes running at all times (until everything is finished).

What I have is (not actual code):
Code:
maxProc=5
while [[ filesStillExist ]]
do
  for procNo in {1..$maxProc}
  do
    FileProcessor > /dev/null &
  done
  wait
done

This starts 5 processes, waits for all of them to finish, then starts 5 more, and so on.
I would like to keep 5 processes running at all times, since one job may finish faster than the others.

When one process (FileProcessor) finishes, I would like to start another one, with a maximum of 5 running.

Thanks in advance

Moderator's Comments:
Mod Comment Please use code tags next time for your code and data.

Last edited by vbe; 02-14-2013 at 10:36 AM.. Reason: code tags, see PM; sorry zaxxon... I scratched you code tags...added Comments
# 2  
Old 02-14-2013
What is your system? What is your shell?

In bash, you can wait for one specific process, so you can do this:

Code:
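maxproc=5
set --                          # clear $1 $2 ...; they will hold the PIDs
while [[ condition ]]           # placeholder: replace with your real loop test
do
        process &               # placeholder for the real command

        set -- "$@" "$!"        # append the new PID to the list
        # Once maxproc jobs are running, wait for the oldest one.
        # Note: shift runs only if that job exited with status 0.
        [ "$#" -ge "$maxproc" ] && wait "$1" && shift
done

wait                            # wait for whatever is still running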

The drawback is that it waits for the oldest process, not necessarily the first one to finish.
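
One way around that drawback, assuming bash 4.3 or newer, is the `wait -n` builtin, which returns as soon as any one background job exits. The following is only a minimal sketch: `process_one`, the item list and the cap of 3 are made up for illustration, and the sleep stands in for real work.

```shell
#!/usr/bin/env bash
# Sketch only: requires bash 4.3 or newer for `wait -n`.
max_jobs=3                      # illustrative cap; the thread uses 5
workdir=$(mktemp -d)            # scratch directory for this demo

# process_one is a hypothetical stand-in for the real worker.
process_one() {
    sleep "0.$(( $1 % 3 + 1 ))"     # pretend jobs take different times
    : > "$workdir/done.$1"          # leave a marker so results can be counted
}

for item in 1 2 3 4 5 6 7 8; do
    # While the cap is reached, block until any one job exits.
    while [ "$(jobs -pr | wc -l)" -ge "$max_jobs" ]; do
        wait -n
    done
    process_one "$item" &
done

wait                            # collect the jobs that are still running

set -- "$workdir"/done.*        # count the markers with a glob
finished=$#
echo "finished $finished jobs"
rm -rf "$workdir"
```

Counting with `jobs -pr` on every pass, rather than keeping a manual counter, means the loop simply re-checks if `wait -n` ever returns without a job having freed a slot.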
# 3  
Old 02-14-2013
Try also:
Code:
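max_jobs=5
rm -f .stop_jobs
while [ ! -f .stop_jobs ]
do
   # In ksh93 the last stage of a pipeline runs in the current shell, so read sets the variable
   jobs -l | wc -l | read current_job_count
   # run_process is a placeholder for the real command
   [[ $current_job_count -lt $max_jobs ]] && { run_process 2>/dev/null & }
   sleep 1
done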

To stop the script, run touch .stop_jobs, or press Ctrl-C if it is running at the prompt.
# 4  
Old 02-14-2013
Thanks for the answers.

Unfortunately, it's not working.
The job-counting solution should be fine, but I don't know why it keeps starting all the jobs instead of a maximum of 5.

What I have exactly is:

Code:
#!/bin/ksh

echo "Process started on: (`date`)"
\rm parallel.log > /dev/null


n=`ls -al *.csv | wc -l | cut -d ' ' -f 1` #number of files

i=1
while [[ $i -le $n ]]
do
  for noProc in {1..$1} # $1 is maximum number of processes
  do
        file=`ls -a *.csv | head -$i | tail -1 | xargs basename` #take just the name of the file
        echo "Processing file: " $file " ==> " $i " of " $n # E.g. Processing file: ABC.csv ==> 4 of 154
        ./ProcessFile > /dev/null &
        let i=i+1
        if [ $i -gt $n ]; then
                break # to stop in case total number of files does not divide exactly to the maximum number of processes
        fi
  done
  wait
  echo "=================== Finished batch ==================="
done

echo "Process finished on: (`date`)"

# 5  
Old 02-14-2013
You aren't wait-ing for them when you need to. That's what made my program wait for a process to quit. The 'break' just makes it quit the loop -- which instantly starts over again and creates more. If you want to wait, you must call wait!

You aren't keeping a list of the running processes, either. If you want to wait for one of them, rather than all of them, you need to know which to wait for. That's what the set -- things are for, keeping and adjusting that list. Here's how I'm doing it; set -- changes the $1 $2 ... variables.

Code:
$ set -- a b c
$ echo $@

a b c

$ set -- "$@" d
$ echo $@

a b c d

$ shift
$ echo $@

b c d

$ echo $1

b

$ echo $#

3

$

Are you running head -5 every time to get the fifth line, and so on? That's not good: every iteration re-runs ls and reads the whole listing again. Also, ls *.whatever is completely pointless; the shell expands * by itself and does not need ls's help. There's no point in looping from 1 to maxproc either -- loop over the files and use logic to handle maxproc.
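
To illustrate the point about globbing, here is a small sketch that counts and walks the files without ls, head or tail. The temporary directory and the sample file names are invented for the demo; only the `set -- *.csv` idiom comes from the thread.

```shell
#!/bin/sh
# Sketch: list, count and index files with the shell's own globbing.
demo=$(mktemp -d)
: > "$demo/a.csv"; : > "$demo/b.csv"; : > "$demo/c d.csv"   # sample files, one with a space
cd "$demo" || exit 1

set -- *.csv                    # the glob fills $1, $2, ...
[ -e "$1" ] || set --           # no match leaves the literal pattern; clear it
total=$#

i=0
for file in "$@"; do
    i=$((i + 1))
    echo "Processing file: $file ==> $i of $total"
    last=$file                  # remember the most recent name
done

cd / && rm -rf "$demo"          # clean up the demo directory
```

Because the names never pass through a pipeline, file names containing spaces stay intact, and the list is read once instead of once per file.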

Code:
maxproc=5

# Count files
set -- *.csv
FILES="$#"

# Blank $1 $2 ...
set --


let i=1
for FILE in *.csv
do
        echo "Processing ${FILE/.csv}, $i/$FILES"
        let i=i+1

        ./ProcessFile "$FILE" >/dev/null &

        # Turn $1=pida $2=pidb $3=pidc $4=pidd, into
        # $1=pida $2=pidb $3=pidc $4=pidd $5=pide
        set -- "$@" "$!"

        # Shift removes $1 and moves the rest down, so you get
        # $1=pidb $2=pidc $3=pidd $4=pide
        # $# is the number of positional parameters.
        # Note: shift runs only if the waited-for job exited with status 0.
        [ "$#" -ge "$maxproc" ] && wait "$1" && shift
done

# Wait for ALL remaining processes, not just one specific one.
wait


Last edited by Corona688; 02-14-2013 at 12:18 PM..
# 6  
Old 02-14-2013
Hi,

I'm using head together with tail to pick out the respective file (there is surely a better way). E.g.
Code:
file=`ls -a *.csv | head -$i | tail -1 | xargs basename` #take just the name of the file

This gives the $i-th file from the ls listing.

The outer while loop goes through the files, but in steps of maxproc rather than 1.

The break makes sure that only the necessary number of processes is started in the last iteration (say I have 13 files: it will start 5 + 5 + 3).

I tried exactly the code you provided. It starts 5 parallel processes, but after a short while it starts ALL the rest (luckily I tested with a small number of files).

I'm using "Linux illin135 2.6.18-238.12.1.el5 #1 SMP Sat May 7 20:18:50 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux" if it matters.

Thanks for all your help
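
Since the system is Linux, another option is to let xargs do the bookkeeping. This is a sketch that assumes GNU xargs, whose `-P` option caps the number of simultaneous workers and starts a new one whenever one exits. `worker.sh`, the sample files and the temporary directory are hypothetical stand-ins for ./ProcessFile and the real data.

```shell
#!/bin/sh
# Sketch: let xargs keep up to 5 workers running (-P is a GNU/BSD extension).
demo=$(mktemp -d)
cd "$demo" || exit 1
for n in 1 2 3 4 5 6 7; do : > "file$n.csv"; done     # sample input files

# Hypothetical stand-in for ./ProcessFile: writes a marker next to its input.
cat > worker.sh <<'EOF'
#!/bin/sh
sleep 0.1
: > "$1.done"
EOF

# One file per worker invocation (-n 1), at most 5 at a time (-P 5).
# NUL-separated names (-0) keep unusual file names intact.
printf '%s\0' *.csv | xargs -0 -n 1 -P 5 sh ./worker.sh

set -- *.done                   # count the markers the workers left behind
processed=$#
echo "processed $processed files"
cd / && rm -rf "$demo"
```

xargs returns only after every worker has exited, so no explicit wait is needed afterwards.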
# 7  
Old 02-14-2013
Quote:
Originally Posted by lurkerro
I tried exactly the code you provided. It starts 5 parallel processes, but after a short while it starts ALL the rest (luckily I tested with a small number of files).
Hm, I wrote it off the cuff, will check.
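
One possible explanation, which the thread does not confirm: `wait $1 && shift` only shifts when the waited-for job exits with status 0. If ./ProcessFile returns a non-zero status, the finished PID stays in $1, every later `wait $1` returns at once with a non-zero status, and the loop launches everything that is left. The sketch below decouples the two commands. `failing_worker`, the marker files and the cap of 3 are invented for the demo, and the worker fails on purpose.

```shell
#!/bin/sh
# Sketch: the PID-list loop with `wait` and `shift` decoupled.
maxproc=3                       # illustrative cap; the thread uses 5
demo=$(mktemp -d)
failures=0

# failing_worker is a hypothetical stand-in for a ./ProcessFile that exits non-zero.
failing_worker() {
    : > "$demo/run.$1"                      # marker: this worker is alive
    set -- "$1" "$demo"/run.*               # $1 = id, the rest = live markers
    echo "$(( $# - 1 ))" >> "$demo/seen"    # record how many were alive at start
    sleep 0.2
    rm -f "$demo/run.$1"
    return 1                                # simulate a failing job
}

set --                          # $1 $2 ... will hold the PIDs
for id in 1 2 3 4 5 6 7; do
    failing_worker "$id" &
    set -- "$@" "$!"
    if [ "$#" -ge "$maxproc" ]; then
        # Note the failure, but do not let it skip the shift.
        wait "$1" || failures=$((failures + 1))
        shift                   # always drop the PID that was just waited for
    fi
done
wait                            # wait for the remaining workers

peak=$(sort -n "$demo/seen" | tail -n 1)
started=$(wc -l < "$demo/seen")
echo "started $started workers, peak concurrency $peak"
rm -rf "$demo"
```

With the shift made unconditional, the list never holds more than maxproc PIDs, so the cap holds even when every job fails.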