Start process on X number of files and then wait for the next batch


 
# 1  
Old 04-29-2019

Thanks to RudiC for his extraordinary help in organizing files into batches of 10 using the code below.

Code:
FL=(*)                                   # glob rather than parse ls, so odd file names survive
for ((i=0; i<${#FL[@]}; i+=10)); do      # advance by exactly one batch of 10
    for j in "${FL[@]:i:10}"; do
        $batch "$j" "$j.txt"
    done
    echo "Pausing for next iteration"
    echo "------------------------------------------------------------------------------------"
done

Problem:

I want to run the batch on 10 files (which takes a lot of time) and, once that batch has finished, start the next iteration.

If I do something like this:


Code:
for j in "${FL[@]:i:10}"
do
    $batch -in "$j" -out "$j.txt" &
done

The problem with this approach is that every batch process is sent to the background, and I can't tell whether any of them are still running in the current session, so I don't know when to start the next batch.


Alternatively, I could chain the commands like this, but that is again sequential and doesn't meet the requirement of running in parallel:

Code:
$batch -in  ${j} -out  ${j}.txt && $batch -in  ${j} -out  ${j}.txt && $batch -in  ${j} -out  ${j}.txt && $batch -in  ${j} -out  ${j}.txt && $batch -in  ${j} -out  ${j}.txt && $batch -in  ${j} -out  ${j}.txt

Regards,
NMR
# 2  
Old 04-29-2019
Use the wait command to wait for all background processes to complete.
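
A minimal sketch of how that might look with the array slicing from the first post. Here process_one is a hypothetical stand-in for the real $batch command, and the file names are made up:

```bash
#!/usr/bin/env bash
# Sketch: run items in blocks of batch_size and wait for each block.
# process_one is a hypothetical placeholder for the real "$batch" call.

batch_size=10
items=(file{1..25})          # made-up names; 25 items gives 3 blocks

process_one() {
    sleep 0.1                # pretend to do some work
    echo "processed $1"
}

batches=0
for ((i = 0; i < ${#items[@]}; i += batch_size)); do
    for f in "${items[@]:i:batch_size}"; do
        process_one "$f" &   # start every job of this block in the background
    done
    wait                     # returns only when all background jobs have exited
    batches=$((batches + 1))
    echo "Batch $batches finished"
done
```

With no arguments, wait blocks until every background child of the current shell has exited, so it only works as a batch barrier if the script has no other background jobs of its own.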
# 3  
Old 04-29-2019
I suppose the question is how you want it to run. Do you want them all submitted straight away but with no more than 10 running at once (presumably to manage the load/contention), or do you want to run in blocks of 10, where all 10 have to finish before you start the next block?

For instance, if each job just slept for the number of seconds given by its job number (1, 2, 3, 4, ... etc.), would you want logic A or B?

Logic A
Code:
00:00:00 Start job 1
00:00:00 Start job 2
00:00:00 Start job 3
00:00:00 Start job 4
00:00:00 Start job 5
00:00:00 Start job 6
00:00:00 Start job 7
00:00:00 Start job 8
00:00:00 Start job 9
00:00:00 Start job 10
00:00:01 Ended job 1
00:00:02 Ended job 2
00:00:03 Ended job 3
00:00:04 Ended job 4
00:00:05 Ended job 5
00:00:06 Ended job 6
00:00:07 Ended job 7
00:00:08 Ended job 8
00:00:09 Ended job 9
00:00:10 Ended job 10
00:00:10 Started job 11
00:00:10 Started job 12
00:00:10 Started job 13
:
:
etc.

Logic B
Code:
00:00:00 Start job 1
00:00:00 Start job 2
00:00:00 Start job 3
00:00:00 Start job 4
00:00:00 Start job 5
00:00:00 Start job 6
00:00:00 Start job 7
00:00:00 Start job 8
00:00:00 Start job 9
00:00:00 Start job 10
00:00:01 Ended job 1
00:00:01 Started job 11
00:00:02 Ended job 2
00:00:02 Started job 12
00:00:03 Ended job 3
00:00:03 Started job 13
00:00:04 Ended job 4
00:00:04 Started job 14
:
:
etc.

Logic A will work, provided you are sure you have no other background processes, with 'simply':-
Code:
active_count=0
for single_job in ${job_list}
do
   "${single_job}" &
   ((active_count=active_count+1))
   if [ "$active_count" -ge 10 ]
   then
      wait                    # All background processes must end before we start the next block
      active_count=0          # Reset the counter
   fi
done
wait                          # Let the final, possibly partial, block finish too

echo "Finished them in blocks of up to ten"

If you need logic B then you need to keep track of them more carefully. You can use some clever way of holding data to keep track of specific process ids, or, if the process names will be unique enough, just count the running processes:-
Code:
for single_job in ${job_list}
do
   while :                                          # Hold this job until there is a free slot
   do
      active_jobs=$(ps -f|grep -c 'slee[p]')        #  This is the important bit!
      if [ "$active_jobs" -ge 10 ]
      then
         echo "All running at full capacity"
         sleep 1    # Pick a suitable check frequency
      else
         "${single_job}" &
         break
      fi
   done
done

echo "Finished them with up-to-ten running"

The important bit
You need to consider
  1. what you are searching for. Get it wrong and you might not submit anything, or you might submit everything in one go.
  2. how to make sure you don't count the grep you are searching with as a process itself, otherwise it uses up one of your planned ten background jobs.
For consideration 1, my example just uses a sleep, so you can see it in what I've coded. You need to be sure that you find the right things and that your 'single jobs' don't spawn sub-processes of the same name, or you will count them twice. You could look only for sub-processes of your running script if that is important to you.
For consideration 2, I have written the search as an expression by wrapping one character in [ and ]. The pattern slee[p] still matches only the string sleep, but the grep's own command line in the ps output contains the literal text slee[p], which does not contain sleep, so the grep does not count itself.
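
To see the effect of the bracket without starting any real processes, here is a small sketch that greps two made-up ps listings (the user name and PIDs are invented for illustration):

```bash
#!/usr/bin/env bash
# Two invented ps listings: one as it would look if the search had been run
# as "grep -c sleep", one as it would look with "grep -c slee[p]".
plain_listing='robin 101 sleep 30
robin 102 grep -c sleep'
bracket_listing='robin 101 sleep 30
robin 102 grep -c slee[p]'

# The plain pattern matches the real job and the grep line itself.
count_plain=$(printf '%s\n' "$plain_listing" | grep -c 'sleep')

# The bracket pattern matches only the real job: the grep line contains
# the text "slee[p]", which does not contain the string "sleep".
count_bracket=$(printf '%s\n' "$bracket_listing" | grep -c 'slee[p]')

echo "plain pattern counts:   $count_plain"     # 2
echo "bracket pattern counts: $count_bracket"   # 1
```

Quoting the pattern also stops the shell from treating slee[p] as a file name glob, which would matter if a file called sleep happened to exist in the current directory.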



Do either of these approaches help?

Kind regards,
Robin
# 4  
Old 04-30-2019
The exact requirement is:

execute 10 jobs in either background or foreground, wait for all of them to finish, and then start the next 10, or however many files the next batch contains.

I think I'll have to go with Plan B (execute 10 jobs at a given time, and start a new one whenever fewer than 10 jobs are running).

But what if 2 or more background processes finish before the sleep times out? Your code starts only 1 job in the else case.


Code:
for single_job in ${job_list}
do
   while :                                          # Hold this job until there is a free slot
   do
      active_jobs=$(ps -f|grep -c 'slee[p]')        #  This is the important bit!
      if [ "$active_jobs" -ge 10 ]
      then
         echo "All running at full capacity"
         sleep 1    # Pick a suitable check frequency
      else
         "${single_job}" &    # <----- Should I do the calculation at this point as well: how many jobs are running and how many to start?
         break
      fi
   done
done

echo "Finished them with up-to-ten running"

# 5  
Old 04-30-2019
Quote:
Originally Posted by busyboy
The exact requirement is:

execute 10 jobs in either background or foreground, wait for all of them to finish, and then start the next 10, or however many files the next batch contains.
There is a command which does exactly what you want with one word. I suggested it before. To repeat:
Quote:
Originally Posted by Corona688
Use the wait command to wait for all background processes to complete.
Use the wait command to wait for all background processes to complete.
# 6  
Old 04-30-2019
Quote:
Originally Posted by busyboy
The exact requirement is:

execute 10 jobs in either background or foreground, wait for all of them to finish, and then start the next 10, or however many files the next batch contains.

I think I'll have to go with Plan B (execute 10 jobs at a given time, and start a new one whenever fewer than 10 jobs are running).
Well, that is just totally confusing. How would you propose to start 10 in the foreground? There is a way to put a process into the background that opens a new terminal session and then runs the command you want in the foreground there, but I doubt that is what is being requested. In any case, you have almost certainly chosen the opposite of the brief if you go for logic B.

You want logic A and the advice given by Corona688.



Quote:
Originally Posted by busyboy
But what if 2 or more background processes finish before the sleep times out? Your code starts only 1 job in the else case.


Code:
for single_job in ${job_list}
do
   while :                                          # Hold this job until there is a free slot
   do
      active_jobs=$(ps -f|grep -c 'slee[p]')        #  This is the important bit!
      if [ "$active_jobs" -ge 10 ]
      then
         echo "All running at full capacity"
         sleep 1    # Pick a suitable check frequency
      else
         "${single_job}" &    # <----- Should I do the calculation at this point as well: how many jobs are running and how many to start?
         break
      fi
   done
done

echo "Finished them with up-to-ten running"

Okay, perhaps I should have added a wait before the echo "Finished them with up-to-ten running" to ensure they have all finished. The loop for logic B will run and keep adding jobs up to the limit until there are none left to start, at which point the loop will end, but the jobs already started keep running to completion. There will always be a point where there are none left to start and fewer than 10 running.
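
For what it's worth, if your shell is bash 4.3 or later, logic B can also be sketched without polling ps at all, using wait -n, which returns as soon as any one background job exits. This is only an illustration: run_one and the limit of 3 are made up, and a log file is used just to count completions.

```bash
#!/usr/bin/env bash
# Sketch of logic B: keep at most max_jobs running, refill as each one ends.
# Assumes bash 4.3+ for "wait -n"; run_one is a hypothetical stand-in job.

max_jobs=3
log=$(mktemp)

run_one() {
    sleep 0.2                      # pretend to do some work
    echo "done $1" >> "$log"
}

running=0
for item in 1 2 3 4 5 6 7; do
    if [ "$running" -ge "$max_jobs" ]; then
        wait -n                    # block until any one background job exits
        running=$((running - 1))
    fi
    run_one "$item" &
    running=$((running + 1))
done
wait                               # the last few jobs are still running here

finished=$(wc -l < "$log")
echo "Finished $finished jobs with up to $max_jobs running"
rm -f "$log"
```

The final wait is the one discussed above: without it the script would print its closing message while the last jobs were still running.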


Kind regards,
Robin
# 7  
Old 05-12-2019
You could have GNU parallel deal with this for you, as long as it's available on your target system.

For example:
Code:
CMD=(J L M)
batch=echo
for j in ${CMD[@]}
do
   echo $batch -in  ${j} -out  ${j}.txt
done | parallel -j 10 --load 80% --noswap '{}'

The above will keep starting jobs (up to 10 at once) as long as the total CPU load is below 80% and there is no swap in/out activity.
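
If GNU parallel is not installed, a rougher alternative (a different tool, not a drop-in replacement) is xargs with -P, which both GNU and BSD xargs support although it is not in POSIX. It caps the number of concurrent jobs but has nothing like the load or swap checks. A minimal sketch, with a made-up job that just writes a file per input name:

```bash
#!/usr/bin/env bash
# Sketch: at most 4 concurrent workers via xargs -P (GNU/BSD extension).
# The sh -c body is a hypothetical stand-in for "$batch -in X -out X.txt".

outdir=$(mktemp -d)

printf '%s\n' J L M N P |
    xargs -P 4 -I {} sh -c 'sleep 0.1; echo "processed $1" > "$2/$1.txt"' _ {} "$outdir"

# xargs returns only after all of its children have exited.
files=("$outdir"/*.txt)
count=${#files[@]}
echo "Wrote $count output files"
rm -rf "$outdir"
```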