Start process on X number of files and then wait for the next batch


# 1  

Thanks to RudiC for his extraordinary help with organizing files into batches of 10 using the code below.

Code:
FL=($(ls)); 
for ((i=0;i<${#FL[@]};i+=10)); do      # step through the list ten files at a time
for j in ${FL[@]:$i:10};
do 
$batch  ${j}  ${j}.txt
done; 
echo "Pausing for next iteration";
echo "------------------------------------------------------------------------------------"; 
done

Problem:

I want to run the batch on 10 files (which takes a lot of time) and, once that batch has finished, start the next iteration.

If I do something like this,


Code:
for j in ${FL[@]:$i:10};
do 
$batch -in  ${j} -out  ${j}.txt &
done;

The problem with this approach is that every batch process is sent to the background, and I can't tell whether any of them are still running in the current session, so I don't know when to start the next batch.


Alternatively, I could run it like this, but that is again a sequential approach and doesn't meet the requirement of running in parallel.

Code:
$batch -in  ${j} -out  ${j}.txt && $batch -in  ${j} -out  ${j}.txt && $batch -in  ${j} -out  ${j}.txt && $batch -in  ${j} -out  ${j}.txt && $batch -in  ${j} -out  ${j}.txt && $batch -in  ${j} -out  ${j}.txt

Regards,
NMR
# 3  
I suppose there is a question about how you want it to run. Do you want them all submitted straight away, but with no more than 10 running at once (presumably to manage the load/contention), or do you want to run in blocks of 10, where all 10 have to finish before you start the next block?

For instance, if you had jobs that each just slept for the number of seconds given by its job number (1, 2, 3, 4, ... etc.), would you want logic A or B?

Logic A
Code:
00:00:00 Start job 1
00:00:00 Start job 2
00:00:00 Start job 3
00:00:00 Start job 4
00:00:00 Start job 5
00:00:00 Start job 6
00:00:00 Start job 7
00:00:00 Start job 8
00:00:00 Start job 9
00:00:00 Start job 10
00:00:01 Ended job 1
00:00:02 Ended job 2
00:00:03 Ended job 3
00:00:04 Ended job 4
00:00:05 Ended job 5
00:00:06 Ended job 6
00:00:07 Ended job 7
00:00:08 Ended job 8
00:00:09 Ended job 9
00:00:10 Ended job 10
00:00:10 Started job 11
00:00:10 Started job 12
00:00:10 Started job 13
:
:
etc.

Logic B
Code:
00:00:00 Start job 1
00:00:00 Start job 2
00:00:00 Start job 3
00:00:00 Start job 4
00:00:00 Start job 5
00:00:00 Start job 6
00:00:00 Start job 7
00:00:00 Start job 8
00:00:00 Start job 9
00:00:00 Start job 10
00:00:01 Ended job 1
00:00:01 Started job 11
00:00:02 Ended job 2
00:00:02 Started job 12
00:00:03 Ended job 3
00:00:03 Started job 13
00:00:04 Ended job 4
00:00:04 Started job 14
:
:
etc.

Logic A will work, provided you are sure the script has no other background processes, and is 'simply':-
Code:
active_count=0
for single_job in ${job_list}
do
   "${single_job}" &
   ((active_count=active_count+1))
   if [ "$active_count" -ge 10 ]
   then
      wait                    # All background processes must end before we start the next block
      active_count=0          # Reset the counter
   fi
done
wait                          # Let the final, possibly partial, block finish

echo "Finished them in blocks of up to ten"

If you need logic B then you need to keep track of them more carefully. You can record the specific process IDs and track those, or, if the process names are unique enough, just count the running processes:-
Code:
for single_job in ${job_list}
do
   active_count=$(ps -f | grep -c 'slee[p]')       #  This is the important bit!
   if [ "$active_count" -ge 10 ]
   then
      echo "All running at full capacity"
      sleep 1    # Pick a suitable check frequency
   else
      "${single_job}" &
   fi
done

echo "Finished them with up-to-ten running"

The important bit
You need to consider
  1. what you are searching for. Get it wrong and you might not submit anything, or you might submit everything in one go.
  2. how to make sure you don't count the grep you are searching with as a process itself, else it uses up one of your planned ten background jobs.
For consideration 1, my example just uses a sleep, so you can see that in what I've coded. You need to be sure that you find the right things and that your 'single jobs' don't spawn sub-processes of the same name, else you will count them twice. You could look for sub-processes of your running script if that is important to you.
For consideration 2, I have written the search as an expression by wrapping one character in [ and ]. grep treats slee[p] as a regular expression that matches only the string sleep, but, crucially, the grep process's own command line contains the literal text slee[p], which that expression does not match, so grep does not count itself.
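To see the bracket trick in isolation, here is a minimal sketch that runs grep over canned text instead of live ps output. The process lines are made up for illustration; only the two grep patterns come from the discussion above.

```shell
#!/usr/bin/env bash
# Hypothetical ps-style listings: two sleep jobs plus the grep that is doing the counting.
ps_plain='user 101 1 sleep 30
user 102 1 sleep 45
user 103 1 grep -c sleep'

ps_bracket='user 101 1 sleep 30
user 102 1 sleep 45
user 103 1 grep -c slee[p]'

# Searching for plain "sleep" also matches the grep line itself.
plain_count=$(printf '%s\n' "$ps_plain" | grep -c 'sleep')

# The expression slee[p] matches "sleep" but not the literal text "slee[p]".
bracket_count=$(printf '%s\n' "$ps_bracket" | grep -c 'slee[p]')

echo "plain pattern counted:   $plain_count"
echo "bracket pattern counted: $bracket_count"
```

The plain pattern over-counts by one because it picks up its own grep line; the bracketed pattern counts only the real sleep processes.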



Do either of these approaches help?

Kind regards,
Robin
# 4  
The exact requirement is:

Execute 10 jobs in either the background or the foreground, wait for all of them to finish, and then start the next 10, or however many the next batch of files contains.

I think I'll have to go with Plan B (execute 10 jobs at a time, and start a new one whenever fewer than 10 are running).

But what if two or more background processes finish before the sleep times out? As I read it, your code starts only one job in the else branch.


Code:
for single_job in ${job_list}
do
   active_count=$(ps -f | grep -c 'slee[p]')       #  This is the important bit!
   if [ "$active_count" -ge 10 ]
   then
      echo "All running at full capacity"
      sleep 1    # Pick a suitable check frequency
   else
      "${single_job}" &                 # <----- Should I do the calculation at this point as well: how many jobs are running and how many to start?
   fi
done

echo "Finished them with up-to-ten running"

# 5  
Quote:
Originally Posted by busyboy
The exact requirement is:

Execute 10 jobs in either the background or the foreground, wait for all of them to finish, and then start the next 10, or however many the next batch of files contains.
There is a command which does exactly what you want with one word. I suggested it before. To repeat:
Quote:
Originally Posted by Corona688
Use the wait command to wait for all background processes to complete.
# 6  
Quote:
Originally Posted by busyboy
The exact requirement is:

Execute 10 jobs in either the background or the foreground, wait for all of them to finish, and then start the next 10, or however many the next batch of files contains.

I think I'll have to go with Plan B (execute 10 jobs at a time, and start a new one whenever fewer than 10 are running).
Well, that is just totally confusing. How would you propose to start 10 in the foreground? There is a way to put a process into the background that opens a new terminal session and then runs the command you want in the foreground there, but I doubt that is what is being requested. In any case, you have almost certainly chosen the opposite of the brief if you go for logic B.

You want logic a and the advice given by Corona688
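Put together with the original array loop, logic A might look like the minimal sketch below. It assumes bash. `process_one` is a hypothetical stand-in for the real `$batch -in FILE -out FILE.txt` command, and the 23 input files are created in a temporary directory purely for demonstration.

```shell
#!/usr/bin/env bash
# Sketch only: process files in blocks of up to 10, waiting for each block to finish.

# Hypothetical stand-in for: $batch -in "$1" -out "$1.txt"
process_one() {
    sleep 0.1
    echo "processed $1" > "$1.txt"
}

# Demo input: 23 empty files in a scratch directory.
workdir=$(mktemp -d)
for n in $(seq 1 23); do : > "$workdir/file$n"; done

files=("$workdir"/file*)
batch_size=10
blocks=0

for ((i = 0; i < ${#files[@]}; i += batch_size)); do
    for f in "${files[@]:i:batch_size}"; do
        process_one "$f" &          # start every job of this block in the background
    done
    wait                            # return only when all jobs of this block have ended
    blocks=$((blocks + 1))
    echo "Block $blocks finished; pausing for next iteration"
done

# Count the output files; $(( )) strips any padding that wc adds.
outputs=$(find "$workdir" -name '*.txt' | wc -l)
outputs=$((outputs))
echo "Blocks run: $blocks, outputs written: $outputs"
rm -rf "$workdir"
```

Because `wait` with no arguments waits for every background child of the shell, this only behaves as intended when the script has no other background processes of its own.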



Quote:
Originally Posted by busyboy
But what if two or more background processes finish before the sleep times out? As I read it, your code starts only one job in the else branch.


Code:
for single_job in ${job_list}
do
   active_count=$(ps -f | grep -c 'slee[p]')       #  This is the important bit!
   if [ "$active_count" -ge 10 ]
   then
      echo "All running at full capacity"
      sleep 1    # Pick a suitable check frequency
   else
      "${single_job}" &                 # <----- Should I do the calculation at this point as well: how many jobs are running and how many to start?
   fi
done

echo "Finished them with up-to-ten running"

Okay, perhaps I should have added wait before the echo "Finished them with up-to-ten running" to ensure they have all finished. The loop for logic B will run and keep adding jobs up to the limit until there are none left to start, at which point the loop ends, but the running jobs carry on to completion. There will always come a point where none are left to start and fewer than 10 are running.
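One thing to watch in the loop as posted: when ten jobs are already running, it sleeps once and then moves on to the next item in the list, so the current job is never started. A minimal sketch that holds each job until a slot is free, and finishes with wait, might look like this. It assumes bash and counts the shell's own running background jobs with `jobs -pr` rather than `ps | grep`; `one_job` and the list of 25 job numbers are hypothetical stand-ins.

```shell
#!/usr/bin/env bash
# Sketch only: keep up to max_jobs running, starting a new one whenever a slot frees up.

max_jobs=10
started=0
peak=0

# Hypothetical stand-in for the real long-running command.
one_job() {
    sleep 0.2
}

# Number of this shell's background jobs that are still running.
running_jobs() {
    echo $(( $(jobs -pr | wc -l) ))
}

for n in $(seq 1 25); do
    # Hold this job back until fewer than max_jobs are running.
    while (( $(running_jobs) >= max_jobs )); do
        sleep 0.05                  # pick a suitable check frequency
    done
    one_job "$n" &
    started=$((started + 1))

    # Record the highest number of simultaneous jobs seen, for the summary line.
    running=$(running_jobs)
    if (( running > peak )); then peak=$running; fi
done

wait                                # let the last few jobs finish
echo "Started $started jobs, never more than $peak at once"
```

Counting from the shell's job table avoids both considerations above: there is no name to get wrong and no grep process to exclude, though it only sees jobs started by this same shell.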


Kind regards,
Robin
# 7  
You could have GNU parallel deal with this for you, as long as it's available on your target system.

For example:
Code:
CMD=(J L M)
batch=echo
for j in "${CMD[@]}"
do
   echo "$batch -in ${j} -out ${j}.txt"
done | parallel -j 10 --load 80% --noswap '{}'

The above will continue to start jobs (up to 10 at once) as long as the system load is below 80% and there is no swap-in/swap-out activity.
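Where GNU parallel is not installed, xargs with its -P option may give a similar rolling limit. This is a different tool from the one suggested above, and a sketch only: -P and -0 are provided by GNU and BSD xargs but are not required by POSIX, and there is no equivalent of the load or swap checks. The inline `sh -c` command is a hypothetical stand-in for `$batch -in FILE -out FILE.txt`, and the five input files are created just for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch only: run one command per file, with at most 10 running at once, using xargs -P.

# Demo input: five empty files in a scratch directory.
workdir=$(mktemp -d)
for n in 1 2 3 4 5; do : > "$workdir/in$n"; done

# -0   read NUL-separated names, so unusual file names are passed through intact
# -n 1 hand one file name to each invocation
# -P 10 keep up to 10 invocations running at the same time
printf '%s\0' "$workdir"/in* |
    xargs -0 -n 1 -P 10 sh -c 'echo "processed $1" > "$1.txt"' sh

# xargs returns only after every invocation has finished, so no separate wait is needed.
outputs=$(find "$workdir" -name '*.txt' | wc -l)
outputs=$((outputs))
echo "outputs written: $outputs"
rm -rf "$workdir"
```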
This User Gave Thanks to Chubler_XL For This Post: