Parallel bash scripts


 
# 1  
Old 08-22-2016

I need some help replacing a bash script with parallel to speed up a job that runs on multiple files (400files.list contains the absolute paths to those 400 files). The bash script runs the same program on each file in turn.
My bash_script.sh is:
Code:
# Loop over the absolute paths listed in 400files.list.
# The loop variable must be i, because the body reads ${i}.
for i in $(cat 400files.list); do
sample=$(basename "${i}")
I_DIR=$(dirname "${i}")
O_DIR=$(dirname "${i}")

${PROGRAM} \
--readFilesIn "${I_DIR}/${sample}" \
--outFileNamePrefix "${O_DIR}/${sample}.bam" \
--runThreadN 4 \
--genomeDir "${GENOME_DIR}" \
>  "${LOG_DIR}/${sample}.out" \
2> "${LOG_DIR}/${sample}.err"

done

GNU parallel seems to fit this job. In the parallel manual on gnu.org I read:
Quote:
Assuming that file contains a list of shell commands, one per line,
Code:
parallel -j 10 < file

will evaluate the commands using the shell (since no explicit command is supplied as an argument), in blocks of ten shell jobs at a time.
This is related to my previous post. I was thinking of something like
Code:
parallel -j 24 my_bash_script_{}.sh ::: 400files.list

but I have two issues here:
  1. there would be ~400 .sh files, which is obviously not right;
  2. my script spreads a single job over multiple lines and embeds other variables such as I_DIR, O_DIR, GENOME_DIR and LOG_DIR.
I'm quite lost as to how to replace the bash script with parallel to speed up the job.
Thanks in advance!
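
One thing worth checking with the numbers above: -j 24 combined with --runThreadN 4 can start up to 24 x 4 = 96 threads at once. Whether that is too many depends on the machine, which the post does not describe. A minimal sketch of deriving the job count from the core count follows; jobs_for and the CORES override are hypothetical names introduced here, not part of parallel.

```shell
#!/bin/bash
# jobs_for CORES THREADS_PER_JOB
# Hypothetical helper: print the largest job count (at least 1) such that
# jobs * threads_per_job does not exceed the number of cores.
jobs_for() {
    local cores=$1 threads=$2
    local n=$(( cores / threads ))
    if (( n < 1 )); then
        n=1
    fi
    echo "$n"
}

# Assumption: CORES may be set by hand; otherwise ask the system.
cores=${CORES:-$(getconf _NPROCESSORS_ONLN)}
njobs=$(jobs_for "$cores" 4)    # 4 matches --runThreadN 4 in the script
echo "cores=$cores, njobs=$njobs"
```

The resulting value could replace the hard-coded 24 passed to -j. Memory use per job is a separate limit that this sketch does not consider.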

---------- Post updated at 06:32 PM ---------- Previous update was at 03:48 PM ----------

I just tried one way myself, but I am not sure whether it can be optimized:
Code:
# my_script.sh: process the single file whose absolute path is passed as $1
i=$1
sample=$(basename "${i}")
I_DIR=$(dirname "${i}")
O_DIR=$(dirname "${i}")

${PROGRAM} \
--readFilesIn "${I_DIR}/${sample}" \
--outFileNamePrefix "${O_DIR}/${sample}.bam" \
--runThreadN 4 \
--genomeDir "${GENOME_DIR}" \
>  "${LOG_DIR}/${sample}.out" \
2> "${LOG_DIR}/${sample}.err"

Then make the script executable and run it as:
Code:
cat 400files.list | parallel -j 24 ./my_script.sh

Thanks for any suggestions on parallel and the bash script.
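
One detail the script above leaves open is where PROGRAM, GENOME_DIR and LOG_DIR come from. Because parallel starts ./my_script.sh as a separate process, those variables are only visible to it if the calling shell has exported them or the script sets them itself. A minimal sketch of a guard that could sit at the top of the script, assuming the variables are meant to come from the environment (check_env is a hypothetical name):

```shell
#!/bin/bash
# check_env: hypothetical guard. It stops with a message if a required
# variable is unset or empty, instead of running the program with empty paths.
check_env() {
    : "${PROGRAM:?PROGRAM is not set}"
    : "${GENOME_DIR:?GENOME_DIR is not set}"
    : "${LOG_DIR:?LOG_DIR is not set}"
}

# Run the check in a subshell so this demo keeps going either way.
if ( check_env ) 2>/dev/null; then
    echo "environment looks complete"
else
    echo "export PROGRAM, GENOME_DIR and LOG_DIR before calling parallel"
fi
```

Inside the real script the function would be called directly, without the subshell, so that a missing variable aborts that job and the message lands in the job's .err output.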

Last edited by rbatte1; 08-25-2016 at 08:43 AM.. Reason: Converted to formatted number-list
# 2  
Old 08-22-2016
First, get rid of cat:
Code:
parallel -j 24 ./my_script.sh < 400files.list

Then, if every line in 400files.list contains a "/" character, speed up:
Code:
sample=$(basename ${i});  
I_DIR=$(dirname ${i});  
O_DIR=$(dirname ${i});

considerably by replacing the command substitutions that invoke basename and dirname with parameter expansions of the form:
Code:
sample=${i##*/}
I_DIR=${i%/*}

and get rid of the duplicate computations:
Code:
O_DIR=$I_DIR

This User Gave Thanks to Don Cragun For This Post:
# 3  
Old 08-22-2016
I am glad I did not make too many mistakes here!
Thanks, Don, for these points:
  1. I was not sure about using the "<" redirection with parallel when reading the manual;
  2. replacing the command substitutions with parameter expansions does a lot of the work here;
  3. O_DIR=$I_DIR would be a small bug in my case, as the two directories can differ, so I am keeping them separate for the moment.

Last edited by rbatte1; 08-25-2016 at 08:43 AM.. Reason: Converted to formatted number-list
# 4  
Old 08-23-2016
< works for anything that reads from stdin. The only "magic" exceptions are password prompts for terminal logins, sudo, ssh and the like, which reject non-terminals on purpose.
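
A small illustration of that point, using a throwaway file created with mktemp (the file contents are made up): the same redirection feeds both wc and a shell read loop, and neither needs to know that its input comes from a file.

```shell
#!/bin/bash
# Build a small stand-in for 400files.list with made-up paths.
list=$(mktemp)
printf '%s\n' /data/a.fq /data/b.fq /data/c.fq > "$list"

# Both forms hand the same bytes to wc on stdin; "<" skips the extra cat process.
# The $(( )) wrapper strips any padding some wc implementations add.
n_pipe=$(( $(cat "$list" | wc -l) ))
n_redir=$(( $(wc -l < "$list") ))
echo "pipe=$n_pipe redirect=$n_redir"

# A read loop consumes stdin in the same way.
seen=0
while IFS= read -r path; do
    seen=$(( seen + 1 ))
done < "$list"
echo "lines read by loop: $seen"

rm -f "$list"
```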