Break up file into n number of subsets and run in parallel


 
# 1  
Old 02-28-2014

Hi Guys,

I want to break one of my input files into, say, 25 parts, run the same script on each part in parallel, and then merge the output into a single file.
I have access to computing resources that can handle 25 files at once; if I just run the original file, the total time is about 15 days every time.

Is this possible? So if I have an awk script gina.awk, these would be the steps.

1. Split Input.file into Input1.file, Input2.file,....Input25.file

2.
Code:
for file in Input*
do
./gina.awk "$file" > "out_$file"
done

3.
Code:
cat out* > Output.file

Is this possible, and will it speed things up? I have access to 25 CPU cores.

Last edited by bartus11; 02-28-2014 at 02:44 PM.. Reason: Please use [code][/code] tags.
# 2  
Old 02-28-2014
Try:
Code:
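# Lines per chunk, rounded up so that split produces at most 25 pieces
nlines=$(wc -l < Input.file | awk '{print int(($1 + 24) / 25)}')
# -d (numeric suffixes) is a GNU split option; creates Input.file00, Input.file01, ...
split -d -l "$nlines" Input.file Input.file
# Move the original away so the Input* glob below matches only the chunks
mv Input.file /somewhere/else

for file in Input*; do
  ./gina.awk "$file" > "out_$file" &   # & runs each job in the background
done

cat out* > Output.file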

# 3  
Old 02-28-2014
Is it just a really slow awk script? Or are there hundreds of gigabytes of data? Splitting it in 25 won't speed up a slow disk.

Does it make sense to split up the input data into sections? Is each line considered individually or does context matter?

Certainly it's possible to do what you want... Whether it's a good idea we don't know enough to say yet.
# 4  
Old 02-28-2014
@bartus11

Don't we need a wait before the last statement?

--ahamed
# 5  
Old 02-28-2014
Quote:
Originally Posted by ahamed101
@bartus11

Don't we need a wait before the last statement?

--ahamed
Indeed, a wait before the final cat would be useful. Alternatively, the OP could check the status of the background jobs with:
Code:
jobs

And execute the last part manually when there are no more jobs executing.
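For reference, the whole split / run / wait / merge sequence discussed so far could be sketched roughly as below. This is only an illustration under assumptions: it builds a small sample Input.file in a temporary directory, uses 4 parts instead of 25, names the pieces with a hypothetical chunk_ prefix, and substitutes a trivial awk one-liner for gina.awk, which is not shown in this thread.

```shell
#!/bin/sh
# Sketch: split a file, process the pieces in parallel, wait, then merge.
# Assumption: the awk one-liner below is a stand-in for ./gina.awk.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 100 independent lines (1..100).
awk 'BEGIN { for (i = 1; i <= 100; i++) print i }' > Input.file

parts=4
total=$(wc -l < Input.file)
per=$(( (total + parts - 1) / parts ))   # round up: at most $parts chunks

# A separate prefix keeps the chunks from clashing with Input.file itself.
# Default suffixes (aa, ab, ...) sort in the original line order.
split -l "$per" Input.file chunk_

for f in chunk_*; do
  # real use would be: ./gina.awk "$f" > "out_$f" &
  awk '{ print $1 * 2 }' "$f" > "out_$f" &
done

wait   # block until every background job has finished

# The glob expands in sorted order, so the merged output keeps input order.
cat out_chunk_* > Output.file
```

Without the wait, cat could run while some out_ files are still being written, and Output.file would be incomplete.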
# 6  
Old 02-28-2014
Yes, each line in the input is independent, and the awk script does a series of greps against another file and writes to the output.
# 7  
Old 02-28-2014
It may be possible to speed up the awk script...
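As one guess at what that could look like: if the "series of greps" amounts to looking up a key from each input line in the other file, reading that file once into an awk array avoids rescanning it for every line. The sketch below is purely hypothetical, since gina.awk and the data layout have not been posted; lookup.txt, its "key value" layout, and the sample data are all assumptions.

```shell
#!/bin/sh
# Sketch only: the real gina.awk and its data are not shown in the thread.
# Assumption: the first field of each input line is a key, and lookup.txt
# (hypothetical name) holds "key value" pairs, one per line.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'a apple' 'b banana' 'c cherry' > lookup.txt
printf '%s\n' 'b' 'x' 'a' > Input.file

# Slow pattern: one grep per input line, each rescanning lookup.txt.
# Faster pattern: load lookup.txt once, then do in-memory array lookups.
awk '
  NR == FNR { val[$1] = $2; next }    # first file: build the lookup table
  ($1 in val) { print $1, val[$1] }   # second file: print matching keys only
' lookup.txt Input.file > Output.file
```

If the real script can be restructured this way, the 15-day run time might shrink considerably even before any splitting, but that depends entirely on what the greps actually do.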