Copy files in Parallel


 
# 1  
Old 01-22-2010
Question Copy files in Parallel

Is there an out-of-the-box command in Unix,
or is it possible through shell scripting, to
copy all the files in a directory in parallel?

Example: I am doing a cold database backup. The data/dbx directory holds around 1000 data files.

I was wondering whether there is a way to spawn the copy process so that I would have n threads, say 4, each copying about 250 files.

I am trying to cp the data directory to my backup folder.

Right now I am doing:

cp -Rf /data4/dbx /backup/dbx

dbx directory has around 1000 data files

If I use the cp command, it only copies the files one after another.
# 2  
Old 01-23-2010
Running 500 threads won't make your disk copy any faster unless your disk has 500 heads.
# 3  
Old 01-23-2010
For a local copy, running 2-4 threads may make sense if you are copying from multiple underlying disks. It can also hide latency and improve queuing when using serially attached storage or when copying to a remote system.

However, if you run multiple threads into a single file system, the downside is that the copy will contain rather fragmented files. This poses no problem if the restore is single-threaded, but the restore will then probably take longer than if you had created a neat, unfragmented copy using a single thread.

It is not advisable to use multiple threads for a restore to a single file system, since that would seriously fragment your file system and thus your database.
# 4  
Old 01-24-2010
Guys,
I do not want 500 threads.

My requirement is very simple.
If an out-of-the-box command is not available,
I need some kind of shell script that copies the files in a directory in, say, 4 threads.
Each thread will run cp on its own set of files.

This copy is being done on the same server but to a different directory. It's just a different SAN disk, which is transparent to the system.

I am just copying files from the /data/dbx directory to the /backup/dbx directory.

So the script should run cp on, say, 4 sets of files.

Any idea how to script this?

---------- Post updated at 03:59 PM ---------- Previous update was at 03:54 PM ----------

I do not want to copy the same file in multiple threads.

I want to copy sets of files in multiple threads.

So if the directory has, say, 1000 files, I want each thread to copy a different set of 250 files
using something like cp & .

The script needs to be dynamic: get the total number of files, then start a fixed number of 4 threads, each with an almost equal number of files to copy using cp &.
# 5  
Old 01-24-2010
simonsimon, I am not talking about 500 threads. Did you read my answer about the pros and cons?

A SAN means the disks are serially attached, and there are probably multiple underlying physical disks, so there are likely performance benefits for the backup process itself. However, you should do a single-threaded restore.

For the parallel copy process you could probably use this as a basis (you would still have to add checks for directories and such; this is just the working principle):
Code:
SOURCEDIR="$1"
TARGETDIR="$2"
MAX_PARALLEL=4
nroffiles=$(ls $SOURCEDIR|wc -w)
setsize=$(( nroffiles/MAX_PARALLEL + 1 ))
ls -1 $SOURCEDIR/* | xargs -n $setsize | while read workset; do
  cp -p $workset $TARGETDIR &
done
wait
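
A more defensive variant of the same idea, offered only as a sketch and not taken from the thread: the script above splits on whitespace, so it relies on file names without spaces. The function name `parallel_copy`, its `BATCH` parameter and the default values are assumptions made for illustration. It also assumes a `find` that supports `-maxdepth` and `-print0` and an `xargs` that supports `-0` and `-P` (GNU and BSD versions do; older POSIX-only systems may not).

```shell
#!/bin/sh
# parallel_copy SOURCEDIR TARGETDIR [MAX_PARALLEL] [BATCH]
# Hypothetical helper: copies the regular files directly inside SOURCEDIR
# into TARGETDIR, with at most MAX_PARALLEL cp processes running at once.
parallel_copy() {
  src=$1
  dst=$2
  max=${3:-4}      # number of simultaneous cp processes (assumed default)
  batch=${4:-50}   # files handed to each cp invocation (assumed default)

  [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
  mkdir -p "$dst" || return 1

  # -print0 / -0 keep file names containing spaces or newlines intact.
  # -P caps how many cp processes xargs keeps running at the same time.
  # The inner sh receives the target first, then one batch of file names;
  # it does nothing if xargs invokes it with an empty batch.
  find "$src" -maxdepth 1 -type f -print0 |
    xargs -0 -n "$batch" -P "$max" \
      sh -c 'target=$1; shift; [ "$#" -gt 0 ] || exit 0; cp -p "$@" "$target"' sh "$dst"
}

# Example call (paths from the thread):
#   parallel_copy /data4/dbx /backup/dbx 4
```

Unlike the fixed split into MAX_PARALLEL sets, this hands out small batches as workers become free, so one set of unusually large files does not leave the other workers idle. The return status is that of xargs, which is non-zero if any cp failed.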


Last edited by Scrutinizer; 01-24-2010 at 06:34 PM..
# 6  
Old 01-24-2010
Scrutinizer,
You are simply great. Thanks for your suggestions. As a DBA, I wish I knew more about shell scripting and Unix administration.

OK, a few questions.

I traced your script to see what it does, and I have some questions.
The directory /data/d1/ds has 4 files.

Code:
SOURCEDIR="/data/d1/ds"
TARGETDIR="/data/d1/dt"
MAX_PARALLEL=2
nroffiles=$(ls $SOURCEDIR|wc -w)
setsize=$(( nroffiles/MAX_PARALLEL + 1 ))
ls -1 $SOURCEDIR/* | xargs -n $setsize | while read workset; do
  cp -p $workset $TARGETDIR &
done
wait

Code:
$ ct.sh
+ SOURCEDIR=/data/d1/ds
+ TARGETDIR=/data/d1/dt
+ MAX_PARALLEL=2
+ + ls /data/d1/ds
+ wc -w
nroffiles=4
+ setsize=3
+ read workset
+ xargs -n 3
+ ls -1 /data/d1/ds/f1.txt /data/d1/ds/f2.txt /data/d1/ds/f3.txt /data/d1/ds/f4.txt
+ read workset
+ cp -p /data/d1/ds/f1.txt /data/d1/ds/f2.txt /data/d1/ds/f3.txt /data/d1/dt
+ read workset
+ cp -p /data/d1/ds/f4.txt /data/d1/dt
+ wait

Can you kindly explain what the following commands do?

Code:
setsize=$(( nroffiles/MAX_PARALLEL + 1 ))  :--  does (( nroffiles/MAX_PARALLEL + 1 )) give only an integer?

ls -1 $SOURCEDIR/* | xargs -n $setsize | while read workset; do
  cp -p $workset $TARGETDIR &
done
wait

ls -1 $SOURCEDIR/* :--- gives the file listing, one name per line

xargs -n $setsize  :--- does what? $setsize = 3

How does it spawn two threads of the cp command?

What does the wait command do? Does it wait for all parallel processes to finish?
So if I want to send an email notification after all the files are copied, a mailx command after the wait command should do it?

Last edited by pludi; 01-25-2010 at 02:10 AM.. Reason: code tags, please...
# 7  
Old 01-24-2010
Please use CODE TAGS, as it makes the code easier to read.

$setsize - this is the number of files in each set that is to be copied in parallel; $nroffiles is the total number of files.
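
To make the two questions about `$(( ))` and `xargs -n` concrete, here is a small sketch that reuses the values from the trace in post #6 (4 files, MAX_PARALLEL=2). The variable `worksets` is a name introduced here for illustration only.

```shell
#!/bin/sh
# Shell arithmetic is integer-only: the fractional part is discarded,
# so 4/2 + 1 is 3, and 7/2 would be 3 as well.
nroffiles=4
MAX_PARALLEL=2
setsize=$(( nroffiles / MAX_PARALLEL + 1 ))
echo "setsize=$setsize"

# xargs -n N regroups its input into lines of at most N words each.
# With no command given, xargs runs echo, so every output line is one
# work set that the while-read loop then hands to a single cp.
worksets=$(printf '%s\n' f1.txt f2.txt f3.txt f4.txt | xargs -n "$setsize")
echo "$worksets"
# prints:
#   f1.txt f2.txt f3.txt
#   f4.txt
```

Two output lines mean two iterations of the loop, and because each cp is started with a trailing &, the two copies run at the same time. That is where the "two threads" come from.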

Yes, after the wait command you can use mail, mailx or sendmail.
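
A sketch of how the notification step might look, under some assumptions: the function names `notify` and `copy_sets_then_notify` are made up for this example, and `notify` only echoes so the sketch can run anywhere. One detail worth knowing is that a bare `wait` returns 0 regardless of whether the background jobs succeeded, so this version waits on each process ID to find out whether a cp failed.

```shell
#!/bin/sh
# Placeholder notifier (hypothetical). For real mail, the body could be
# something like:  echo "$1" | mailx -s "dbx backup" dba@example.com
# where the address is only an example.
notify() {
  echo "NOTIFY: $1"
}

# copy_sets_then_notify TARGETDIR WORKSET...
# Each WORKSET is one space-separated list of files, as produced by xargs -n.
copy_sets_then_notify() {
  target=$1
  shift
  pids=
  for workset in "$@"; do
    # $workset is unquoted on purpose so that it splits into file names.
    cp -p $workset "$target" &
    pids="$pids $!"
  done

  failed=0
  for pid in $pids; do
    # wait PID returns the exit status of that particular background job.
    wait "$pid" || failed=$((failed + 1))
  done

  if [ "$failed" -eq 0 ]; then
    notify "copy finished"
  else
    notify "copy FAILED ($failed job(s))"
  fi
}
```

For a cold backup it seems safer to report failures explicitly than to send a success mail unconditionally after `wait`.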

---------- Post updated at 09:25 AM ---------- Previous update was at 09:20 AM ----------

I am not clear about one thing: if the parallelism is to be capped at a count of 'n',
then at any point in time only 'n' cp processes should be running, and not more.
I think in this case, if the total number of files is 'm' and the parallelization count is 'k', then m / k processes would be running in parallel, which could be far larger than the 'n' discussed above.
Code:
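# Copy in batches of at most $max background jobs.
# $tgtdir is an assumed name for the destination directory.
max=3
cnt=0
for file in "$srcdir"/*
do
  [ -f "$file" ] || continue
  cp -p "$file" "$tgtdir" &    # one background copy per file
  cnt=$((cnt + 1))
  if [ "$cnt" -ge "$max" ]
  then
    wait                       # let the current batch finish
    cnt=0
  fi
done
wait                           # wait for the final, partial batch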
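
On the concern about how many cp processes run at once, the number of work sets produced by the script in post #5 can be checked with a little arithmetic. The sketch below is not from the thread; `sets_started` is a hypothetical helper, and it assumes file names without whitespace so that one word equals one file.

```shell
#!/bin/sh
# sets_started NROFFILES MAX_PARALLEL
# Prints how many work sets (and therefore cp processes) the post #5
# script starts: setsize = n/k + 1, and xargs -n setsize emits
# ceil(n / setsize) lines.
sets_started() {
  n=$1
  k=$2
  setsize=$(( n / k + 1 ))
  echo $(( (n + setsize - 1) / setsize ))   # integer ceiling of n / setsize
}

sets_started 1000 4   # prints 4
sets_started 4 2      # prints 2
sets_started 9 4      # prints 3
```

Because setsize is always larger than n/k, the number of sets never exceeds MAX_PARALLEL, so that script starts at most k cp processes rather than m / k of them.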

I am not sure why wc -w is needed here.
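
On the wc -w question: in the script from post #5 it counts the names printed by ls, which equals the number of files only as long as no name contains whitespace. The sketch below shows the difference and one possible alternative; `count_files` is a hypothetical helper, not something from the thread.

```shell
#!/bin/sh
# count_files DIR
# Counts directory entries with a glob instead of parsing ls output,
# so names containing spaces are counted once.
count_files() {
  n=0
  for f in "$1"/*; do
    # The -e test skips the unexpanded pattern when DIR is empty.
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Illustration with two files, one of which has a space in its name:
demo=$(mktemp -d)
: > "$demo/data 01.dbf"
: > "$demo/users.dbf"
ls "$demo" | wc -w      # counts 3 words
count_files "$demo"     # counts 2 files
rm -rf "$demo"
```

With typical database file names that contain no whitespace, both counts agree, so the original script works as intended in that case.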
 