Very Challenging: Copy files in Multiple Threads


 
# 1  
Old 02-28-2010

Hello all,

I asked this in the basic Unix forum a week ago and got no answer, so I believe it is an advanced-level question and am posting it here.


Any suggestions welcome.

I have a directory of files of varying sizes.

I want to copy all these files to another directory in n threads such that each copy set is more or less the same size.

Example :

Say /mydirA

It has around say 23 files of various sizes.

Number of copy threads say = 3

total directory size of /mydirA = 25 GB

So each thread should copy files whose sizes add up to roughly 25/3 ~ 8 GB.

So I need to group the files by size for each thread such that each group adds up to about 8 GB.

Thread 1 --> 8 GB ... could be 11 files which add up to 8 GB

Thread 2 --> 8 GB ... could be 5 files which add up to 8 GB

Thread 3 --> 9 GB ... could be 8 files which add up to 8 or 9 GB

I want roughly equal copy sets per thread. It is also possible that, even though I ask for 3 threads of equal size, there are too few files for all 3 threads to reach the 8 GB copy set size. In that case, at least try to fill each thread's copy set as far as possible.

All files need to go from /mydirA to /mydirB in N threads, with each thread's target size being (Total size of directory)/N. Each thread may hold a different number of files, as long as their sizes add up to roughly that target.
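The per-thread target described above can be computed with a few lines of shell. This is only a minimal sketch: dir_total_bytes and per_thread_target are hypothetical helper names, sizes are counted in bytes, only regular files directly inside the directory are considered, and the division is rounded down.

```shell
# Hypothetical helper: print the total size in bytes of the regular files
# directly inside directory $1 (subdirectories are ignored).
dir_total_bytes() {
    total=0
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        size=$(wc -c < "$f")
        total=$((total + $size))
    done
    echo "$total"
}

# Hypothetical helper: print the target size per copy set for directory $1
# split across $2 threads (integer division, rounded down).
per_thread_target() {
    dir_bytes=$(dir_total_bytes "$1")
    echo $((dir_bytes / $2))
}
```

For the example in this post, per_thread_target /mydirA 3 would print roughly a third of the 25 GB total, expressed in bytes.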
# 2  
Old 02-28-2010
In short: good luck solving an NP-complete problem.

Besides that, for any large copy operation the bottleneck will be the IO subsystem (disk, network, ...) rather than any CPU.
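Finding the best possible split is hard in general, but a common greedy heuristic (largest file first, always into the currently smallest set) usually gives a reasonably balanced result. The following is a sketch under assumptions, not a definitive solution: assign_greedy is a hypothetical helper name, the input is expected as size<TAB>name lines on stdin, and file names must not contain tabs or newlines.

```shell
# Hypothetical helper: read "size<TAB>name" lines on stdin and print
# "set<TAB>name" lines. $1 is the number of copy sets.
# Files are taken largest first and each one goes to the set whose
# running total is currently the smallest (ties go to the lowest set number).
assign_greedy() {
    sort -rn | awk -F'\t' -v n="$1" '
    {
        best = 1
        for (b = 2; b <= n; b++)
            if (load[b] + 0 < load[best] + 0) best = b
        load[best] += $1
        print best "\t" $2
    }'
}
```

The input could be produced with something like `for f in /mydirA/*; do [ -f "$f" ] && printf '%s\t%s\n' "$(wc -c < "$f")" "$f"; done | assign_greedy 3`. This balances the sizes only; as noted above, it does nothing about the IO bottleneck.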
# 3  
Old 02-28-2010
Hi Samoo,

I have come up with a script for your requirement. When tested with sample files of the same size, it worked fine in dividing all the files into 3 sets of equal size.

Logic used:

While the running total ($thread1size) is less than the reference size (1/3 of the total size), each file name is appended to a string variable.
Once the running total reaches or exceeds the reference, that file name is appended as well and the loop exits.
This ensures that the total size of all the files in the string (filelist1=$filelist1:$i) is equal to or slightly greater than the reference size.

Once we have the thread1 files,

we exclude them from the sizelist file (i.e. the list of all files in the current directory) and apply the same logic to the remaining files to get the thread2 files.

Finally, the remaining files become the thread3 files.

I am not sure how this script will behave in a real-world scenario (i.e. files of different sizes). However, it may give you some idea of how to proceed further.
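The same fill-until-the-reference-is-reached logic can be written for any number of sets in a single pass. This is a minimal sketch, not the script from this post: assign_sequential is a hypothetical helper name, and the input is assumed to be size<TAB>name lines on stdin with no tabs or newlines in the names.

```shell
# Hypothetical helper: read "size<TAB>name" lines on stdin and print
# "set<TAB>name" lines. $1 = number of sets, $2 = reference size per set.
assign_sequential() {
    awk -F'\t' -v n="$1" -v ref="$2" '
    BEGIN { set = 1; filled = 0 }
    {
        # The file that reaches or crosses the reference still goes
        # into the current set.
        print set "\t" $2
        filled += $1
        # Then move on to the next set; the last set takes whatever remains.
        if (filled >= ref && set < n) { set++; filled = 0 }
    }'
}
```

With nine files of 166 bytes each, 3 sets and a reference size of 498, this puts three files in each set. With files of very different sizes the last set can end up much smaller or larger than the others.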



Current directory files

Code:
$ ls -l
-rw-r--r--   1 userid   staff           166 Mar 01 04:21 f1
-rw-r--r--   1 userid   staff           166 Mar 01 04:21 f2
-rw-r--r--   1 userid   staff           166 Mar 01 04:21 f3
-rw-r--r--   1 userid   staff           166 Mar 01 04:21 f4
-rw-r--r--   1 userid   staff           166 Mar 01 04:21 f5
-rw-r--r--   1 userid   staff           166 Mar 01 04:22 f6
-rw-r--r--   1 userid   staff           166 Mar 01 04:22 f7
-rw-r--r--   1 userid   staff           166 Mar 01 04:22 f8
-rw-r--r--   1 userid   staff           166 Mar 01 04:22 f9
-rwxr-xr-x   1 userid   staff          2514 Mar 01 05:21 seperate.sh


Script seperate.sh

Code:

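# Deleting files previously created by this script, if any
rm -rf dir1 dir2 dir3 list* size* thread* >/dev/null 2>&1
# Preparing a file containing all the file names and their corresponding sizes in the current directory.
# Only regular files are listed (lines starting with "-"); this script and sizelist itself are skipped.
# Note: file names containing spaces are not handled.
ls -l |awk -v self="$(basename "$0")" '/^-/ && $9 != self && $9 != "sizelist" {print $9" :" $5}' >sizelist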
 

                        ##################### PART1 #########################
# Calculating the total files size in current directory and taking 1/3 of it as reference for getting files for thread 1,2,3
totsize=0
for i in `cat sizelist|awk -F: '{print $2}'`
do
((totsize=$totsize+$i))
done
((refsize=$totsize/3))
 

                        ##################### PART2 #########################
# Preparing a list of thread1 files
filelist1=" "
thread1size=0
for i in `cat sizelist|awk -F: '{print $1}'`
do
# Exact match on the file name, so that f1 does not also match f10
filesize=`awk -F' :' -v f="$i" '$1 == f {print $2}' sizelist`
((thread1size=$thread1size+$filesize))
if [ $thread1size -lt $refsize ]
then
filelist1=$filelist1:$i
else
filelist1=$filelist1:$i
break
fi
done
echo $filelist1 |tr -s : " " >thread1
 
                       ##################### PART3 ##########################
# Preparing a file containing  list of filenames excluding the thread1 files for getting thread2 files
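cat sizelist >list2
for i in `cat thread1`
do
# Write to a temporary file first: redirecting into list2 while reading it would truncate it
awk -F' :' -v f="$i" '$1 != f' list2 >list2.tmp
mv list2.tmp list2
done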
# Preparing a list of thread2 files
filelist2=" "
thread2size=0
for i in `cat list2|awk -F: '{print $1}'`
do
filesize=`awk -F' :' -v f="$i" '$1 == f {print $2}' list2`
((thread2size=$thread2size+$filesize))
if [ $thread2size -lt $refsize ]
then
filelist2=$filelist2:$i
else
filelist2=$filelist2:$i
break
fi
done
echo $filelist2 |tr -s : " " >thread2
 
                       ##################### PART4 #############################
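# Preparing list of remaining files for thread3
awk -F' :' '{print $1}' list2 >thread3
for i in `cat thread2`
do
# Write to a temporary file first: redirecting into thread3 while reading it would truncate it
awk -v f="$i" '$1 != f' thread3 >thread3.tmp
mv thread3.tmp thread3
done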
 
                       ##################### PART5 #############################
# Creating three directories and copying the thread1, thread2 and thread3 files to them
mkdir dir1 dir2 dir3
for i in `cat thread1`
do
cp $i dir1/
done
for i in `cat thread2`
do
cp $i dir2/
done
for i in `cat thread3`
do
cp $i dir3/
done
echo "Total size of all files in the current directory is $totsize"
echo "The reference size is 1/3 is $refsize"
echo "Thread one files are: `cat thread1`"
echo "Thread two files are: `cat thread2`"
echo "Thread three files are: `cat thread3`"


Output :
Code:
$ seperate.sh

Total size of all files in the current directory is 1494
The reference size is 1/3 is 498
Thread one files are:  f1 f2 f3
Thread two files are:  f4 f5 f6
Thread three files are: f7
f8
f9


Last edited by spider007; 02-28-2010 at 08:42 PM..
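The script above copies the three sets one after another. To actually run the copies concurrently, as the original question asks, each set can be started as a background job and the script can then wait for all of them. This is a minimal sketch: copy_in_parallel is a hypothetical helper name, and each list file is assumed to hold one file name per line (a space-separated list can be converted first with tr ' ' '\n').

```shell
# Hypothetical helper: copy the files named in each list file into
# directory $1, running one background job per list, then wait for all jobs.
# Usage: copy_in_parallel DEST LIST1 [LIST2 ...]
copy_in_parallel() {
    dest=$1
    shift
    for list in "$@"; do
        (
            # One subshell per list; names are read line by line.
            while IFS= read -r name; do
                [ -n "$name" ] && cp -- "$name" "$dest"/
            done < "$list"
        ) &
    done
    # Block until every background copy job has finished.
    wait
}
```

For example, copy_in_parallel /mydirB list1 list2 list3 would start three copy jobs at once. Whether this is faster than a single cp depends on the disks or network involved, since the IO subsystem is usually the limiting factor.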