Fast processing (mv command) of 1 million+ files using find, mv and xargs


 
# 1  
Old 04-03-2013
Fast processing (mv command) of 1 million+ files using find, mv and xargs

Hi, I'd like to ask if anybody can help improve my code to move 1 million+ files from a directory to another:

Code:
find /source/dir -name "file*" -type f | xargs -I '{}' mv {} /destination/dir

I learned this line of code from this forum as well, and it works. However, the file movement is slow: about 1-2 files per second. At this rate it would take days to move all the files. I don't have much background in xargs yet, so I was wondering if there is a faster way to accomplish this.
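
From what I understand, -I '{}' makes xargs run a separate mv for every single file, which may be part of the overhead. I was also considering a batched variant like the sketch below (untested on my box, and it assumes the filenames contain no spaces or newlines), but I'm not sure it's the right approach:

Code:
# untested sketch: pass many files to each mv instead of one at a time.
# xargs appends the filenames after the fixed arguments, so inside the
# sh -c script $1 is the destination and the remaining args are the files.
find /source/dir -name "file*" -type f |
    xargs -n 500 sh -c 'd=$1; shift; mv "$@" "$d"' sh /destination/dir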

Here are some more details:
-OS is HP-UX.
-The files in /source/dir are continually being added.
-Size per file is around 300-1000 KB.
-Filename pattern includes a YYYYMMDD date (might prove useful for batch processing).
-A plain mv with a wildcard (mv /source/dir/*) already fails with "arg list too long" (see the note after this list), hence the use of find/xargs.
-/source/dir has no subdirectories.
-After moving the files, I would later divide/mv them into different dirs corresponding to their YYYYMMDD date.
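
(For reference, the limit that the expanded wildcard was hitting can be checked with a standard getconf call:)

Code:
getconf ARG_MAX    # byte limit that the expanded mv wildcard exceeded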

Hope the above info helps. Any advice would be greatly appreciated.

Thank you.
# 2  
Old 04-03-2013
Were /source/dir and /destination/dir on the same file system, mv would just rename the files and not move a single byte.
Had you not said that new files drop in constantly, I'd have tried to rename the directory itself (same file system!).
The way it is, I'm afraid I'm out of ideas.
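
For the record, the rename would have looked something like this sketch (the batch directory name is just illustrative, and there is a window between the two commands where your auto-dump script would fail or keep writing into the renamed directory):

Code:
mv /source/dir /destination/dir.batch1    # instant rename, no data copied
mkdir /source/dir                         # recreate it for new arrivals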
# 3  
Old 04-03-2013
Quote:
Originally Posted by RudiC
Were /source/dir and /destination/dir on the same file system, mv would just rename the files and not move a single byte.
Had you not said that new files drop in constantly, I'd have tried to rename the directory itself (same file system!).
The way it is, I'm afraid I'm out of ideas.
Hi RudiC, yup, same file system. It would indeed have been easier to just rename the dir.

My main goal is just to redistribute the files to different dirs according to their YYYYMMDD. The mv to /destination/dir is just an extra step since we have more space there.
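
For what it's worth, the per-date split I have in mind would look roughly like the sketch below (the date list and filename pattern are illustrative, and each pass rescans the whole source directory):

Code:
# rough sketch: move each day's files into a directory of their own.
# Assumes filenames contain YYYYMMDD and no whitespace.
dest=/destination/dir
for day in 20130401 20130402 20130403; do
    mkdir -p "$dest/$day" || exit 1
    find /source/dir -name "*${day}*" -type f |
        xargs -n 500 sh -c 'd=$1; shift; mv "$@" "$d"' sh "$dest/$day"
done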
# 4  
Old 04-03-2013
Are you sure it's one to two files per second?
Code:
$ ls | wc
   1993    1993   34863

Code:
$ time find . -name "*.*" -type f | xargs -I '{}' mv {} ../xxx
real    0m2.846s
user    0m0.668s
sys     0m2.104s

Code:
$ cd ../xxx
$ ls | wc
   1993    1993   34863

Seems like about 700 files per second, and this is running on kind of a dog of a Linux computer, nothing special. Unless your find command is taking days, maybe your operations are going faster than you think.

At 500 files per second, you could mv a million files in 2000 seconds, about 30 minutes.
# 5  
Old 04-03-2013
Quote:
Originally Posted by hanson44
Are you sure it's one to two files per second?
Code:
$ ls | wc
   1993    1993   34863

Code:
$ time find . -name "*.*" -type f | xargs -I '{}' mv {} ../xxx
real    0m2.846s
user    0m0.668s
sys     0m2.104s

Code:
$ cd ../xxx
$ ls | wc
   1993    1993   34863

Seems like about 700 files per second, and this is running on kind of a dog of a Linux computer, nothing special. Unless your find command is taking days, maybe your operations are going faster than you think.

At 500 files per second, you could mv a million files in 2000 seconds, about 30 minutes.
Hi hanson44, yup, around 2 files per sec.
I have a counter on /destination/dir that runs ls | wc -l every 2 seconds just so I can check the progress.
I'm thinking that since /source/dir already contains 1.2 million files (and is still receiving more from an auto-dump script), that contributes to the slow processing.
# 6  
Old 04-03-2013
Yes, perhaps the find command is the bottleneck. Maybe it has a hard time "dealing with" so many files.

What happens if you run the find command for a minute (forget about the mv part for the time being), save the output from find to a file, and see how many lines accumulate in the file?
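
Something like this sketch would show the rate (the path and pattern are assumed from your first post):

Code:
# let find run by itself for a minute, then count what it produced
find /source/dir -name "file*" -type f > /tmp/found.txt &
sleep 60
kill $! 2> /dev/null
wc -l < /tmp/found.txt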

If there are perhaps 120 lines after a minute (two per second), then find is the bottleneck. If there are tens of thousands of lines, then it's still a mystery.

Is there any chance you could just use 'ls' instead of find?
# 7  
Old 04-03-2013
ls is guaranteed to perform badly here, because it must read the entire directory and sort all the names before it can print anything. It might bog down for minutes or hours before it shows a thing.
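
As an aside, if an ls is only there to count progress, a pipeline that skips the sort is cheaper; it still reads every directory entry, but it streams instead of buffering and sorting:

Code:
find /destination/dir -type f | wc -l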

find doesn't have problems "dealing with" large numbers of files. In a sense find's job is rather simple -- opendir(), readdir(), print if match, loop until done. If it's struggling, that means it either has too much work to do -- finding 300 'good' files out of 1.2 million files you don't care about means scanning through all 1.2 million -- or the filesystem itself is responding slowly.

Small numbers of folders crammed full of millions of files generally perform rather badly, especially when already busy. The filesystem itself, rather than find, may be suffering here.
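
One quick sanity check, assuming your filesystem behaves like most: the directory file itself grows with the number of entries (and typically never shrinks), so its size hints at how bloated it has become.

Code:
ls -ld /source/dir    # a directory size in the megabytes means a huge entry table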