Fast processing (mv command) of 1 million+ files using find, mv and xargs


 
# 1  
Old 04-03-2013
Fast processing (mv command) of 1 million+ files using find, mv and xargs

Hi, I'd like to ask if anybody can help improve my code to move 1 million+ files from a directory to another:

Code:
find /source/dir -name "file*" -type f | xargs -I '{}' mv {} /destination/dir

I learned this line of code from this forum as well, and it works. However, the file movement is slow: about 1-2 files per second. At this rate it would take days to move all the files. I don't have much background in xargs yet, so I was wondering if there is a faster way to accomplish this.
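
From what I understand, -I '{}' makes xargs run a separate mv for every single file, which may be part of the overhead. I was also considering a batched variant like the sketch below (untested on my box, and it assumes the filenames contain no spaces or newlines), but I'm not sure it's the right approach:

Code:
# untested sketch: pass many files to each mv instead of one at a time.
# xargs appends the filenames after the fixed arguments, so inside the
# sh -c script $1 is the destination and the remaining args are the files.
find /source/dir -name "file*" -type f |
    xargs -n 500 sh -c 'd=$1; shift; mv "$@" "$d"' sh /destination/dir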

Here are some more details:
-OS is HP-UX.
-The files in /source/dir are continually being added.
-Size per file is around 300-1000 KB.
-Filename pattern includes a YYYYMMDD date (might prove useful for batch processing).
-A plain mv with a wildcard (mv /source/dir/*) already fails with "arg list too long" (see the note after this list), hence the use of find/xargs.
-/source/dir has no subdirectories.
-After moving the files, I would later divide/mv them into different dirs corresponding to their YYYYMMDD date.
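
(For reference, the limit that the expanded wildcard was hitting can be checked with a standard getconf call:)

Code:
getconf ARG_MAX    # byte limit that the expanded mv wildcard exceeded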

Hope the above info helps. Any advice would be greatly appreciated.

Thank you.
# 2  
Old 04-03-2013
Were /source/dir and /destination/dir on the same file system, mv would just rename the files and not move a single byte.
Had you not said that new files drop in constantly, I'd have tried to rename the directory itself (same file system!).
The way it is, I'm afraid I'm out of ideas.
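
For the record, the rename would have looked something like this sketch (the batch directory name is just illustrative, and there is a window between the two commands where your auto-dump script would fail or keep writing into the renamed directory):

Code:
mv /source/dir /destination/dir.batch1    # instant rename, no data copied
mkdir /source/dir                         # recreate it for new arrivals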
# 3  
Old 04-03-2013
Quote:
Originally Posted by RudiC
Were /source/dir and /destination/dir on the same file system, mv would just rename the files and not move a single byte.
Had you not said that new files drop in constantly, I'd have tried to rename the directory itself (same file system!).
The way it is, I'm afraid I'm out of ideas.
Hi RudiC, yup, same file system. It would indeed have been easier to just rename the dir.

My main goal is just to redistribute the files to different dirs according to their YYYYMMDD. The mv to /destination/dir is just an extra step since we have more space there.
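
For what it's worth, the per-date split I have in mind would look roughly like the sketch below (the date list and filename pattern are illustrative, and each pass rescans the whole source directory):

Code:
# rough sketch: move each day's files into a directory of their own.
# Assumes filenames contain YYYYMMDD and no whitespace.
dest=/destination/dir
for day in 20130401 20130402 20130403; do
    mkdir -p "$dest/$day" || exit 1
    find /source/dir -name "*${day}*" -type f |
        xargs -n 500 sh -c 'd=$1; shift; mv "$@" "$d"' sh "$dest/$day"
done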
# 4  
Old 04-03-2013
Are you sure it's one to two files per second?
Code:
$ ls | wc
   1993    1993   34863

Code:
$ time find . -name "*.*" -type f | xargs -I '{}' mv {} ../xxx
real    0m2.846s
user    0m0.668s
sys     0m2.104s

Code:
$ cd ../xxx
$ ls | wc
   1993    1993   34863

Seems like about 700 files per second, and this is running on kind of a dog of a Linux computer, nothing special. Unless your find command is taking days, maybe your operations are going faster than you think.

At 500 files per second, you could mv a million files in 2000 seconds, about 30 minutes.
# 5  
Old 04-03-2013
Quote:
Originally Posted by hanson44
Are you sure it's one to two files per second?
Code:
$ ls | wc
   1993    1993   34863

Code:
$ time find . -name "*.*" -type f | xargs -I '{}' mv {} ../xxx
real    0m2.846s
user    0m0.668s
sys     0m2.104s

Code:
$ cd ../xxx
$ ls | wc
   1993    1993   34863

Seems like about 700 files per second, and this is running on kind of a dog of a Linux computer, nothing special. Unless your find command is taking days, maybe your operations are going faster than you think.

At 500 files per second, you could mv a million files in 2000 seconds, about 30 minutes.
Hi hanson44, yup, around 2 files per sec.
I have a counter on /destination/dir that runs ls | wc -l every 2 seconds just so I can check the progress.
I'm thinking that since /source/dir already contains 1.2 million files (and is still receiving more from an auto-dump script), that contributes to the slow processing.
# 6  
Old 04-03-2013
Yes, perhaps the find command is the bottleneck. Maybe it has a hard time "dealing with" so many files.

What happens if you run the find command for a minute (forget about the mv part for the time being), save the output from find to a file, and see how many lines accumulate in the file?
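
Something like this sketch would show the rate (the path and pattern are assumed from your first post):

Code:
# let find run by itself for a minute, then count what it produced
find /source/dir -name "file*" -type f > /tmp/found.txt &
sleep 60
kill $! 2> /dev/null
wc -l < /tmp/found.txt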

If there are perhaps 120 lines after a minute (two per second), then find is the bottleneck. If there are tens of thousands of lines, then it's still a mystery.

Is there any chance you could just use 'ls' instead of find?
# 7  
Old 04-03-2013
ls is guaranteed to perform badly here, because it must read the entire directory and sort all the names before it can print anything. It might bog down for minutes or hours before it shows a thing.
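
As an aside, if an ls is only there to count progress, a pipeline that skips the sort is cheaper; it still reads every directory entry, but it streams instead of buffering and sorting:

Code:
find /destination/dir -type f | wc -l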

find doesn't have problems "dealing with" large numbers of files. In a sense find's job is rather simple -- opendir(), readdir(), print if match, loop until done. If it's struggling, that means it either has too much work to do -- finding 300 'good' files out of 1.2 million files you don't care about means scanning through all 1.2 million -- or the filesystem itself is responding slowly.

Small numbers of folders crammed full of millions of files generally perform rather badly, especially when already busy. The filesystem itself, rather than find, may be suffering here.
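
One quick sanity check, assuming your filesystem behaves like most: the directory file itself grows with the number of entries (and typically never shrinks), so its size hints at how bloated it has become.

Code:
ls -ld /source/dir    # a directory size in the megabytes means a huge entry table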