Fast processing (mv command) of 1 million+ files using find, mv and xargs
Hi, I'd like to ask if anybody can help improve my code to move 1 million+ files from one directory to another:
I learned this line of code from this forum as well, and it works fine. However, file movement is rather slow: about 1-2 files per second. At this rate, it may take days to move the files. I don't have much background in xargs yet, so I was wondering if there is a faster way to accomplish this.
Here are some more details:
-OS is HP-UX.
-Files are continually being added to /source/dir.
-Size per file is around 300-1000 KB.
-The filename pattern includes a YYYYMMDD date (might prove useful for batch processing).
-The mv * command already fails with "arg list too long," hence the use of find/xargs.
-/source/dir has no subdirectories.
-After moving the files, I will later divide/mv them into different dirs corresponding to their YYYYMMDD date.
Hope the above info helps. Any advice would be greatly appreciated as well.
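A minimal sketch of one way to batch the move so that each mv invocation handles many files instead of one. It assumes a find that supports the POSIX `-exec ... {} +` form and that the destination is not inside the source directory. The helper name `batch_move` is hypothetical, the paths are the placeholders used in this thread, and this is not necessarily the command the poster ran.

```shell
# batch_move SRC DEST  (hypothetical helper name)
# Moves every regular file under SRC into DEST.
# "-exec ... {} +" packs as many file names as fit onto each command line,
# so mv runs once per batch rather than once per file, and the
# "arg list too long" limit is never exceeded.
batch_move() {
    src=$1
    dest=$2
    # The inner sh receives DEST as $1 followed by one batch of files;
    # "shift" drops DEST so that "$@" holds only the files.
    find "$src" -type f \
        -exec sh -c 'dest=$1; shift; mv "$@" "$dest"' sh "$dest" {} +
}

# Example (placeholder paths from the thread):
# batch_move /source/dir /destination/dir
```

Because the file names are passed by find directly as arguments, names containing spaces are handled without extra quoting tricks.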
Were /source/dir and /destination/dir on the same file system, mv would just rename the files and not move a single byte.
Had you not said that new files drop in constantly, I'd have tried to rename the directory itself (same file system!).
The way it is, I'm afraid I'm out of ideas.
Hi RudiC, yup, same file system. It would have been easier indeed to just rename the dir.
My main goal is just to redistribute the files to different dirs according to their YYYYMMDD. The mv to /destination/dir is just an extra step since we have more space there.
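Since the end goal is one directory per date, a sketch of moving one day's files straight into its own directory might look like the following. It assumes the eight-digit date appears literally in each file name, that find supports the POSIX `-exec ... {} +` form, and that the destination root lies outside the source directory. The helper name `move_day` and the example date are hypothetical.

```shell
# move_day SRC DEST_ROOT YYYYMMDD  (hypothetical helper name)
# Moves every regular file in SRC whose name contains the given date
# into DEST_ROOT/YYYYMMDD, creating that directory if needed.
move_day() {
    src=$1
    dest=$2/$3
    mkdir -p "$dest" || return 1
    # The inner sh receives the target directory as $1, then a batch of files.
    find "$src" -type f -name "*$3*" \
        -exec sh -c 'dest=$1; shift; mv "$@" "$dest"' sh "$dest" {} +
}

# Example (placeholder paths from the thread, made-up date):
# move_day /source/dir /destination/dir 20130415
```

Each call still scans the whole source directory once, so this trades one full scan per date for skipping the intermediate move.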
Are you sure it's one to two files per second?
Seems like about 700 files per second. And this is running on kind of a dog of a Linux computer, nothing special. Unless your find command is taking days, maybe your operations are going faster than you think.
At 500 files per second, you could mv a million files in 2000 seconds, about 33 minutes.
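The arithmetic can be checked with shell integer math. This is only a sketch: the helper name `estimate_seconds` is hypothetical, and the two rates are the ones quoted in this thread.

```shell
# estimate_seconds FILES RATE  (hypothetical helper name)
# Prints the whole number of seconds needed to move FILES at RATE files/sec.
estimate_seconds() {
    echo $(( $1 / $2 ))
}

estimate_seconds 1000000 500   # prints 2000 (roughly 33 minutes)
estimate_seconds 1000000 2     # prints 500000 (close to 6 days)
```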
Hi hanson44, yup, around 2 files per sec.
I have a counter on /destination/dir that executes ls | wc -l every 2 sec just so I can check the progress.
I'm thinking that since /source/dir already contains 1.2 million files (and is still receiving more from an auto-dump script), that contributes to the slow processing.
Yes, perhaps the find command is the bottleneck. Maybe it has a hard time "dealing with" so many files.
What happens if you run the find command for a minute (forget about the mv part for the time being), saving the output from find to a file, and see how many lines accumulate in the file?
If there are perhaps 120 lines after a minute (two per second), then find is the bottleneck. If there are tens of thousands of lines, then it's still a mystery.
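The experiment described above could be sketched as follows. The helper name `find_sample` and the output path are hypothetical. Because find buffers its output when writing to a file, the count after an early stop is approximate.

```shell
# find_sample DIR SECONDS OUTFILE  (hypothetical helper name)
# Lets find list DIR for up to SECONDS seconds, writing names to OUTFILE,
# then prints how many lines were produced in that time.
find_sample() {
    find "$1" -type f -print > "$3" &
    pid=$!
    sleep "$2"
    # Stop find if it is still running; ignore the error if it has finished.
    kill "$pid" 2>/dev/null || true
    wait "$pid" 2>/dev/null || true
    # The arithmetic expansion strips any padding some wc versions add.
    echo $(( $(wc -l < "$3") ))
}

# Example (placeholder path from the thread):
# find_sample /source/dir 60 /tmp/find.out
```

A result near 120 after 60 seconds would point at find or the filesystem; a result in the tens of thousands would point elsewhere.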
Is there any chance you could just use 'ls' instead of find?
ls is guaranteed to perform badly here, because it must read the entire directory listing and sort the names before it can print anything. It might bog down for minutes or hours before it shows anything.
find doesn't have problems "dealing with" large numbers of files. In a sense find's job is rather simple -- opendir(), readdir(), print if match, loop until done. If it's struggling, that means it either has too much work to do -- finding 300 'good' files out of 1.2 million files you don't care about means scanning through all 1.2 million -- or the filesystem itself is responding slowly.
Small numbers of folders crammed full of millions of files generally perform rather badly, especially when already busy. The filesystem itself, rather than find, may be suffering here.