Using UNIX Commands with a Large Number of Files


 
# 15  
Old 11-10-2012
Quote:
Originally Posted by RudiC
May I advise against executing a script within a find command? For every file found, this creates a shell to run the script, implying huge overhead, especially in this case with a large number of files. Why not collect all the filenames in a working file and then work on that?
The whole point of my suggestion to use a script was to rearrange the arguments passed to a simple command, thereby allowing find's -exec primary to invoke the underlying UNIX command with a large number of file operands instead of a single file operand. I ran an experiment using the following commands on my relatively old MacBook Pro running Mac OS X Version 10.7.5, which sets {ARG_MAX} to 262144 bytes (i.e., 256 KiB). I used cp as the test command to copy all of the PDF files found in and under my home directory to /tmp/pdfdest. /tmp/pdfdest was a new directory when I started this test, but I did not empty and recreate the directory between tests.
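
(To check the corresponding limit on your own system, getconf will report it; getconf ARG_MAX is specified by POSIX, so it should be available just about everywhere:)
Code:
getconf ARG_MAX    # combined byte limit on exec() arguments plus environment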

I used the command:
Code:
time find $HOME -name '*.pdf' -exec cp -f {} /tmp/pdfdest \;

three times and ignored the 1st run (which ran a lot slower than the other two as it cleared out the various caches and loaded my home directory's file hierarchy). The two remaining runs averaged 2 minutes 27.45 seconds wall clock time, 0.85 seconds user CPU time, and 5.56 seconds system CPU time to copy 881 files by invoking cp 881 times.

Adding the following Korn shell script (named CpDest1st):
Code:
#!/bin/ksh
# Usage: CpDest1st options destdir srcfile...
# Rearranges its arguments so that find's -exec ... {} + can hand a
# whole batch of source files to a single invocation of cp.
opts="$1"    # option string to pass through to cp (e.g., -f)
dest="$2"    # destination directory
shift 2      # everything left in "$@" is the list of source files
exec cp $opts "$@" "$dest"    # $opts deliberately unquoted so it can expand to multiple options
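
(To use the script this way it has to be executable and somewhere find's -exec can reach it; /usr/local/bin below is just an example location:)
Code:
chmod +x CpDest1st           # make the wrapper executable
mv CpDest1st /usr/local/bin  # or any other directory in your PATH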

and using the command:
Code:
time find $HOME -name '*.pdf' -exec CpDest1st -f /tmp/pdfdest {} +

three times, again ignoring the first set of results and averaging the other two. Copying the same 881 files using this method, which invoked the shell script and the cp utility once each, took 2 minutes 3.02 seconds wall clock time, 0.42 seconds user CPU time, and 3.76 seconds system CPU time.

This isn't a statistically valid comparison, and your mileage will vary depending on the value of {ARG_MAX} on your system, the number and sizes of the PDF files being copied, and so on. But it does show that, despite the overhead of the extra shell, using a shell script to reduce the number of invocations of cp (or mv or many other utilities) may actually reduce the elapsed time and the amount of system resources used. In this simple test, using the shell script reduced wall clock time 16%, user CPU time 50%, and system CPU time 32%.
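
(For the record, collecting the names first, as RudiC originally suggested, can also be done without a separate script file. Here is an untested sketch using the -print0/-0 extensions, which are not required by the standards but are provided by the GNU and BSD, including OS X, implementations of find and xargs:)
Code:
# xargs gathers names up to the {ARG_MAX} limit and runs the inline
# sh wrapper once per batch; the trailing "sh" becomes $0 inside it
time find "$HOME" -name '*.pdf' -print0 |
    xargs -0 sh -c 'cp -f "$@" /tmp/pdfdest' sh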
# 16  
Old 11-11-2012
Interesting result. Hmmm. I'm not sure that cp was exec'ed 881 times in the second case; the + sign will collect all the file names as parameters, which you allow for with "$@" in your script. Could you set the -x shell option and show us what happens?
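For instance, tracing would only take one extra line right after the shebang:
Code:
#!/bin/ksh
set -x    # print each command, with its expanded arguments, to standard error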
# 17  
Old 11-11-2012
Quote:
Originally Posted by RudiC
Interesting result. Hmmm. I'm not sure that cp was exec'ed 881 times in the second case; the + sign will collect all the file names as parameters, which you allow for with "$@" in your script. Could you set the -x shell option and show us what happens?
I must not have been clear about what happened.

In the 1st case cp was invoked 881 times to copy 881 files. In the 2nd case, the shell script CpDest1st was invoked 1 time and it invoked cp 1 time to copy the same 881 files. The cp command line that was invoked was 65190 bytes long. Even though it is just a list of PDF files preceded by cp -f and ending with /Users/dwc/tc1-2008detailed.pdf /Users/dwc/tc1-2008summary.pdf /tmp/pdfdest, I'm not comfortable posting the names of the other 879 PDF files, both because this forum is not set up to handle a 65000+ character line and because the directory structure and file names under my home directory are nobody's business but mine.
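
(Anyone who wants to check the batching on their own system without exposing any file names can count the operands per exec with a throwaway inline wrapper like this one:)
Code:
find "$HOME" -name '*.pdf' -exec sh -c 'echo "one exec with $# operands"' sh {} +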
# 18  
Old 11-12-2012
That's what I thought. If in the first example you had invoked cp once with all 881 filenames, then you would be comparing apples with apples. find needs the {} to appear immediately before the +, so this does not work out for every command. Fortunately cp has the -t option, so it'd be interesting to invoke it like
Code:
time find $HOME -name '*.pdf' -exec cp -f -t /tmp/pdfdest {} +

# 19  
Old 11-12-2012
Quote:
Originally Posted by RudiC
That's what I thought. If in the first example you had invoked cp once with all 881 filenames, then you would be comparing apples with apples. find needs the {} to appear immediately before the +, so this does not work out for every command. Fortunately cp has the -t option, so it'd be interesting to invoke it like
Code:
time find $HOME -name '*.pdf' -exec cp -f -t /tmp/pdfdest {} +

Yes, it would. However, cp's -t option is an extension to the standards and is not available on many systems, including OS X (which is the OS I use when testing suggestions I submit to this site).

If your system provides the -t option to cp, use it. It will be faster than using a shell script to rearrange the arguments so that the destination directory comes at the end of the operand list. If you need a portable solution that will work with all implementations of cp, using a shell script to rearrange the operands may still be faster than invoking cp once for each file to be copied. Where the tipping point lies will vary with the number and sizes of the files being copied, and with many factors that differ from system to system: the number of active users and what they are doing, the value of {ARG_MAX}, the I/O bandwidth, and the types of file systems and the underlying hardware used for the source and destination directories.
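
(One way to get the best of both worlds is to feature-test for -t at run time and fall back to the wrapper script only when it is missing. A rough, untested sketch; the scratch directories exist only so cp can be probed safely:)
Code:
#!/bin/sh
# probe whether this implementation of cp understands -t
src=$(mktemp -d) && dst=$(mktemp -d) || exit 1
: > "$src/probe"
if cp -t "$dst" "$src/probe" 2>/dev/null; then
    find "$HOME" -name '*.pdf' -exec cp -f -t /tmp/pdfdest {} +
else
    find "$HOME" -name '*.pdf' -exec CpDest1st -f /tmp/pdfdest {} +
fi
rm -rf "$src" "$dst"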
 