Increase the performance of find command.


Login or Register for Dates, Times and to Reply

 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Increase the performance of find command.
# 1  
Increase the performance of find command.

I'm trying to exclude 'BACKUP', 'STORE', 'LOGGER' folders while searching for all files under a directory "/tmp/moht"

Once a file is found I wish to display the filename, the size of the file & the cksum value.

Below is the command, I'm using:

Code:
/opt/freeware/bin/find /tmp/moht -type d -name 'BACKUP' -prune -o -type d -name 'STORE' -prune -o -type d -name 'LOGGER' -prune -o -type f -exec cksum {} \;

Output:
Code:
  701567198 47034 /tmp/moht/UPLOAD_DATA_OLD/WINTER/CORE14_46000.txt
  1165791713 39019 /tmp/moht/UPLOAD_DATA_OLD/CORE14_530000.txt
  3448997243 35258 /tmp/moht/UPLOAD_DATA_OLD/CORE14_487300.txt
  .......
  .......
+ 4294967295 0 /tmp/moht/UPLOAD_DATA_OLD/TEST/CORE14_613500.txt
  2875732103 46516 /tmp/moht/NEW/CORE14_753200.txt
  1525766291 46064 /tmp/moht/UPLOAD_DATA_OLD/CORE14_849300.txt
  2315828286 46532 /tmp/moht/UPLOAD_DATA_OLD/CORE14_902400.txt

Although the performce i.e time taken by the above command is reasonable; I wish to understand if there is any scope of performce improvement.

One thing I guess may help somewhat is:

Code:
cd /tmp/moht; /opt/freeware/bin/find . -type d -name "BACKUP" -prune -o -type d -name "STORE" -prune -o -type f -exec cksum {} \;

I'm on AiX 6.1

Suggestions / recommendations are appreciated.

Last edited by Scrutinizer; 12-07-2019 at 09:33 AM.. Reason: quote tags -> code tags
# 2  
Compare performance of
Code:
cksum /tmp/moht/* | grep -v "BACKUP\|STORE\|LOGGER"

# 3  
Quote:
Originally Posted by RudiC
Compare performance of
Code:
cksum /tmp/moht/* | grep -v "BACKUP\|STORE\|LOGGER"

But you have not considered the file size. Can you please include that in your answer?

Also note that the files should be searched recursively under subfolders.
# 4  
This standard library call: nftw (or ftw)
IBM Knowledge Center

supports the find command traversing directory file trees - i.e., searching and locating files.

Assuming you want to keep the command you already have (and I am not sue that Rudi's suggested test is valid because of file and directory caching ):

A limiting factor is known to be the number of sub-directories in the file tree, and possibly the number of available open file descriptors - a per process limit.
If you can parallelize your code using several processes it may improve performance. I'm not sure this will help much because it depends on the number of sub-directories being large to gain any benefit. The developers who write system code try to maximize throughput.

What I'm saying is: performance enhancement work is subjective and often a misplaced resource and a waste of programmer time.
Suppose your command runs in one minute in production. Then you work hard and get it down to 35 seconds. The user perception of "slow" will still be there, so you have to get it down to maybe 6 seconds to make users happy and see it as "faster". In this case getting an order of magnitude improvement may not be possible.

And in this case you would have to do something about directory caching messing up testing because (you check this yourself) once you open a directory the system caches it for speedier access. Use the time command and rerun the command to see what I mean:
Code:
time [my long command goes here]
#write down the result
time [my long command goes here]
# write down the result and compare the two resulting times

This User Gave Thanks to jim mcnamara For This Post:
# 5  
The shorter pathnames is a small improvement only when post processing the output.
Then, you can bundle the names (shortens the command, not so much the run time).
But a + instead of the \; will have an impact. Then find runs cksum with many collected arguments - fewer runs are needed.
Code:
cd /tmp/moht && find . -type d \( -name 'BACKUP' -o -name 'STORE' -o -name 'LOGGER' \) -prune -o -type f -exec cksum {} +

Further, compare the speeds of the /usr/bin/find and the freeware find.
These 2 Users Gave Thanks to MadeInGermany For This Post:
# 6  
File sizes are included in my cksum. For climbing down the dir tree, try
Code:
cksum * */* */*/* |& grep -v "BACKUP\|STORE\|LOGGER\|cksum"
268795035 355 file1
113460914 19 file2

# 7  
Hi.
Quote:
Originally Posted by jim mcnamara
What I'm saying is: performance enhancement work is subjective and often a misplaced resource and a waste of programmer time.
Suppose your command runs in one minute in production. Then you work hard and get it down to 35 seconds. The user perception of "slow" will still be there, so you have to get it down to maybe 6 seconds to make users happy and see it as "faster". In this case getting an order of magnitude improvement may not be possible.
Indeed. The first question one needs to answer is Does it have to be faster? Otherwise you are spending time that probably could be better spent elsewhere.

That being said, I have been [trying to] learn rustc, and have compiled a few codes that are very fast. One is fd. You can see benchmarks comparing it to standard find at GitHub - sharkdp/fd: A simple, fast and user-friendly alternative to 'find'

Depending on choices fd is faster by a factor of 5 up to 9, or even faster if one ignores hidden directories.

However, it would require you to either download a compiled code, or download the Rust system and compile fd yourself. I don't see a version for AIX, so this is academic.

I suppose if enough folks asked for Rust to be ported to platforms like Solaris, AIX, etc., it might happen. It might be worth a try if one really, really wanted that extra bit of speed.

I'll take the speed if it's easy to do and I really need it, but otherwise I have other stuff to do.

Best wishes ... cheers, drl
This User Gave Thanks to drl For This Post:
Login or Register for Dates, Times and to Reply

Previous Thread | Next Thread
Thread Tools Search this Thread
Search this Thread:
Advanced Search

Test Your Knowledge in Computers #167
Difficulty: Easy
The Transmission Control Protocol (TCP) is one of the least used protocols of the Internet protocol suite.
True or False?

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Increase command length for ksh shell on Redhat Linux

I have a ksh shell script and i need to pass arguments which are generated by data pulled from a database. When the argument to the shell script is too long (about 4000 charecters) the below is the issue observed. I copy the command which is 4000 charecters long from the logs and paste it... (7 Replies)
Discussion started by: mohtashims
7 Replies

2. Solaris

8 character limit for ipcs command , any way to increase # of chars ?

Hello All, We have a working script which identifies and kills ipcs resources which havent been correctly killed during normal shutdowns. It is working fine and dandy however there are some issues now. Environment: SunOS 5.10 Generic_148888-03 sun4u sparc SUNW,SPARC-Enterprise ... (4 Replies)
Discussion started by: icalderus
4 Replies

3. Shell Programming and Scripting

Performance issue while using find command

Hi, I have created a shell script for Server Log Automation Process. I have used find xargs grep command to search the string. for Example, find -name | xargs grep "816995225" > test.txt . Here my problem is, We have lot of records and we want to grep the string... (4 Replies)
Discussion started by: nanthagopal
4 Replies

4. Shell Programming and Scripting

Awk : find progressive increase in numbers

NR_037575 -0.155613339079513 -0.952655362767482 -1.42096466949375 -0.797042023687969 -1.26535133041424 -0.468309306726272 NR_037576 0.59124585320226 0.408702582537126 0.888885242203586 -0.182543270665134 0.297639389001326 0.480182659666459... (4 Replies)
Discussion started by: quincyjones
4 Replies

5. Shell Programming and Scripting

SLEEP command performance

Hi, I wanted to run a particlar script for every 20 minutes. I dont have crontab in my server. Hence i ran this script in a loop by providing the command sleep 1200 Now i wanted to know is there any performance issue if this job keeps on execute in the server. Thanks, Puni (1 Reply)
Discussion started by: puni
1 Replies

6. Shell Programming and Scripting

Increase sed performance

I'm using sed to do find and replace. But since the file is huge and i have more than 1000 files to be searched, the script is taking a lot of time. Can somebody help me with a better sed command. Below is the details. Input: 1 1 2 3 3 4 5 5 Here I know the file is sorted. ... (4 Replies)
Discussion started by: gpaulose
4 Replies

7. Shell Programming and Scripting

Increase Performance

I have written a code using AWK & sed to compare two files. The structure of the files is like this" Format is this: <bit code> <file code> <string> Follwoed by any numbers of properties lines whic start with a "space" 10101010101111101 XX abcd a AS sasa BS kkk 1110000101010110 XX... (1 Reply)
Discussion started by: sandeep_hi
1 Replies

8. Solaris

What is the command to increase filesystem on solaris

I wanted to know what is the process or command to increase a filesystem on solaris. For example the /tmp directory. (3 Replies)
Discussion started by: strikelit
3 Replies

9. UNIX for Advanced & Expert Users

improve performance by using ls better than find

Hi , i'm searching for files over many Aix servers with rsh command using this request : find /dir1 -name '*.' -exec ls {} \; and then count them with "wc" but i would improve this search because it's too long and replace directly find with ls command but "ls *. " doesn't work. and... (3 Replies)
Discussion started by: Nicol
3 Replies

Featured Tech Videos