Increase the performance of find command.

12-07-2019

I'm trying to exclude the 'BACKUP', 'STORE' and 'LOGGER' folders while searching for all files under the directory "/tmp/moht".

Once a file is found, I wish to display the filename, the size of the file and the cksum value.

Below is the command I'm using:

Code:

/opt/freeware/bin/find /tmp/moht -type d -name 'BACKUP' -prune -o -type d -name 'STORE' -prune -o -type d -name 'LOGGER' -prune -o -type f -exec cksum {} \;

Output:

Code:

  701567198 47034 /tmp/moht/UPLOAD_DATA_OLD/WINTER/CORE14_46000.txt
  1165791713 39019 /tmp/moht/UPLOAD_DATA_OLD/CORE14_530000.txt
  3448997243 35258 /tmp/moht/UPLOAD_DATA_OLD/CORE14_487300.txt
  .......
  .......
  4294967295 0 /tmp/moht/UPLOAD_DATA_OLD/TEST/CORE14_613500.txt
  2875732103 46516 /tmp/moht/NEW/CORE14_753200.txt
  1525766291 46064 /tmp/moht/UPLOAD_DATA_OLD/CORE14_849300.txt
  2315828286 46532 /tmp/moht/UPLOAD_DATA_OLD/CORE14_902400.txt
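
cksum prints the checksum, then the size in bytes, then the path, so the file size is already in the output above. If the columns are wanted in the order asked for (filename, size, cksum), a minimal sketch like the one below could reorder them. `cksum_report` is a hypothetical helper name, and it uses `-exec cksum {} +` so that one cksum run handles many files; this has not been tested on AIX 6.1.

```shell
# Hypothetical helper: print "path size checksum" for every regular file under
# the given directory, skipping any directory named BACKUP, STORE or LOGGER.
cksum_report() {
    find "$1" -type d \( -name BACKUP -o -name STORE -o -name LOGGER \) -prune \
         -o -type f -exec cksum {} + |
    while read -r sum size path; do
        # cksum prints "checksum size path"; read puts the rest of the line
        # (including embedded spaces) into "path"
        printf '%s %s %s\n' "$path" "$size" "$sum"
    done
}
```

Usage: `cksum_report /tmp/moht`. Filenames containing newlines, or leading/trailing blanks, would not survive the `read` step.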

Although the performance, i.e. the time taken by the above command, is reasonable, I wish to understand whether there is any scope for performance improvement.

One thing I guess may help somewhat is to cd into the directory first, so that find works with shorter relative pathnames:

Code:

cd /tmp/moht && /opt/freeware/bin/find . -type d -name "BACKUP" -prune -o -type d -name "STORE" -prune -o -type d -name "LOGGER" -prune -o -type f -exec cksum {} \;

I'm on AIX 6.1.

Suggestions / recommendations are appreciated.

Last edited by Scrutinizer; 12-07-2019 at 08:33 AM.. Reason: quote tags -> code tags

mohtashims

12-07-2019

Compare the performance of:

Code:

cksum /tmp/moht/* | grep -v "BACKUP\|STORE\|LOGGER"

RudiC

12-07-2019

Quote:

Originally Posted by RudiC

Compare performance of

Code:

cksum /tmp/moht/* | grep -v "BACKUP\|STORE\|LOGGER"

But you have not considered the file size. Can you please include that in your answer?

Also note that the files should be searched recursively under subfolders.

mohtashims

12-07-2019

The standard library call nftw (or ftw), documented in the IBM Knowledge Center, supports the find command in traversing directory file trees, i.e., searching and locating files.

Assuming you want to keep the command you already have (and I am not sure that Rudi's suggested test is valid, because of file and directory caching):

A limiting factor is known to be the number of sub-directories in the file tree, and possibly the number of available open file descriptors, which is a per-process limit.
If you can parallelize your code across several processes, it may improve performance. I'm not sure this will help much, because any benefit depends on the number of sub-directories being large. The developers who write system code already try to maximize throughput.
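
As a rough sketch of the parallel idea (not tested on AIX 6.1, and not necessarily faster), one background find could be started per top-level subdirectory, with the shell's wait collecting them. `parallel_cksum` is a hypothetical name; it assumes a `mktemp` that supports `-d` (for example the GNU one), and each job writes to its own temporary file so that lines from concurrent cksum runs cannot interleave.

```shell
# Hypothetical sketch: one background find per top-level subdirectory of $1.
# Assumes mktemp supports -d; top-level entries starting with "." are not
# matched by the * glob and are therefore skipped.
parallel_cksum() {
    root=$1
    tmp=$(mktemp -d) || return 1
    n=0
    for entry in "$root"/*; do
        [ -e "$entry" ] || continue            # empty directory: glob stays literal
        case ${entry##*/} in
            BACKUP|STORE|LOGGER) continue ;;   # excluded folders at the top level
        esac
        if [ -d "$entry" ]; then
            n=$((n + 1))
            # each job writes to its own file so concurrent output cannot interleave
            find "$entry" -type d \( -name BACKUP -o -name STORE -o -name LOGGER \) -prune \
                 -o -type f -exec cksum {} + > "$tmp/$n" &
        elif [ -f "$entry" ]; then
            cksum "$entry"                     # plain files directly under $root
        fi
    done
    wait                                       # block until every background find is done
    [ "$n" -gt 0 ] && cat "$tmp"/*
    rm -rf "$tmp"
}
```

The output order differs from a single find run, and whether this helps at all depends on the tree having several large top-level subdirectories.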

What I'm saying is: performance enhancement work is subjective and often a misplaced resource and a waste of programmer time.
Suppose your command runs in one minute in production. Then you work hard and get it down to 35 seconds. The user perception of "slow" will still be there, so you have to get it down to maybe 6 seconds to make users happy and see it as "faster". In this case getting an order of magnitude improvement may not be possible.

And in this case you would have to do something about directory caching skewing your tests, because once you open a directory the system caches it for speedier access (you can check this yourself). Use the time command and run the command twice to see what I mean:

Code:

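# the command from the question; > /dev/null keeps terminal output out of the timing
time /opt/freeware/bin/find /tmp/moht -type d -name 'BACKUP' -prune -o -type d -name 'STORE' -prune -o -type d -name 'LOGGER' -prune -o -type f -exec cksum {} \; > /dev/null
# write down the result
time /opt/freeware/bin/find /tmp/moht -type d -name 'BACKUP' -prune -o -type d -name 'STORE' -prune -o -type d -name 'LOGGER' -prune -o -type f -exec cksum {} \; > /dev/null
# write down the result and compare the two resulting times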

jim mcnamara

12-07-2019

The shorter pathnames are only a small improvement, and only when post-processing the output.
Then you can bundle the -name tests (this shortens the command, not so much the run time).
But using + instead of \; will have an impact: find then runs cksum with many collected arguments, so far fewer cksum runs are needed.

Code:

cd /tmp/moht && find . -type d \( -name 'BACKUP' -o -name 'STORE' -o -name 'LOGGER' \) -prune -o -type f -exec cksum {} +
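
To see the effect of + versus \; directly, a small sketch can count how many child processes find starts with each form. `count_runs` is a hypothetical helper, and `sh -c 'echo run' sh` merely stands in for cksum: it prints one line per invocation, so the line count equals the number of processes started.

```shell
# Hypothetical helper: count child processes started by each -exec form.
# "sh -c 'echo run' sh" stands in for cksum and prints one line per run.
count_runs() {
    dir=$1
    # -exec ... \; : one child process per file found
    semi=$(find "$dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l)
    # -exec ... +  : file names are collected, so far fewer child processes
    plus=$(find "$dir" -type f -exec sh -c 'echo run' sh {} + | wc -l)
    printf 'processes with \\; : %s\n' "$semi"
    printf 'processes with +  : %s\n' "$plus"
}
```

With 50 files in a directory, the \; form reports 50, while the + form typically reports 1; the exact count for + depends on the system's argument-length limit.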

Further, compare the speed of /usr/bin/find with that of the freeware find.

MadeInGermany

12-07-2019

File sizes are included in my cksum output (the second column). For climbing down the directory tree, try:

Code:

cksum * */* */*/* 2>&1 | grep -v "BACKUP\|STORE\|LOGGER\|cksum"
268795035 355 file1
113460914 19 file2

RudiC

12-07-2019

Hi.

Quote:

Originally Posted by jim mcnamara

What I'm saying is: performance enhancement work is subjective and often a misplaced resource and a waste of programmer time.
Suppose your command runs in one minute in production. Then you work hard and get it down to 35 seconds. The user perception of "slow" will still be there, so you have to get it down to maybe 6 seconds to make users happy and see it as "faster". In this case getting an order of magnitude improvement may not be possible.

Indeed. The first question one needs to answer is "Does it have to be faster?" Otherwise you are spending time that probably could be better spent elsewhere.

That being said, I have been [trying to] learn Rust, and have compiled a few programs that are very fast. One is fd. You can see benchmarks comparing it to standard find on GitHub: sharkdp/fd, "A simple, fast and user-friendly alternative to 'find'".

Depending on the options chosen, fd is faster by a factor of 5 to 9, or even faster if one ignores hidden directories.

However, it would require you to either download a compiled binary, or download the Rust toolchain and compile fd yourself. I don't see a version for AIX, so this is academic.

I suppose if enough folks asked for Rust to be ported to platforms like Solaris, AIX, etc., it might happen. It might be worth a try if one really, really wanted that extra bit of speed.

I'll take the speed if it's easy to do and I really need it, but otherwise I have other stuff to do.

Best wishes ... cheers, drl

drl