Recursive directory search using ls instead of find


 
# 1  
Old 07-06-2011

I was working on a shell script and found that the find command took too long, especially when I had to execute it multiple times. After some thought and research, I came up with two functions:
fileScan()
fileScan() will cd into a directory and perform any operations you would like from within it.
directoryScan()
directoryScan() will recursively cd into all directories beneath an initial provided root directory. Once in a new directory, the directory is sent to fileScan() so that other functions can be executed.

I found that this is blazing fast compared to find, especially when searching large directory trees or when having to run more than one find in a script or cron job.

Enjoy the code!
Code:
#!/bin/bash
# Directory scanner using recursive ls instead of find.
# Do not make any of the local variables into globals:
# folder, numdirectories, and x should not be used outside fileScan() and directoryScan().
# directoryScan() will cd into all directories below the "root" directory sent to it.
# fileScan() will perform operations on any directory sent to it.
# Note: the [ "$folder" = "$PWD" ] checks assume $folder is an absolute path.
fileScan()
{
    local folder=$1
    cd "$folder" || return
    if [ "$folder" = "$PWD" ]
    then
        # You are now inside the directory. Do any operations you need
        # on files that may exist here.
        :   # placeholder; an if/then whose body is only a comment is a syntax error
    fi
}
directoryScan()
{
    local folder=$1
    cd "$folder" || return
    if [ "$folder" = "$PWD" ]
    then
        local numdirectories=$(ls -lS | egrep '^d' | wc -l)
        fileScan "$folder"
        local x=1
        while [ "$x" -le "$numdirectories" ]
        do
            subdirectory=$(ls -lS | egrep '^d' | sed "s/[ \t][ \t]*/ /g" | cut -d" " -f9 | head -n "$x" | tail -n 1)
            subdirectory="${folder}/${subdirectory}"
            directoryScan "$subdirectory"
            x=$((x + 1))
            cd "$folder" || return
        done
    fi
}
# Sample call to directoryScan():
# directoryScan "$rootdirectory"
# Sample call to fileScan():
# fileScan "$scandirectory"

# 2  
Old 07-06-2011
Hi, newreverie:

Welcome to the forum.

I'd be curious to see what exactly that shell script is comparatively "blazing fast" against. I'm inclined to believe that your find solution was suboptimal if this script, executing those pipelines in every visited directory, is faster.

If you are not familiar with AWK, you might enjoy the challenge of learning enough of it to simplify the egrep|sed|cut|head|tail pipeline to one concise AWK invocation.
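For instance, the whole egrep|sed|cut|head|tail chain could collapse into a single awk invocation. Here is a sketch of that idea (the function name nth_subdirectory is made up for illustration; like the original pipeline, it takes the last ls -l field as the name, so names containing whitespace are mishandled):

```shell
# Print the n-th subdirectory name from a long listing of the
# current directory, using one awk process in place of the
# egrep | sed | cut | head | tail chain.
# $NF is the last field (the entry name), so directory names with
# spaces are truncated -- the same limitation as the original.
nth_subdirectory() {
    ls -l | awk -v n="$1" '/^d/ && ++count == n { print $NF }'
}
```

One process instead of five, and awk does the counting itself, so there is no need to re-run the listing for each index.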

Performance and efficiency aside, there are some potentially serious issues with that code. One that stands out: if a directory is deleted between the time $numdirectories is calculated and the end of the subsequent while loop, entire subtrees of the hierarchy will be visited more than once (a result of the input to head being shorter than expected). Depending on what's being done with each of the files, this could be a deal breaker.

Again, welcome to the forum and thanks for the contribution.

Regards,
Alister
# 3  
Old 07-06-2011
I am also quite sceptical.

By the way, ls performs an ASCII sort by default. If you are in a directory with several thousand files, that sorting operation can be costly and may slow down the processing.

To avoid it, you can use the -f option to get the entries in the order they come from the directory structure. This avoids a useless sort, especially when you are only piping the ls output into wc -l.
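A sketch of that tip (the helper name count_entries is made up; note that -f also implies -a, so '.' and '..' have to be filtered back out before counting):

```shell
# Count directory entries without making ls sort them.
# -f lists entries in directory order, but it also implies -a,
# so '.' and '..' are removed with a fixed-string, whole-line grep.
count_entries() {
    ls -f "$1" | grep -Fxv -e . -e .. | wc -l
}
```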

A lot of performance problems come from a weak algorithm or approach; going back through a redesign step can then speed up the processing considerably.

I am curious to see the code of the initial "poor performance" script that was using the find command.

Sharing your code is still a nice gesture.

Here are some examples of performance problems caused by flawed logic or misuse of the find command:

https://www.unix.com/shell-programmin...nce-issue.html

https://www.unix.com/unix-dummies-que...-txt-file.html

Last edited by ctsgnb; 07-06-2011 at 03:08 PM..
# 4  
Old 07-07-2011
find may have been faster if I had sent its results into an array or text file and then looped through those results in my program.

My issue with the find command had more to do with the time it took to run to completion. Given the large directory structure and the variety of file types I needed to search for, find took several minutes or more to finish.

The particular shell script I was writing has a UI, so the user was forced to wait several minutes or more between starting a search and being able to work with its results. This was deemed unacceptable, so a method was needed to execute searches closer to real time and let the user interact with files as they are found.

find could still be used inside the fileScan() function with the -prune option to search only within the current directory. But I left the options open in that function to suit your purposes.

So perhaps I overstated the net speed of the functions relative to find. find may be faster overall, but if the choice is between waiting for a find command to run to completion and interacting with the results of a search in near real time, I believe this is the better method.

As for the comment about directories being deleted while the script is running: I can see the pitfalls, but they can be avoided by storing the ls results in a local array of subdirectories instead of using the head and tail method. An attempt to cd into a nonexistent directory would then be handled by the if [ $folder = $PWD ] logic.
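A sketch of that array-based variant (bash arrays; it reuses fileScan() from the first post and, by globbing instead of parsing ls output, also copes with names containing spaces):

```shell
# Variant of directoryScan() that snapshots the subdirectory names into
# a local array once, instead of re-running ls | head | tail on every
# loop iteration. A directory deleted mid-run is simply skipped.
directoryScan()
{
    local folder=$1
    cd "$folder" || return
    local subdirs=() d
    for d in */; do                        # the glob expands only to directories
        [ -d "$d" ] && subdirs+=("${d%/}")
    done
    fileScan "$folder"
    for d in "${subdirs[@]}"; do
        directoryScan "$folder/$d"
        cd "$folder" || return
    done
}
```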
# 5  
Old 07-08-2011
Use a program rather than a shell script

One of the phenomena I have noticed over my years of being involved in Unix/Linux is that people tend to over-use shell scripting.

The problem with shell scripting is, simply, performance. It's one thing to accomplish small to medium tasks with a shell script. But once you begin doing serious processing work, involving tight loops of text processing, you will very quickly run into trouble. The reason is that most things done in a shell script are done by small programs: cut, sed, awk, head, tail, etc. When you combine dozens of these in a heavily run loop, the computer has to launch thousands of tiny programs to accomplish the overall task.

I have seen large powerful Unix systems brought to their knees by simple DB loader scripts done in ksh for this very reason.

The solution is to use a more appropriate software tool for the problem. If you really want to do it in shell-scripting style, why not try it in Perl or Python? These languages let you write a single program to accomplish the task, with no spawning of child processes required, which will vastly speed things up. I know this to be a fact, because I've had a simple Perl file and text search program in my toolbox since 1994.
# 6  
Old 07-08-2011
Quote:
Originally Posted by bearvarine
One of the phenomena I have noticed over my years of being involved in Unix/Linux is that people tend to over-use shell scripting.

The problem with shell scripting is, simply, performance.
One of the phenomena I have noticed over my years of being involved with UNIX/Linux is that people tend to blame poor shell scripts on the language.

The program above isn't slow because it's shell. It's slow because of things like these:
Code:
ls -lS | egrep '^d' | sed "s/[ \t][ \t]*/ /g" | cut -d" " -f9 | head -n $x | tail -n 1

Six programs and five pipes to do something you could have done with two processes or fewer! How find could be slower, I can't imagine -- perhaps he didn't realize that find is recursive?
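For example, iterating over subdirectories needs no external programs at all; a glob does it in zero child processes. A sketch (the function name list_subdirectories is made up for illustration):

```shell
# List the subdirectories of a given directory using only the shell's
# own globbing -- no ls, grep, sed, cut, head, or tail at all.
list_subdirectories() {
    local dir
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue    # skip the unexpanded pattern when there are none
        dir=${dir%/}                 # drop the trailing slash
        printf '%s\n' "${dir##*/}"   # print just the name
    done
}
```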
Quote:
Its one thing to accomplish small to medium tasks with a shell script. But once you begin doing serious processing work, involving tight loops of text processing, you will very quickly run into trouble. The reason is because most things done in a shell script are done by small programs - cut, sed, awk, head, tail, etc. When you combine dozens of these in a loop that will be running heavily - the computer has to launch THOUSANDS of tiny programs to accomplish the overall task.
If you'd been programming shell for years, you ought to know:

1) awk can make a decent replacement for all the tools you listed above -- even in combination -- being capable of quite complex programs in its own right. Putting it in the same class as head, tail, etc. is a bit of a mischaracterization, and jamming it into the middle of a long pipe chain is generally misuse: awk can often replace the entire chain, sometimes the entire script.

2) It's often not necessary to run thousands of tiny processes to accomplish a single task, yet people choose to do so. Piping and external programs are powerful features, but too often they're abused, causing terrible performance.

Quote:
I have seen large powerful Unix systems brought to their knees by simple DB loader scripts done in ksh for this very reason.
Funny thing -- I've done that with Perl. I've also done it in assembly language. It's possible to write terrible code in any language.
Quote:
The solution is to use a more appropriate software tool to solve the problem. If you really want to do it in shell scripting style, why not try it in perl or python?
Because those don't resemble shell languages? Someone who writes a shell script precisely the same way they'd write a Perl or Python one isn't using the shell's important features.
Quote:
I know this to be a fact, because I've had a simple perl file and text search program in my toolbox since 1994.
Did you know many modern shells have regular expressions, can do substrings and simple text replacement, can pipe text between entire code blocks, can read line by line or token by token and split lines on tokens, can open/close/seek in files, etc, etc, etc -- all as shell builtin features?

All too often, people don't, and use thousands of tiny external programs instead.

The trick is to do a large amount of work with each process you create, and never to spawn one for anything trivial.
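To illustrate, every operation below is a bash builtin; no external process is spawned (a small sketch; the file name and variables are made up):

```shell
# Substrings, text replacement, and regular expressions,
# all with bash builtins -- zero external processes.
path="/var/log/syslog.1"
file=${path##*/}            # strip the directory part: syslog.1
renamed=${file/.1/.old}     # replace text: syslog.old
if [[ $renamed =~ ^[a-z]+\.old$ ]]; then    # builtin regex match
    matched=yes
fi
```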

Last edited by Corona688; 07-08-2011 at 12:21 PM..
# 7  
Old 07-08-2011
@Corona688: You make very good points here, and I believe you are essentially affirming my main point - don't put lots of little programs together in tight loops and expect good performance from a shell script.

Honestly, though -- and I know this is just my personal opinion -- I think awk is a scourge upon our land. Like kudzu, it should be ripped out wherever it is found and replaced with something less inscrutable. I don't see any reason in 2011 to keep using such an ancient, arcane, difficult-to-debug tool as awk when there are so many better choices available. I'm just sayin'...
