Shell script to search a pattern in a directory and output number of find counts


 
# 1  
Old 08-07-2012

I need a shell script that takes two inputs:
1) the main directory to search, and
2) the pattern to search for in all files (.c and .h files) within the main directory.
It has to print the number of matches found in the main directory and in each subdirectory, for example:
main dir --> Total pattern found = 5
|
sub dir --> 3
|
sub dir --> 2
# 2  
Old 08-08-2012
This is not a very elegant solution, but as a starting point try
Code:
find  /main_dir -type d -exec ./countit {} pattern \;

where countit is a shell script:
Code:
cnt=$(ls $1/$2 2>/dev/null|wc -w)  
if [ "$cnt" -gt 0 ]; then echo $1": "$cnt; fi

and main_dir and pattern need to be supplied by you.

---------- Post updated 08-08-12 at 09:49 AM ---------- Previous update was 07-08-12 at 11:23 AM ----------

Look at this thread to find a less clumsy solution than mine above. Still not too performant...

---------- Post updated at 09:57 AM ---------- Previous update was at 09:49 AM ----------

This is really performant, provided you have dirname on your system:
Code:
find main_dir -name pattern -exec dirname {} \;|uniq -c|awk '{print $2 " -> "$1}'

# 3  
Old 08-08-2012
Hello, RudiC:

Don't take any of what follows personally. It is intended solely as a helpful critique.

None of the solutions quoted below is very good.


Quote:
Originally Posted by RudiC
This is not a very elegant solution, but as a starting point try
Code:
find  /main_dir -type d -exec ./countit {} pattern \;

where countit is a shell script:
Code:
cnt=$(ls $1/$2 2>/dev/null|wc -w)  
if [ "$cnt" -gt 0 ]; then echo $1": "$cnt; fi

and main_dir and pattern need to be supplied by you.

Filenames with n occurrences of embedded whitespace will be counted n+1 times. wc -l would be a better choice.

If the pattern matches a directory name, that subdirectory's contents will be counted even if they do not match the pattern. ls -d will prevent this, but will not prevent the matching directory from being counted if the intent is to count only files.

If pattern were the script's first argument, the script would be compatible with the much more efficient -exec ... {} + syntax. The body of the script could then be put within a for-loop iterating over "$@".

Is there even any point in using ls for this? A for-loop which expands the pattern could easily sidestep all of these issues. Within the loop, test can avoid counting anything that isn't a regular file. Also, using the pattern to generate arguments for ls may face a stricter system length limit than the shell for-loop's list expansion.
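
For illustration only, here is a rough sketch of the for-loop variant described above. The function name count_in_dirs and its argument order (pattern first, then any number of directories) are my own assumptions, not part of the original countit script:

```shell
# count_in_dirs PATTERN DIR...
# Hypothetical replacement for countit: the pattern comes first so that
# any number of directories can follow, as -exec ... {} + requires.
count_in_dirs() {
    pattern=$1
    shift
    for dir in "$@"; do
        cnt=0
        # $pattern is left unquoted on purpose so the shell expands it as a
        # glob (a pattern containing whitespace would also be split here);
        # "$dir" is quoted so whitespace in the path is preserved.
        for f in "$dir"/$pattern; do
            # Count regular files only. This also skips the literal,
            # unexpanded pattern when nothing matches.
            [ -f "$f" ] && cnt=$((cnt + 1))
        done
        if [ "$cnt" -gt 0 ]; then
            printf '%s: %s\n' "$dir" "$cnt"
        fi
    done
}
```

Saved as a script whose last line is count_in_dirs "$@", it could be run as find main_dir -type d -exec ./countit 'pattern' {} + so that one invocation handles many directories.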

In my opinion, it's not worth trying to fix this approach's bugs. Better to abandon it.



Quote:
Originally Posted by RudiC
Look at this thread to find a less clumsy solution than mine above. Still not too performant...

From that post ...

Quote:
Originally Posted by a20786
find . -type d -name somedirname -exec ksh -c 'echo -n $1" ";ls -ltr $1|wc -l' {} {} \;

This will search for all directories from the current directory and will give the count (+1) of files/dirs present in each such directory.

Launching an entire shell once per filename is not an efficient approach.

If the pathname has whitespace or begins with a dash, there will be problems.

Why does that code make ls work harder for no reason? It generates the long format and forces a reverse time sort when the only thing done with the output is a line count.
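
A minimal sketch of how those three points might be addressed, assuming a POSIX sh and a find that supports -exec ... {} +. The helper name count_entries is hypothetical, and the -name test from the quoted command is left out for brevity:

```shell
# count_entries DIR: hypothetical helper. Prints every directory at or
# below DIR followed by the number of non-hidden entries directly in it.
count_entries() {
    # One shell handles a whole batch of directories ({} +) instead of
    # one shell per directory (\;).
    find "$1" -type d -exec sh -c '
        for d do
            # "$d" is quoted to survive whitespace; -- ends option parsing
            # in case the path begins with a dash. Plain ls is enough for
            # a line count, so no -l or -tr.
            n=$(ls -- "$d" | wc -l | tr -d " ")
            printf "%s %s\n" "$d" "$n"
        done
    ' sh {} +
}
```

Names containing newlines would still be miscounted, since the count is based on lines.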



Quote:
Originally Posted by RudiC
This is really performant, provided you have dirname on your system:
Code:
find main_dir -name pattern -exec dirname {} \;|uniq -c|awk '{print $2 " -> "$1}'

This suggestion is nearly a very good one. Unfortunately, it won't yield the desired result.

find will very likely not generate all of a directory's contents in one contiguous chunk. It will begin outputting file names from dir A (for example), then descend into A/B, then A/B/C, then back up to A/B, and finally resume where it left off in A. Even if your find does not behave that way, it is allowed to do so. When this happens, the result is multiple, non-consecutive counts for the same directory.
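
A small illustration of the effect, using hand-written input in place of real find output. The directory names A and A/B and the helper name listing are made up for the example:

```shell
# Simulated dirname output in the order find might visit the files:
# two entries in A, one in A/B, then one more in A.
listing() { printf '%s\n' A A A/B A; }

# uniq only merges adjacent duplicates, so A is reported twice.
listing | uniq -c | awk '{print $2 " -> " $1}'
# A -> 2
# A/B -> 1
# A -> 1

# With the duplicates made adjacent first, A is reported once.
listing | sort | uniq -c | awk '{print $2 " -> " $1}'
# A -> 3
# A/B -> 1
```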

The output of find needs to be sorted before uniq sees it. Also, I think that instead of executing dirname once per filename, it would be better to use one instance of sed to filter find output.

Code:
find main_dir -name pattern | sed 's#/[^/]*$##' | sort | uniq -c
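
To get closer to the output format asked for in the first post, the pipeline could be wrapped roughly as follows. This is only a sketch: the function name count_matches, the added -type f test, and the exact wording of the total line are my assumptions.

```shell
# count_matches MAIN_DIR PATTERN: hypothetical wrapper around the pipeline.
# Prints one "directory --> count" line per directory, then a grand total.
count_matches() {
    find "$1" -type f -name "$2" |   # -type f is an addition: files only
    sed 's#/[^/]*$##' |              # strip the file name, keep the directory
    sort |
    uniq -c |
    awk '{
        n = $1                       # count produced by uniq -c
        sub(/^ *[0-9]+ /, "")        # drop the count, keep the directory
        print $0 " --> " n
        total += n
    }
    END { print "Total pattern found = " (total + 0) }'
}
```

It would be called as count_matches main_dir '*.c', with the pattern quoted so that find, not the shell, expands it. File names containing newlines are not handled.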

For a massive number of files, that sort could require a lot of memory. If necessary, one can trade memory for CPU by executing find once per directory (still much better than a full shell once per file):

Code:
find main_dir -type d -exec find {} -maxdepth 1 -name pattern \; | sed 's#/[^/]*$##' | uniq -c

If maxdepth is not available, recursion can still be avoided with a slightly cumbersome use of -prune.
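
One way the -prune variant might look, as a sketch rather than a tested recipe for every find implementation. The function name count_per_dir_prune is hypothetical, and a small sh wrapper is used so that the inner find can be started at "dir/.":

```shell
# count_per_dir_prune MAIN_DIR PATTERN: hypothetical helper.
# For each directory, an inner find starts at "dir/." and prunes every
# entry whose name is not ".", so it never descends below that directory.
count_per_dir_prune() {
    find "$1" -type d -exec sh -c '
        pattern=$1
        shift
        for d do
            find "$d/." ! -name . -prune -name "$pattern"
        done
    ' sh "$2" {} + |
    sed 's#/\./[^/]*$##' |   # "dir/./file" becomes "dir"
    uniq -c                  # each directory arrives in one contiguous run
}
```

No sort is needed here because each inner find prints all matches of one directory before the next one starts.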

Something to keep in mind: In some of the approaches the pattern is expanded by the shell and in others it's passed to find. The shell will not match a hidden file against a leading wildcard (?, *); find will.
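
The difference can be seen with a throwaway directory. This sketch assumes mktemp -d is available and that the shell runs with default globbing settings; the function name hidden_demo is made up:

```shell
# hidden_demo: hypothetical helper that builds a scratch directory with one
# visible and one hidden .c file, then counts matches both ways.
hidden_demo() {
    demo=$(mktemp -d) || return 1
    touch "$demo/visible.c" "$demo/.hidden.c"

    # Shell expansion: a leading * does not match a leading dot.
    shell_count=0
    for f in "$demo"/*.c; do
        [ -f "$f" ] && shell_count=$((shell_count + 1))
    done

    # find -name: the same pattern also matches .hidden.c.
    find_count=$(find "$demo" -type f -name '*.c' | wc -l | tr -d ' ')

    echo "shell: $shell_count, find: $find_count"
    rm -rf "$demo"
}

hidden_demo
```

With GNU or BSD find this should report 1 for the shell and 2 for find.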

Regards,
Alister
# 4  
Old 08-08-2012
Addition
Quote:
Originally Posted by alister
Hello, RudiC:

Don't take any of what follows personally. It is intended solely as a helpful critique.
...
Regards,
Alister

Absolutely not! You may have inferred from the various edits that I have been approximating a solution in several steps. I wasn't happy with the first attempts either, since they run scripts or multiple shells while descending directory trees. I'll carefully analyse your proposal, as I'm "always learning" (agama's motto), and I really appreciate every single one of your posts.

On the other hand, it is quite intimidating to know that my posts are being scrutinized that carefully!

Last edited by RudiC; 08-08-2012 at 03:22 PM.. Reason: Addition