Recursive find / grep within a file / count of a string
Hi All,
This is the first time I have posted to this forum so please bear with me. Thanks also in advance for any help or guidance.
For a project I need to do the following.
1. There are multiple files in multiple locations so I need to find them and their locations. So I had planned to use
cd LOCATION;
find . -name "FILENAME.TXT" -type f -print > $HOME/list_of_locations.txt
this gives me paths in this format: ./dir1/dir2/dir3/FILENAME.txt
2. Each one of these files is of a different format and the only way to work out the format is to count the number of occurrences of the "|" character in each file.
I can either use head -1 to take the first row and count the number of occurrences of the "|" character, or else grep the "|" in all rows and divide by wc -l (the number of lines). My preference is for whichever is most efficient.
3. I want to produce a new file listing the full path and the number of occurrences of the "|" character so I can process the .txt file later. The count could either be appended to the list_of_locations.txt from step 1, or a new file could be created with this information.
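A minimal sketch of the head-based variant from step 2, using the example path from step 1 (tr -dc '|' strips everything except the pipes so wc -c can count them):

```shell
# Count '|' characters on the first line only of one file:
# tr -dc '|' deletes every character except '|'; wc -c counts what is left.
file="./dir1/dir2/dir3/FILENAME.txt"
count=$(head -n 1 "$file" | tr -dc '|' | wc -c)
printf '%s %s\n' "$file" "$count"
```

Wrapped in a loop over the paths from step 1, this would produce the path-plus-count listing described in step 3.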
So what I am asking:
Is there a quick way of doing this?
Using find . -name is very slow - but looks like there is no other way as I am doing a recursive search across subdirectories.
Is there a better way to interrogate my .txt file to find out how many "|" characters there are?
Is there a better way to put all of this into a UNIX script?
Thanks in advance for any help you can give, either code snippet or advice.
You can do all of that in one line:
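A sketch of the kind of one-liner being described, assuming GNU or BSD find/xargs for the non-standard -print0/-0 options, and using the FILENAME.TXT pattern from the question:

```shell
# Print each matching file's path and the field count ('|' as separator)
# taken from its first line; -print0/-0 keep paths with spaces intact.
find . -name "FILENAME.TXT" -type f -print0 |
    xargs -0 awk -F'|' 'FNR == 1 {print FILENAME, NF}'
```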
This will look in the list of locations for the filename(s) you specified and print them out, separated by a NUL ("\0") character. xargs will collect them all and run awk on this list. awk will open each file and print the full path and field count from the first line. Redirect as desired.
As I am not aware of how to skip the remainder of the file and go on to the next one, there is some optimization potential. Trials with close("-") right after the print statement showed a little improvement in execution time, but I'm not sure it does the right thing. EDIT: It does not; it returns a -1 error code.
Does anybody out there know how to skip to the next file in awk's argument list?
Last edited by RudiC; 12-02-2012 at 08:39 AM..
Reason: Tried closing stdin to skip remainder / revoked close ("-")
RudiC's suggestion is close, but misses on a couple of points. Since no pathname operands are given to awk, all of the filenames printed by awk would be empty strings. And, if there are x field separators on a line, there are x+1 fields.
The -print0 find primary and the -0 option to xargs are not defined by the standards, so they might not be available on your implementation.
A portable way to do what I believe was requested is:
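A sketch of such a portable command, using only POSIX find and awk; NF - 1 gives the separator count that was actually asked for:

```shell
# -exec ... {} + hands awk batches of pathnames, so FILENAME is set;
# NF - 1 converts the field count into the number of '|' separators.
find . -name "FILENAME.TXT" -type f -exec awk -F'|' '
    FNR == 1 {print FILENAME, NF - 1}' {} +
```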
Some implementations of awk have a nextfile statement (like next, but while next restarts processing on the next line, nextfile restarts processing on the first line of the next file). If your awk has this non-standard extension, the following will be much more efficient for long input files:
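With that extension available, a sketch might be:

```shell
# nextfile stops reading the current file as soon as its first line has
# been handled, instead of scanning every remaining line.
find . -name "FILENAME.TXT" -type f -exec awk -F'|' '
    {print FILENAME, NF - 1; nextfile}' {} +
```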
-------------------------------
Note that the comment I made about Rudi's proposal not printing pathnames is totally bogus. The xargs utility will add the pathname operand to awk as it invokes awk.
Last edited by Don Cragun; 12-02-2012 at 08:42 AM..
. . . Since no pathname operands are given to awk, all of the filenames printed by awk will be an empty string.
At least with the combination of find and awk implemented on my Linux system, there's a full path listing available, including filenames containing spaces:
Quote:
And, if there are x field separators on a line, there are x+1 fields.
Yes. Still I thought the number of fields to be more relevant than the number of separators. Might have been premature.
Quote:
Works, and satisfies the standards, but:
Quote:
Some implementations of awk have a nextfile statement
Special thanks for this; I was looking for that or an equivalent; unfortunately not available on my system.
Quote:
At least with the combination of find and awk implemented on my Linux system, there's a full path listing available, including filenames containing spaces:
Hi Rudi,
Yes, but note that by skipping the -print (or -print0) and the invocation of xargs, awk is still given the full pathname as an operand (even if there are spaces, tabs, or newlines included in the pathname).
Quote:
Yes. Still I thought the number of fields to be more relevant than the number of separators. Might have been premature.
Agreed. But it wasn't what Charlie6742 asked for.
Quote:
Works, and satisfies the standards, but:
Not surprising since what you timed runs awk once for each input file.
But note that I specified:
not:
With the + instead of the \; find shouldn't execute awk any more times than xargs would and we avoid needing to start xargs at all.
Quote:
Special thanks for this; I was looking for that or an equivalent; unfortunately not available on my system.
Thanks guys. I have played with all the methods you suggested, but it runs without errors and just doesn't give any output. I should have said I am using the bash shell - could some of these commands not be working properly on my setup? Is there a way I can set it up so it works as you have it?
If it helps - this is the message it gives me for one of the options that doesn't work.
Once again, thanks in advance for looking at this so quickly - it's really appreciated.
Charlie
Last edited by Scott; 12-04-2012 at 09:00 AM..
Reason: Added code tags; removed formatting