Recursive find / grep within a file / count of a string
Hi All,
This is the first time I have posted to this forum so please bear with me. Thanks also in advance for any help or guidance.
For a project I need to do the following.
1. There are multiple files in multiple locations so I need to find them and their locations. So I had planned to use
cd LOCATION;
find . -name "FILENAME.TXT" -type f -print > $HOME/list_of_locations.txt
this gives me paths in this format: ./dir1/dir2/dir3/FILENAME.txt
2. Each one of these files is of a different format and the only way to work out the format is to count the number of occurrences of the "|" character in each file.
I can either use head -1 to take the first row and count the number of occurrences of the "|" character, or else grep the "|" in all rows and divide by wc -l (the number of lines). My preference is for whichever is most efficient.
3. I want to produce a new file listing the full path and the number of occurrences of the "|" character so I can process the .txt file later. The count could either be appended to the list_of_locations.txt from step 1, or a new file could be created with this information.
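A minimal sketch of the head-based variant from step 2, using the example path from step 1 (tr -dc '|' strips everything except the pipes so wc -c can count them):

```shell
# Count '|' characters on the first line only of one file:
# tr -dc '|' deletes every character except '|'; wc -c counts what is left.
file="./dir1/dir2/dir3/FILENAME.txt"
count=$(head -n 1 "$file" | tr -dc '|' | wc -c)
printf '%s %s\n' "$file" "$count"
```

Wrapped in a loop over the paths from step 1, this would produce the path-plus-count listing described in step 3.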
So what I am asking:
Is there a quick way of doing this?
Using find . -name is very slow - but looks like there is no other way as I am doing a recursive search across subdirectories.
Is there a better way to interrogate my .txt file to find out how many "|" characters there are?
Is there a better way to put all of this into a UNIX script?
Thanks in advance for any help you can give, either code snippet or advice.
You can do all of that in one line:
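A sketch of the kind of one-liner being described, assuming GNU or BSD find/xargs for the non-standard -print0/-0 options, and using the FILENAME.TXT pattern from the question:

```shell
# Print each matching file's path and the field count ('|' as separator)
# taken from its first line; -print0/-0 keep paths with spaces intact.
find . -name "FILENAME.TXT" -type f -print0 |
    xargs -0 awk -F'|' 'FNR == 1 {print FILENAME, NF}'
```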
This will look in the list of locations for the filename(s) you specified and print them out, separated by a NUL ("\0") character. xargs will collect them all and run awk on this list. awk will open each file and print the full path and field count from the first line. Redirect as desired.
As I am not aware of how to skip the remainder of the file and go on to the next one, there is some optimization potential. Trials with close("-") right after the print statement showed a little improvement in execution time, but I'm not sure it does the right thing. EDIT: It does not; it returns a -1 error code.
Does anybody out there know how to skip to the next file in awk's argument list?
Last edited by RudiC; 12-02-2012 at 08:39 AM..
Reason: Tried closing stdin to skip remainder / revoked close ("-")
RudiC's suggestion is close, but misses on a couple of points. Since no pathname operands are given to awk, all of the filenames printed by awk would be empty strings. And, if there are x field separators on a line, there are x+1 fields.
The -print0 find primary and the -0 option to xargs are not defined by the standards, so they might not be available on your implementation.
A portable way to do what I believe was requested is:
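A sketch of such a portable command, using only POSIX find and awk; NF - 1 gives the separator count that was actually asked for:

```shell
# -exec ... {} + hands awk batches of pathnames, so FILENAME is set;
# NF - 1 converts the field count into the number of '|' separators.
find . -name "FILENAME.TXT" -type f -exec awk -F'|' '
    FNR == 1 {print FILENAME, NF - 1}' {} +
```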
Some implementations of awk have a nextfile statement (like next, but while next restarts processing on the next line, nextfile restarts processing on the first line of the next file). If your awk has this non-standard extension, the following will be much more efficient for long input files:
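With that extension available, a sketch might be:

```shell
# nextfile stops reading the current file as soon as its first line has
# been handled, instead of scanning every remaining line.
find . -name "FILENAME.TXT" -type f -exec awk -F'|' '
    {print FILENAME, NF - 1; nextfile}' {} +
```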
-------------------------------
Note that the comment I made about Rudi's proposal not printing pathnames is totally bogus. The xargs utility will add the pathname operand to awk as it invokes awk.
Last edited by Don Cragun; 12-02-2012 at 08:42 AM..
. . . Since no pathname operands are given to awk, all of the filenames printed by awk will be an empty string.
At least with the combination of find and awk implemented on my Linux system, there's a full path listing available, including filenames containing spaces:
Quote:
And, if there are x field separators on a line, there are x+1 fields.
Yes. Still I thought the number of fields to be more relevant than the number of separators. Might have been premature.
Quote:
Works, and satisfies the standards, but:
Quote:
Some implementations of awk have a nextfile statement
Special thanks for this; I was looking for that or an equivalent; unfortunately not available on my system.
Quote:
At least with the combination of find and awk implemented on my Linux system, there's a full path listing available, including filenames containing spaces:
Hi Rudi,
Yes, but note that by skipping the -print (or -print0) and the invocation of xargs, awk is still given the full pathname as an operand (even if there are spaces, tabs, or newlines included in the pathname).
Quote:
Yes. Still I thought the number of fields to be more relevant than the number of separators. Might have been premature.
Agreed. But it wasn't what Charlie6742 asked for.
Quote:
Works, and satisfies the standards, but:
Not surprising since what you timed runs awk once for each input file.
But note that I specified:
not:
With the + instead of the \; find shouldn't execute awk any more times than xargs would and we avoid needing to start xargs at all.
Quote:
Special thanks for this; I was looking for that or an equivalent; unfortunately not available on my system.
Thanks guys. I have played with all the methods you suggested, but it runs without errors and just doesn't give any output. I should have said I am using the bash shell - could some of these commands not be working properly on my setup? Is there a way I can set it up so it works as you have it?
If it helps - this is the message it gives me for one of the options that doesn't work.
Once again, thanks in advance for looking at this so quickly - it's really appreciated.
Charlie
Last edited by Scott; 12-04-2012 at 09:00 AM..
Reason: Added code tags; removed formatting