Finding all files based on pattern


 
# 8  
12-19-2014
Try this
Code:
awk '
    {sub(/--.*$/, "")}               # strip "--" comments to end of line
    /\/\*/,/\*\//   {next}           # skip lines from "/*" through "*/"
    $0 ~ PAT        {print FILENAME}
' PAT="insurance_no" file*

It removes comments first (assuming /* ... */ comments span whole lines) and then checks what is left for the pattern. It may need refinements to deal with Don Cragun's questions.
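To see the kind of input this script assumes, here is a small sketch you can run in ksh, bash, or /usr/xpg4/bin/sh with a POSIX awk (nawk, /usr/xpg4/bin/awk, gawk). The temporary directory and the file names demo_1.sql and demo_2.sql are made up for the example, and mktemp is assumed to be available.

```sh
# Hypothetical sample files whose /* ... */ comments cover whole lines.
dir=$(mktemp -d)

cat > "$dir/demo_1.sql" <<'EOF'
select insurance_no from policy;
-- insurance_no is the key
/*
insurance_no was renamed
*/
EOF

cat > "$dir/demo_2.sql" <<'EOF'
-- insurance_no
/* insurance_no */
select 1 from dual;
EOF

# Same logic as the script above: strip "--" comments, skip every line from
# one containing "/*" through the next one containing "*/", then match.
found=$(awk '
    {sub(/--.*$/, "")}
    /\/\*/,/\*\//   {next}
    $0 ~ PAT        {print FILENAME}
' PAT="insurance_no" "$dir"/demo_*.sql)

echo "$found"     # only the demo_1.sql path: demo_2.sql matches only in comments
rm -rf "$dir"
```

Note that the script prints the file name once per matching line, so a file with several uncommented matches is listed several times.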
This User Gave Thanks to RudiC For This Post:
# 9  
12-19-2014
Hi RudiC,

Thanks for doing it the awk way.
I just ran this and got the error below. I tried to correct the syntax error myself but wasn't able to.

Code:
bash-3.2$ awk     '               {sub (/--.*$/,"")}
>          /\/\*/,/\*\//  {next}
>          $0 ~ PAT       {print FILENAME}
>         ' PAT="insurance_no" test_*
awk: syntax error near line 1
awk: illegal statement near line 1
awk: syntax error near line 3
awk: bailing out near line 3

# 10  
12-19-2014
I bet it's a Solaris 10 system, because I get exactly the same messages using the default awk.

Try /usr/xpg4/bin/awk or nawk instead.
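If the same script has to run on Solaris and on other systems, one option is to pick the interpreter at run time. This is only a sketch: the search order and the AWK variable name are assumptions for illustration, not something taken from this thread.

```sh
# Use the first "new" awk found: /usr/xpg4/bin/awk or nawk on Solaris,
# otherwise gawk or a plain awk (POSIX-style on most other systems).
AWK=
for candidate in /usr/xpg4/bin/awk nawk gawk awk; do
    case $candidate in
        /*) if [ -x "$candidate" ]; then AWK=$candidate; break; fi ;;
        *)  if command -v "$candidate" >/dev/null 2>&1; then AWK=$candidate; break; fi ;;
    esac
done
[ -n "$AWK" ] || { echo "no usable awk found" >&2; exit 1; }

# Quick check that the chosen awk accepts -v and a dynamic regex match.
echo "insurance_no = 1" | "$AWK" -v PAT="insurance_no" '$0 ~ PAT {print "match"}'
# prints "match"
```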
This User Gave Thanks to junior-helper For This Post:
# 11  
12-19-2014
Yes, exactly, junior-helper. Thanks a lot!
# 12  
12-19-2014
Hopefully, RudiC's suggested awk script got you started down a workable path. Unfortunately, with an input file like:
Code:
pattern /* comment */
pattern2 /* start
continue comment
This >>> pattern <<< should never be seen.
continue comment
end */ pattern3
/* comment1 */ pattern4 /* comment 2 */

I believe that if your search pattern is pattern, RudiC's script will not find any of the four occurrences of pattern in the above file that are not in comment fields.
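To check this, the sketch below writes the example file above to a temporary file and runs RudiC's range-based logic on it with the pattern `pattern` (again assuming a POSIX awk and mktemp). No file name is printed.

```sh
f=$(mktemp)
cat > "$f" <<'EOF'
pattern /* comment */
pattern2 /* start
continue comment
This >>> pattern <<< should never be seen.
continue comment
end */ pattern3
/* comment1 */ pattern4 /* comment 2 */
EOF

# RudiC's logic, unchanged. Lines 1 and 7 open and close a comment on the
# same line, and lines 2-6 form one "/*" ... "*/" range, so "next" skips
# every line before the pattern test is ever reached.
hits=$(awk '
    {sub(/--.*$/, "")}
    /\/\*/,/\*\//   {next}
    $0 ~ PAT        {print FILENAME}
' PAT="pattern" "$f")

echo "hits: [$hits]"    # prints "hits: []"
rm -f "$f"
```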

You didn't mention anything about quoted strings. If -- or /* and */ do not denote comments when they appear inside single-quoted or double-quoted strings (as in a shell script or C code), the following script won't handle that either. (If you need something that ignores comment markers found in quoted strings, maybe you can use the following as a guide on how to attack that problem, but I won't volunteer to do that for you here. A general parser like that is too much like work for me to offer to do it for free.)

The following script will work with any ksh, with /usr/xpg4/bin/sh, /usr/xpg6/bin/sh, or with bash (if bash is installed on your Solaris system). First copy the following into a file named NoCommentPattern.awk:
Code:
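# At the start of each file, clear the "already matched" (nf) and
# "inside an unclosed /* comment" (ssc) flags...
FNR == 1 {nf = ssc = 0}
# Skip the rest of a file once we already had a match in it...
nf {next}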
d {	printf("===%d%d\t%s\n", nf, ssc, $0)
}
# Strip out any comments (or skip line completely if we're in the middle of a
# multi-line comment.
{	if(ssc) {
		# An earlier line had an unclosed comment starting with "/*"...
		if(s = index($0, "*/")) {
			$0 = substr($0, s + 2)
			if(d) printf("Updated $0:\n\t%s\n", $0)
			ssc = 0
		} else	next
	}
	# Search for "/*...*/" and "--" comments.
	while(match($0, "[-][-]|[/][*]")) {
		if(substr($0, RSTART, 1) == "-") {
			# Found -- comment; throw away the rest of the line...
			if(RSTART == 1) {
				if(d) printf("Comment line deleted.\n")
				next
			}
			$0 = substr($0, 1, RSTART - 1)
			if(d) printf("Updated $0:\n\t%s\n", $0)
			break
		}
		# Found start of "/*" comment; look for the end of comment...
		if(s = index(substr($0, RSTART + 2), "*/")) {
			# End found, delete comment from line and look for more.
			$0 = (RSTART > 1 ? substr($0, 1, RSTART - 1) : "") \
				substr($0, RSTART + s + 3)
			if(d) printf("Updated $0:\n\t%s\n", $0)
		} else {# We found the start of a "/*...*/" comment but not
			# the end.  Process the part of this line before the
			# comment...
			ssc = 1
			if(RSTART == 1) {
				if(d) printf("Comment line deleted.\n")
				next
			}
			$0 = substr($0, 1, RSTART - 1)
			if(d) printf("Updated $0:\n\t%s\n", $0)
			break
		}
	}
}
# Look for pattern in current line after comments have been stripped.
index($0, P) {
	# Found it...
	print FILENAME
	nf = 1
}

and create a script (for this example, call it findpat) containing:
Code:
#!/usr/xpg4/bin/sh
pat=${1:-insurance_no}
if [ $# -gt 1 ]
then	debug=1
else	debug=0
fi
find . -type f -exec /usr/xpg4/bin/awk -v P="$pat" -v d="$debug" -f NoCommentPattern.awk {} +

and make it executable:
Code:
chmod +x findpat

Then the command:
Code:
./findpat

or:
Code:
./findpat "insurance_no"

will search the directory hierarchy rooted at the current directory for regular files that contain insurance_no outside of a comment, and print the names of any files that meet these conditions.

If you invoke it with two or more arguments:
Code:
./findpat "Search Pattern" debug

it will print lots of debugging information while it searches for matching files so you can see the lines it is processing and how it strips out comments before looking for the pattern. Once you understand how it works, you can make the script run a little bit faster if you strip out the debugging code.

Note that if you run this script in a directory other than where you place the file NoCommentPattern.awk, you'll need to modify the script to use an absolute pathname to where this file is located. This script should work even if there are spaces or tabs in your search pattern, but it will not find it if your pattern matches text that starts on one line and continues onto the next line.
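One way to avoid hard-coding that absolute pathname, sketched under the assumption that NoCommentPattern.awk is kept in the same directory as findpat, is to derive it from the script's own location. The variable names script_dir and awk_prog are made up for this example.

```sh
# Hypothetical addition near the top of findpat: locate NoCommentPattern.awk
# in the directory findpat itself lives in (assumes both files sit together).
script_dir=$(cd "$(dirname "$0")" && pwd)
awk_prog=$script_dir/NoCommentPattern.awk

# ...and pass that absolute path to awk instead of the bare file name:
#   find . -type f -exec /usr/xpg4/bin/awk -v P="$pat" -v d="$debug" \
#       -f "$awk_prog" {} +
```

Because find still starts at `.`, findpat keeps searching the directory you run it from; only the location of the awk file changes.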

If someone else wants to try this on a system where awk supports the nextfile statement, this script can be made a lot faster by using nextfile when a match is found, instead of setting nf = 1, reading the remainder of the file, and setting nf back to zero when the 1st line of the next file is found.
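Here is a sketch of what that change could look like. It is a condensed rewrite of the NoCommentPattern.awk logic without the debugging output, wrapped in a made-up shell function called nocomment_grep, and it assumes an awk that supports nextfile (gawk and mawk do). It also clears the multi-line-comment flag at the start of each file, so a comment left open at the end of one file cannot hide lines at the top of the next.

```sh
# Hypothetical helper: print each FILE containing the fixed string PATTERN
# outside "--" and "/* ... */" comments.  Usage: nocomment_grep PATTERN FILE...
nocomment_grep() {
    pat=$1
    shift
    awk -v P="$pat" '
    FNR == 1 {ssc = 0}                        # new file: not inside a comment
    {
        if (ssc) {                            # inside an unclosed "/*" comment
            if (s = index($0, "*/")) {
                $0 = substr($0, s + 2)
                ssc = 0
            } else next
        }
        while (match($0, "[-][-]|[/][*]")) {
            if (substr($0, RSTART, 1) == "-") {   # "--": drop rest of line
                $0 = substr($0, 1, RSTART - 1)
                break
            }
            if (s = index(substr($0, RSTART + 2), "*/")) {
                # "/* ... */" closed on this line: cut it out, keep looking
                $0 = substr($0, 1, RSTART - 1) substr($0, RSTART + s + 3)
            } else {
                # comment continues on later lines
                $0 = substr($0, 1, RSTART - 1)
                ssc = 1
                break
            }
        }
    }
    index($0, P) {print FILENAME; nextfile}   # first hit: skip rest of file
    ' "$@"
}
```

Usage, with hypothetical file names: `nocomment_grep insurance_no a.sql b.sql`. Since find's -exec cannot call a shell function, the awk program would go in its own file (as in findpat) to be used with find.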
This User Gave Thanks to Don Cragun For This Post:
# 13  
12-22-2014
Thanks, Don, for this useful script!

Hi RudiC,

I was just building on your awk commands, and there are three things I still haven't been able to sort out.

1) How to run this awk command recursively in all subdirectories. For now I am thinking of using find, something like this:

Code:
find . -name "*.*" -exec /usr/xpg4/bin/awk     '               {sub (/--.*$/,"")}
         {sub('/\/\*/,/\*\//',"")}
         $0 ~ PAT       {print FILENAME}
        ' PAT="insurance_no" {} \;

2) If there are multiple occurrences of insurance_no (outside comments), the file name is printed once for each occurrence. To work around this I piped the output through sort -u, but is there a way in awk itself to print the file name at the first occurrence and then move on to the next file?

Code:
bash-3.2$ find . -name "test_*" -exec /usr/xpg4/bin/awk     '               {sub (/--.*$/,"")}
>          {sub('/\/\*/,/\*\//',"")}
>          $0 ~ PAT       {print FILENAME}
>         ' PAT="RQST_ID" {} \;
./test_2.txt
./test_2.txt
./test_1.txt
./test_1.txt
./test_1.txt

3) As I understand it, this awk command reads each file, strips out the commented parts, and then searches for the pattern, so performance will suffer. Can we do it the other way around: first search for the pattern, then strip the commented parts, and then check whether the pattern is still present in the uncommented part?
# 14  
12-22-2014
1) Yes, that should work.
2) If your awk has the nextfile statement, place it right after the print statement. If not, you need to create a logical construct like Don Cragun did.
3) It has to read the file either way. Not removing the comments first would again raise the need to create a logical construct, and it would not increase performance.
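For point 2, when find runs one awk per file with `{} \;` as in the commands above, a simpler option than a flag may be to call exit right after the print: it ends only that file's awk process, and find then moves on to the next file. The sketch below wraps this in a hypothetical shell function, first_hit_files, with AWK defaulting to the Solaris POSIX awk used in this thread; the comment handling is RudiC's, with its whole-line assumption.

```sh
# Hypothetical wrapper: first_hit_files DIR PATTERN
# Prints each test_* file under DIR once, at its first uncommented match.
first_hit_files() {
    find "$1" -type f -name "test_*" -exec "${AWK:-/usr/xpg4/bin/awk}" '
        {sub(/--.*$/, "")}                  # drop "--" comments
        /\/\*/,/\*\//   {next}              # skip whole-line /* ... */ blocks
        $0 ~ PAT {print FILENAME; exit}     # report once, stop reading this file
    ' PAT="$2" {} \;
}

# Example: first_hit_files . RQST_ID
```

The cost is one awk process per file. With `{} +` a single awk reads many files, and exit would stop it at the first match overall, so there you would still need nextfile or a per-file flag.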

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Finding the same pattern in three consecutive lines in several files in a directory

I know how to search for a pattern/regular expression in many files that I have in a directory. For example, by doing this: grep -Ril "News/U.S." . I can find which files contain the pattern "News/U.S." in a directory. I am unable to accomplish about how to extend this code so that it can... (1 Reply)
Discussion started by: shoaibjameel123
1 Replies

2. Shell Programming and Scripting

Finding log files that match number pattern

I have logs files which are generated each day depending on how many processes are running. Some days it could spin up 30 processes. Other days it could spin up 50. The log files all have the same pattern with the number being the different factor. e.g. LOG_FILE_1.log LOG_FILE_2.log etc etc ... (2 Replies)
Discussion started by: atelford
2 Replies

3. Shell Programming and Scripting

Finding 4 current files having specific File Name pattern

Hi All, I am trying to find 4 latest files inside one folder having following File Name pattern and store them into 4 different variables and then use for processing in my shell script. File name is fixed length. 1) Each file starts with = ABCJmdmfbsjop letters + 7 Digit Number... (6 Replies)
Discussion started by: lancesunny
6 Replies

4. Shell Programming and Scripting

Finding/replacing strings in some files based on a file

Hi, We have a file (e.g. a .csv file, but could be any other format), with 2 columns: the old value and the new value. We need to modify all the files within the current directory (including subdirectories), so find and replace the contents found in the first column within the file, with the... (9 Replies)
Discussion started by: Talkabout
9 Replies

5. Shell Programming and Scripting

finding the files based on date..

Hi everyone, I have a scenario like this. I have a path like export/home/pmutv/test/ in which I receive 43 files daily, each file having that day's date, e.g. product.sh.20110512. I receive 43 files like this every day and have to find them. If the files are available I... (2 Replies)
Discussion started by: apple2685
2 Replies

6. UNIX for Dummies Questions & Answers

finding and moving files based on the last three numerical characters in the filename

Hi, I have a series of files (upwards of 500) the filename format is as follows CC10-1234P1999.WGS84.p190, all in one directory. Now the last three numeric characters, in this case 999, can be anything from 001 to 999. I need to move some of them to a separate directory, the ones I need to... (5 Replies)
Discussion started by: roche.j.mike
5 Replies

7. UNIX for Dummies Questions & Answers

finding all files that do not match a certain pattern

I hope I'm asking this the right way -- I've been sending out a lot of resumes and some of them I saw on Craigslist -- so I named the file as 'Craigslist -- (filename)'. Well I noticed that at least one of the files was misspelled as 'Craigslit.' I want to eventually try to write a shell... (5 Replies)
Discussion started by: Straitsfan
5 Replies

8. Shell Programming and Scripting

Finding conserved pattern in different files

Hi power user, For example, I have three different files: file 1: file2: file 3: AAA CCC ZZZ BBB BBB CCC CCC DDD DDD DDD TTT AAA EEE AAA XXX I... (8 Replies)
Discussion started by: anjas
8 Replies

9. Shell Programming and Scripting

finding duplicate files by size and finding pattern matching and its count

Hi, I have a challenging task in which I have to find the duplicate files by name and size, then take any one of those files. Then I need to open the file, search for more than one pattern, and count each pattern. Note: these are samples of two files, but I can have more... (2 Replies)
Discussion started by: jerome Sukumar
2 Replies

10. Shell Programming and Scripting

Finding a specific pattern from thousands of files ????

Hi All, I want to find a specific pattern in approximately 400000 files on the Solaris platform. It's very heavy for me to grep for that pattern in each file individually. Can anybody suggest a way to search for a specific (alphanumeric) pattern in these forty thousand files? Please note that... (6 Replies)
Discussion started by: aarora_98
6 Replies