List all file names that contain two specific words. ( follow up )


 
# 1  
Old 11-30-2016

Being new to the forum, I went looking for a way to find files that contain two words, not necessarily on the same line.
This thread

"List all file names that contain two specific words."

answered it in part, but I was looking for a more concise solution.


Here's a one-line suggestion using awk


Code:
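find . -name "*" -exec awk -v w1=WORD1 -v w2=WORD2 '
# W1 and W2 flag whether each word has been seen in this file
BEGIN {W1=0;W2=0} {
if($0 ~ w1) {
W1=1
} if($0 ~ w2) {
W2=1
}
} END {
# print the file name only when both flags are set
if(W1+W2>1) {
print FILENAME
}
} ' {} \; 2>/dev/null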

This can be wrapped in a shell script that takes the two words, WORD1 and WORD2, as arguments.
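As a rough sketch of that wrapper (the function name both_words and the optional third argument for the start directory are my own assumptions, not part of the post above), something like this might do. It keeps one awk per file, so FILENAME and exit apply to that file only, and it adds -type f so that only regular files are handed to awk:

```shell
#!/bin/sh
# both_words: hypothetical helper name, not from the original post.
# Usage: both_words WORD1 WORD2 [DIR]
both_words() {
    w1=$1 w2=$2 dir=${3:-.}
    find "$dir" -type f -exec awk -v w1="$w1" -v w2="$w2" '
        $0 ~ w1 { seen1 = 1 }
        $0 ~ w2 { seen2 = 1 }
        # stop reading this file as soon as both words have been seen
        seen1 && seen2 { print FILENAME; exit }
    ' {} \; 2>/dev/null
}
```

Called as both_words WORD1 WORD2 it searches the current directory. The words are still treated as regular expressions and still match inside longer words, exactly as in the original command.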
# 2  
Old 11-30-2016
Okay, good try. Let's consider some things.

How about:
Code:
for fname in $(find . -type f ) 
do
     grep -Fq WORD1 "$fname" && grep -Fq WORD2 "$fname" && echo "$fname"
done

I think your solution would report directories, for example. It also would report a file on a search for "goo" when the file had the word "good".

My example also has failings.

You decide. It all depends on the exact requirements for the script. Plus, you can script the same solution using several tools: in this case awk, grep, or even just plain bash.
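One failing of the for loop over $(find ...) is that file names containing spaces get split into pieces. A possible variant that avoids this, sketched here under the assumption that your find supports -exec ... {} + (the function name list_both_fixed and the optional directory argument are made up for illustration):

```shell
# list_both_fixed: hypothetical name. Usage: list_both_fixed WORD1 WORD2 [DIR]
# Fixed-string match of both words, safe for file names with whitespace.
list_both_fixed() {
    find "${3:-.}" -type f -exec sh -c '
        w1=$1 w2=$2
        shift 2
        # the remaining arguments are the file names supplied by find
        for fname; do
            if grep -Fq -e "$w1" "$fname" && grep -Fq -e "$w2" "$fname"; then
                printf "%s\n" "$fname"
            fi
        done
    ' sh "$1" "$2" {} +
}
```

The file names never pass through word splitting because find hands them to the inner sh as separate arguments. Like the loop above, it still matches "goo" inside "good" unless -w is added to each grep.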
# 3  
Old 11-30-2016
Hi,
If you are on Linux and/or your grep supports these options (-P and -z are GNU grep extensions):
Code:
grep -R -Pzl 'WORD1(.*\n)*.*WORD2|WORD2(.*\n)*.*WORD1' .

Regards.
# 4  
Old 11-30-2016
Could you consider a double grep?
Code:
find . | xargs grep -l "WORD1" | xargs grep -l "WORD2"

Notes:-
  • There is no need to specify -name "*" on the find command.
  • The -l flag (lower case L) means you only get filenames out of grep
  • This assumes that there are only regular files in the current directory. Add -type f to the find if this is not the case.
  • If you need exact word searching (i.e. not match good when searching for goo) you can add the -w flag to each grep
  • If the files are large, this will read some files twice, although if the matches are early it will not have to read the whole file.
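Putting the notes above together, a sketch might look like the following. It assumes GNU find, xargs and grep, because -print0, -0, -r and -Z are not POSIX options, so it may not work on other systems. The function name both_words_xargs and the optional directory argument are only for illustration:

```shell
# both_words_xargs: hypothetical name; assumes GNU find, xargs and grep.
# Usage: both_words_xargs WORD1 WORD2 [DIR]
both_words_xargs() {
    # -type f   : regular files only
    # -w        : whole-word matches (searching goo will not match good)
    # -print0, -0, -Z : pass file names NUL-terminated so spaces survive
    # -r        : do not run grep at all when no names arrive
    find "${3:-.}" -type f -print0 |
        xargs -0 -r grep -lwZ -e "$1" |
        xargs -0 -r grep -lw -e "$2"
}
```

The second grep only ever sees files that already matched the first word, which is where the saving comes from.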


I hope that this helps,
Robin

Last edited by rbatte1; 11-30-2016 at 11:56 AM.. Reason: Added ICODE tags for the -l flag note
# 5  
Old 11-30-2016
Hi.

We ran across a need for this some time ago, and wrote a solution that has worked for us.

In between projects, we discuss how we should publish our code: our own website, SourceForge, GitHub, or as a post in a thread (as Corona688 has done here, for example, among others). No consensus so far, sigh.

We have agreed that we can at least post the documentation for our utilities in hopes that it may provide motivation for others to use approaches that have worked (at least for us).

So here are some details on our rapgrep -- this is clearly not a one-line suggestion :)
Code:
rapgrep Require all patterns grep. (what)
Path    : ~/bin/rapgrep
Version : 1.2
Length  : 307 lines
Type    : Perl script, ASCII text executable
Shebang : #!/usr/bin/perl
Help    : probably available with [     ]-h
Modules : (for perl codes)
 warnings       1.23
 strict 1.08
 English        1.09
 Carp   1.3301
 Data::Dumper   2.151_01
 Getopt::Long   2.42

and the help :
Code:
Script rapgrep reads files and matches patterns as provided by the
caller.  If all patterns successfully match at least once, then
the file name is printed.  Some details of the matching results
may be requested to be printed.

usage: rapgrep [options] -- [files]

options:
--all
  Force all lines to be searched.  The default is to quit if
  all matches are successful even if EOF is not read yet.

-e=pattern
 Use perl pattern for searching.  More than one -e=p may be used.
 However, if the control statement becomes unwieldy, see -f.

--file=pathname
  Read file at pathname for patterns, one per line.  More than
  one --filename=path may used.  All -e and -f contents are
  collected and used.  A "#" may be used for comment lines in the
  files.

--ignore
  Ignore case in matches.  Default is case is significant.

--reverse
  Invert the sense of success: if a filename normally would 
  not be printed, then print it; if normally printed, omit it.

--list=rx
  List the reasons why a filename is not printed ("r").  List the
  details of the pattern matches: how many of which pattern in
  what file.

--comment=string
  Change the comment character in the pattern files to any in the
  string.

--h (or -h)
  print this message and quit.

--version
  print this message and quit.
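
The rapgrep source is not posted in this thread, so the following is not that code. It is only a minimal sketch of the same idea as described in the help text (print a file name once every pattern has matched at least once, and quit reading early), written with sh and awk instead of Perl. The name require_all and the calling convention are assumptions made for illustration:

```shell
# require_all: hypothetical name, not the rapgrep script.
# Usage: require_all PATTERN_FILE FILE...
# PATTERN_FILE holds one regular expression per line; "#" starts a comment line.
require_all() {
    patfile=$1
    shift
    for f; do
        awk '
            # first input is the pattern file: collect the patterns
            NR == FNR { if ($0 !~ /^#/ && $0 != "") pat[++n] = $0; next }
            {
                for (i = 1; i <= n; i++)
                    if (!(i in hit) && $0 ~ pat[i]) { hit[i] = 1; found++ }
                # every pattern has matched: report the file and stop reading it
                if (found == n) { print FILENAME; exit }
            }
        ' "$patfile" "$f"
    done
}
```

It runs one awk per file, so it will be slower than a single process over many files, and the patterns are awk regular expressions rather than Perl ones.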

Best wishes ... cheers, drl
# 6  
Old 12-01-2016
Thanks a lot for your contributions, which I compared...

My solution was slowest (so I guess using -exec within the find command is not very efficient) and, as you predicted, it matched files containing the character string anywhere, not just as a whole word. But that can actually be a requirement.
Code:
Solution                  real        user        sys
Mine (find -exec awk)     0m45.672s   0m28.487s   0m15.590s
Jim's loop                0m20.383s   0m8.548s    0m10.992s
Robin's double grep       0m4.126s    0m2.883s    0m0.303s

... and disedorgue's solution caused a core dump as run, so I didn't try fiddling with it too much, as it's not my system! In any case our production environment has many non-Linux machines, so GNU-only grep options won't work.
I'm not sure where the code for drl's solution is; I didn't find rapgrep on Bing either. What am I missing?

Regards,


Moderator's Comments:
Please use CODE tags as required by forum rules!


---------- Post updated at 04:48 AM ---------- Previous update was at 03:56 AM ----------

Hi again,
You guys have opened my eyes regarding find . -exec ..., which I regularly use.
I know this isn't strictly the post subject, but I just wanted to comment on the difference between
Code:
time find . 2>/dev/null | xargs grep -l "$chn1" 2>/dev/null | xargs grep -l "$chn2" 2>/dev/null

real 0m6,38s

and
Code:
time find . -exec grep -l "$chn1" {} \; 2>/dev/null | xargs grep -l "$chn2" 2>/dev/null

real 2m15,43s !!!!!!!!!!!

Thanks for this revelation !

Last edited by rbatte1; 12-01-2016 at 06:45 AM.. Reason: Added ICODE tags.
# 7  
Old 12-01-2016
The difference you are seeing is probably because your find . -exec grep ..... runs the grep command individually for each file. Using xargs, as in my suggestion, reduces the number of command calls and therefore the number of processes spawned. It may not be the best way, but it works okay.

You might be able to use a + at the end of the -exec section of your find instead of the \;, but it depends on which version of find you have.
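
For illustration, the + form might look like the sketch below. It reuses the chn1 and chn2 variable names from the timing test. The function wrapper both_batched and the optional directory argument are assumptions added so it can be tried out in isolation:

```shell
# both_batched: hypothetical name. Usage: both_batched WORD1 WORD2 [DIR]
both_batched() {
    chn1=$1 chn2=$2
    # "+" makes find pass many file names to each grep,
    # instead of starting one grep process per file as "\;" does
    find "${3:-.}" -type f -exec grep -l -e "$chn1" {} + 2>/dev/null |
        xargs grep -l -e "$chn2" 2>/dev/null
}
```

The second stage still goes through plain xargs, so file names containing whitespace would be split there.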

Be aware that times may vary depending on the number of files and their sizes, so searching a very few large files may be slow with my suggestion because it will potentially read the files twice.


Glad to have helped a bit this time, but do keep experimenting if the times get longer.



Kind regards,
Robin