List all file names that contain two specific words.


 
# 15  
Old 06-13-2010
Brief timing note

Hi.

I created a 12 MB file with the first pattern only at the end of the file and the second pattern absent.

I timed 5 solutions: alister's suggested 2-grep, a similar method using the more-featured cgrep, rapgrep (a Perl script: "require all patterns"), glark (a Ruby script), and an awk script. The 2-grep, 2-cgrep, and awk approaches were the fastest, in that order, and far faster than the Perl and Ruby programs.

That 2 passes through the file would be faster than a single pass surprised me. The grep family appears to be very well-written, as is the gawk processor ... cheers, drl
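The awk script itself was not posted. As a rough sketch only, a single-pass test of one file might look like the following; the helper name `has_both_awk`, the sample files, and the sample words are assumptions for illustration, not the code that was timed.

```shell
#!/bin/sh
# Sketch: single-pass awk test for two patterns in one file (not the timed script).
tmpdir=$(mktemp -d)
printf 'fiber optics\nalpha particle\n' > "$tmpdir/both.txt"
printf 'fiber optics\n' > "$tmpdir/one.txt"

# Hypothetical helper: exit status 0 only if both patterns occur somewhere in $3.
has_both_awk() {
    awk -v p1="$1" -v p2="$2" '
        $0 ~ p1 { a = 1 }            # saw the first pattern
        $0 ~ p2 { b = 1 }            # saw the second pattern
        a && b  { found = 1; exit }  # stop reading once both are seen
        END     { exit !found }      # status 0 if both were found, else 1
    ' "$3"
}

awk_hits=""
for f in "$tmpdir"/both.txt "$tmpdir"/one.txt; do
    if has_both_awk fiber alpha "$f"; then
        awk_hits="$awk_hits${f##*/} "
    fi
done
printf '%s\n' "$awk_hits"
rm -rf "$tmpdir"
```

In the worst case described above (one pattern at the very end, the other absent), a single pass like this still has to test every line against both patterns, which may be part of why it did not beat two greps.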
# 16  
Old 06-13-2010
Did you happen to test my suggestions as well? In brief testing I found them to be even faster, possibly due to the fact that the grep moves on to the next file after the first match and only those files that contain a match for the first pattern are grepped for the second...

S.
# 17  
Old 06-13-2010
Hi, Scrutinizer.

In a way I did.

The "optimized" part of the timed code for the grep segment went like this:
Code:
grep -q -m 1 "$pattern1" "$FILE" && grep -q -m 1 "$pattern2" "$FILE"

I think that probably does much the same thing as your suggestion. I used alister's suggestion as the base.

Note that in the case of the 12 MB file, this probably does not matter because the first string to be matched is specifically placed in the last line forcing the first grep to go all the way through, succeeding, and then the second grep to take place, failing -- a worst-case situation. However, either the -l option or the -m 1 option should work the same -- i.e. bail out at the first match, although I admit that I did not compare one to the other. In a production environment, one or the other should be used to avoid wasted time.
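As a small sketch of that equivalence, the two variants can be run side by side on throwaway files. The file names and contents here are made up for illustration, and `-m` is assumed to be available (it is in GNU grep).

```shell
#!/bin/sh
# Sketch: compare the -q -m 1 form with the -l form on hypothetical sample files.
tmpdir=$(mktemp -d)
printf 'fiber\nalpha\n' > "$tmpdir/both.txt"
printf 'fiber\n' > "$tmpdir/one.txt"
pattern1=fiber
pattern2=alpha

q_hits=""
l_hits=""
for FILE in "$tmpdir"/both.txt "$tmpdir"/one.txt; do
    # Form from the post: -q suppresses output, -m 1 stops after one match.
    if grep -q -m 1 "$pattern1" "$FILE" && grep -q -m 1 "$pattern2" "$FILE"; then
        q_hits="$q_hits${FILE##*/} "
    fi
    # Form with -l: grep would print the file name; only the exit status is used here.
    if grep -l "$pattern1" "$FILE" >/dev/null && grep -l "$pattern2" "$FILE" >/dev/null; then
        l_hits="$l_hits${FILE##*/} "
    fi
done
printf 'q/m1: %s\n-l:   %s\n' "$q_hits" "$l_hits"
rm -rf "$tmpdir"
```

Both forms select the same files; they differ only in whether a file name is printed. With GNU grep, `-q` on its own is documented to exit at the first match, so the extra `-m 1` is probably harmless rather than necessary.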

I did something similar for cgrep -- the option differs in syntax but is the same semantically.

I did not use the glark feature to run recursively because I wanted most of the infrastructure to be the same -- find, xargs, etc. The find, xargs construction accounts for another possible worst-case, where one might have many (too many) files that pass the first test.
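The exact find, xargs pipeline used for the timings is not shown here. One possible shape, assuming GNU find, xargs, and grep (for `-print0`, `-0`, `-r`, and `-Z`), is sketched below with invented sample files.

```shell
#!/bin/sh
# Sketch: find + xargs chain that lists files containing both words.
# Assumes GNU tools; the directory layout and file names are hypothetical.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/sub"
printf 'fiber\nalpha\n' > "$tmpdir/both.txt"
printf 'fiber\n' > "$tmpdir/sub/only fiber.txt"
printf 'alpha\n' > "$tmpdir/sub/only_alpha.txt"

find_hits=$(
    cd "$tmpdir" &&
    # NUL-separated names survive spaces; xargs splits long lists into batches.
    find . -type f -print0 |
        xargs -0 -r grep -lZ fiber |
        xargs -0 -r grep -l alpha
)
printf '%s\n' "$find_hits"
rm -rf "$tmpdir"
```

Because xargs batches its arguments, the second grep still works when very many files pass the first test, which is the worst case mentioned above.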

Thanks for your observations & feedback ... cheers, drl
# 18  
Old 06-14-2010
Hi drl,

Thanks for your observations & feedback in return. I guess the observed difference probably stems from calling the grep program repeatedly inside the loop, versus calling grep just twice, with each call scanning all the required files in one go...
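The commands being compared are not quoted in this part of the thread, so the following is only a guess at the two shapes under discussion, using invented file names: a loop that starts grep once or twice per file, and a pipeline that starts grep twice in total.

```shell
#!/bin/sh
# Sketch: per-file loop versus two bulk grep calls (hypothetical sample data).
tmpdir=$(mktemp -d)
printf 'fiber\nalpha\n' > "$tmpdir/a.txt"
printf 'fiber\n' > "$tmpdir/b.txt"
printf 'alpha\n' > "$tmpdir/c.txt"

# Loop: up to two grep processes are started for every file.
loop_hits=$(
    cd "$tmpdir" &&
    for f in *.txt; do
        if grep -q fiber "$f" && grep -q alpha "$f"; then
            printf '%s\n' "$f"
        fi
    done
)

# Bulk: the first grep scans all files; the second scans only the files it listed.
bulk_hits=$(
    cd "$tmpdir" &&
    grep -l fiber -- *.txt | xargs grep -l alpha --
)

printf 'loop: %s\nbulk: %s\n' "$loop_hits" "$bulk_hits"
rm -rf "$tmpdir"
```

The bulk form as written assumes file names without whitespace or newlines, since plain xargs splits its input on them.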

S.

---------- Post updated 14-06-10 at 08:16 ---------- Previous update was 13-06-10 at 23:58 ----------

Quote:
Originally Posted by drl
Hi.

.. The grep family appears to be very well-written, as is the gawk processor ... cheers, drl
Grep is very efficient indeed. Also have a look at mawk. In my experience it usually beats gawk on speed...
# 19  
Old 06-14-2010
Hi.

Indeed, mawk is slightly faster than gawk. The mawk variant is occasionally mentioned in comp.lang.awk. It is available in the Debian repositories, so I often have it available, but I rarely install it in the other distributions that I run as test installs on virtual machines. For example, it is not present in the standard repositories for openSUSE 11.2 "Emerald".

Both claim to be standards-compliant, although the exact phrase for gawk is " ... almost completely POSIX 1003.2 compliant ..."

Here are the versions and timings for one run against a 12 MB file that has "fiber" only at the very end of the file, and "alpha" does not occur at all -- using the techniques discussed earlier:
Code:
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
cgrep - (local: ~/executable/cgrep May 29 2009 )
glark, version 1.8.0
rapgrep (local) 1.2
GNU grep 2.5.3
GNU Awk 3.1.5
mawk - ( /usr/bin/mawk Apr 5 2008 )

 Results of finding "fiber" and "alpha" contained in files, grep:

real	0m0.088s
user	0m0.036s
sys	0m0.036s

-----
 Results of finding "fiber" and "alpha" contained in files, cgrep:
fiber

real	0m0.118s
user	0m0.052s
sys	0m0.032s

-----
 Results of finding "fiber" and "alpha" contained in files, rapgrep:

real	0m13.930s
user	0m11.309s
sys	0m0.020s

-----
 Results of finding "fiber" and "alpha" contained in files, glark:

real	0m11.703s
user	0m8.865s
sys	0m0.696s

-----
 Results of finding "fiber" and "alpha" contained in files, mawk:

real	0m0.176s
user	0m0.140s
sys	0m0.020s

-----
 Results of finding "fiber" and "alpha" contained in files, awk:

real	0m0.300s
user	0m0.260s
sys	0m0.012s

cheers, drl