Using awk to read one file and search in another file


 
# 8  
11-26-2012
Quote:
Originally Posted by RudiC
Yes - fgrep is slower, but don't forget the influence of I/O buffering when comparing the two. Then please consider using grep in lieu of fgrep.
There's not much point in finishing faster if the results are incorrect. fgrep (and the awk solutions provided) are not equivalent to grep (without -F).

Regards,
Alister

---------- Post updated at 01:04 PM ---------- Previous update was at 12:31 PM ----------

Quote:
Originally Posted by RudiC
Yes - fgrep is slower
...<snip>...
Code:
$ time grep -f file1 file2 >/dev/null
real    0m0.022s
user    0m0.008s
sys     0m0.012s
$ time fgrep -f file1 file2 >/dev/null
real    0m0.092s
user    0m0.088s
sys     0m0.004s

I'm curious. What implementation are you using? The simpler string comparison of fgrep should be faster than any regular expression engine (including Ken Thompson's NFA approach).

I ran a few tests on an old machine (a P2 350 MHz) using a quarter-million-line, ~2.5 MB text file whose text repeats every 10 lines. When matching two patterns (stored in a second file), fgrep was approximately 16 times faster: ~0.2 seconds versus ~3.3 seconds.
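A rough way to repeat that kind of comparison is sketched below. It is only an approximation built on my own assumptions: the file names (big.txt, pats.txt), the 9-byte line format and the two patterns are invented for illustration, and the script assumes bash for its `time` keyword. Reading the file once before timing also reduces the I/O-buffering effect mentioned above.

```shell
#!/bin/bash
# Hypothetical reproduction of the benchmark (names and data are assumptions):
# 250,000 lines whose text repeats every 10 lines (~2.25 MB in total).
awk 'BEGIN { for (i = 0; i < 250000; i++) printf "record %d\n", i % 10 }' > big.txt

# Two fixed-string "patterns", one per line, as grep -f expects.
printf 'record 3\nrecord 7\n' > pats.txt

# Read the file once so both timings start from a warm cache.
cat big.txt > /dev/null

# Fixed-string matching (what fgrep does) versus regular-expression matching.
time grep -F -f pats.txt big.txt > /dev/null
time grep -f pats.txt big.txt > /dev/null
```

Each pattern matches one line in ten, so both commands select the same 50,000 lines; only the timings should differ.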

Regards,
Alister
# 9  
11-26-2012
Linux with
fgrep (GNU grep) 2.12
grep (GNU grep) 2.12
mawk 1.3.3

What you say is in line with what I expected (fixed string lookup vs. pattern matching), but I enlarged the test file and found very similar results to what I posted before:
Code:
time grep -f file1 file2 >/dev/null
real    0m0.085s
time grep -F -f file1 file2 >/dev/null
real    0m0.226s
time fgrep -f file1 file2 >/dev/null
real    0m0.245s
time awk -F\| 'NR==FNR{a[$1]++;next}a[$1]'  file1 file2 >/dev/null
real    0m0.217s

Please explain your statement that the results of fgrep and grep/awk differ (or are incorrect), other than through the pipe symbol in file1 ORing patterns.
# 10  
11-26-2012
Quote:
Originally Posted by RudiC
Linux with
fgrep (GNU grep) 2.12
grep (GNU grep) 2.12
mawk 1.3.3
...<snip>...

I observed fgrep to be over 16 times faster than grep with GNU [f]grep 2.5.1 on an ancient Debian install.
Quote:
Originally Posted by RudiC
Please explain your statement that the results of fgrep and grep/awk differ (or are incorrect), other than through the pipe symbol in file1 ORing patterns.
Without assurances that the OP's "patterns" won't contain any regular expression metacharacters (e.g. ., which matches any single character), it's possible for fgrep/awk (using string comparison) and grep (using regular expression pattern matching) to produce different output.
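A minimal demonstration of that difference, using a hypothetical pattern file that holds a single "pattern" containing a dot (file names and data are made up for illustration):

```shell
#!/bin/sh
# Hypothetical data: one pattern containing ".", and two input lines.
printf 'a.c\n' > pats.txt
printf 'a.c\nabc\n' > data.txt

# Regular-expression matching: "." matches any character, so both lines match.
grep -f pats.txt data.txt        # prints a.c and abc

# Fixed-string matching: "." is literal, so only the first line matches.
grep -F -f pats.txt data.txt     # prints a.c

# awk exact-string lookup (same idea as the one-liner in this thread).
awk 'NR==FNR { keys[$0]; next } $0 in keys' pats.txt data.txt   # prints a.c
```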

By default, the pipe symbol isn't special to grep, so that wasn't my concern. However, since that possibility occurred to you, you were obviously already aware of the pitfalls mentioned in the previous paragraph. You simply didn't think it was worth worrying about in this case. I'm not so lenient (probably because I've seen simplified post data waste time in the past).
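To illustrate the point about the pipe symbol with made-up data: in a basic regular expression (grep's default), | is an ordinary character, while with -E it separates alternatives. (GNU grep also accepts \| as alternation in basic regular expressions, but that is a GNU extension, not POSIX.)

```shell
#!/bin/sh
# Hypothetical input: one line with a literal pipe, two without.
printf 'a|b\na\nb\n' > data.txt

# Basic regular expression (the default): "|" is literal,
# so only the line "a|b" matches.
grep 'a|b' data.txt

# Extended regular expression (-E): "|" means alternation,
# so all three lines match.
grep -E 'a|b' data.txt
```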

Honestly, for me, the slowness of your fgrep is the most interesting aspect of this thread. I seldom use GNU and Linux, so I doubt I'll investigate it myself. But, if you are similarly curious and discover the cause, I'd love to know.

Regards,
Alister
# 11  
11-27-2012
Thanks for explaining your point.
I did a similar test on a FreeBSD system, although in a VM, so consider only the relative times.
fgrep (GNU grep) 2.5.1-FreeBSD
grep (GNU grep) 2.5.1-FreeBSD

Code:
$ time grep -f file file2 >/dev/null
real    0m0.222s
$ time fgrep -f file file2 >/dev/null
real    0m0.205s
$ time awk -F"|" '  ... 
real    0m0.863s

Here grep (with and without -F) and fgrep perform about the same, whilst awk lags behind by a factor of about four (0.863 s vs. 0.205 s), which supports my first assumption.
BUT the requestor reported that awk was much faster than grep, which is difficult to understand.


Although I'm curious as well, I'm not sure how to delve into this problem. I don't think it's system-related (I/O, etc.); more likely it comes down to the algorithms used. Being a bit thick when it comes to analysing C source, I'm afraid I'm giving up.