Compare 2 huge files with respect to a key using awk

07-14-2008

Registered User

8, 0

Join Date: Jul 2008

Last Activity: 15 July 2008, 4:58 AM EDT

Posts: 8

Thanks Given: 0

Thanked 0 Times in 0 Posts

Thanks a ton, Radoulov, the code is working perfectly. I am sorry for the typo; the key was 3456. Thanks a lot for your time.

Best Regards
Ranjani

Last edited by Ranjani; 07-14-2008 at 11:04 AM..

07-14-2008

Hi, just another quick question.

What do I need to do in case I want to compare the values of two fields (the value at col 1 and the value at col 2) for the corresponding records?

Your help will be much appreciated. Thanks a lot again.

Ranjani

07-14-2008

Registered User

5,690, 630

Join Date: Jan 2007

Last Activity: 9 January 2017, 4:40 AM EST

Location: Варна, България / Milano, Italia

Posts: 5,690

Thanks Given: 184

Thanked 630 Times in 587 Posts

Like this?

Code:

awk 'NR == FNR { 
  f1[$3] = $1 SUBSEP $2 
  next
  }
{ 
  print "key", 
  $3 in f1 ? $3 " records " \
  (f1[$3] == $1 SUBSEP $2 ? "" : "do not ") \
  "match" : $3 " is missing" 
  }' file2 file1
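
To make the result concrete, here is a small run of the same logic on made-up data. The file contents and the temporary directory are assumptions for illustration only (the thread never shows the real files); the assumed layout is two value columns followed by the key in column 3. The ternary is wrapped in parentheses purely for readability.

```shell
#!/bin/sh
# Hypothetical sample data: col 1 and col 2 are values, col 3 is the key.
dir=$(mktemp -d)
printf 'a b 1234\nc d 3456\ne f 5678\n' > "$dir/file1"
printf 'a b 1234\nc x 3456\n' > "$dir/file2"

# file2 is read first and loaded into the array; file1 is then checked against it.
out=$(awk 'NR == FNR {
  f1[$3] = $1 SUBSEP $2
  next
  }
{
  print "key", ($3 in f1 ? $3 " records " \
  (f1[$3] == $1 SUBSEP $2 ? "" : "do not ") \
  "match" : $3 " is missing")
  }' "$dir/file2" "$dir/file1")

printf '%s\n' "$out"
rm -rf "$dir"
```

With this sample data the run should report that key 1234 matches, key 3456 does not match (col 2 differs), and key 5678 is missing from file2.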

radoulov

07-14-2008

Awesome!! Works perfectly. Thanks a lot for all your help. It's much appreciated.

Ranjani

07-14-2008

Could I ask one more question? I hope you guys don't mind.

I would like to add a condition that checks for blank records in between. Right now, if there are any blank records, the script outputs "key is missing"; it treats the key as a simple space. I want it to display the error "blank record in the file" instead. I tried doing it but could not succeed, so I need your help with this as well. Thanks a lot.

Actually, I am a newbie to awk and hence find it difficult to make the modifications. It would be great if you could post an explanation of how this code works, so that I can understand it and take it further myself.

Thanking you in advance
Ranjani

07-14-2008

Something like this?

Code:

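awk '
# First file on the command line (file2): store fields 1 and 2 under the key in field 3
NR == FNR { 
  f1[$3] = $1 SUBSEP $2 
  next
  }
# Second file (file1): NF is 0 for blank records, so they are skipped
NF { 
  print "key", 
  $3 in f1 ? $3 " records " \
  (f1[$3] == $1 SUBSEP $2 ? "" : "do not ") \
  "match" : $3 " is missing" 
  }' file2 file1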

The above code works like this:

Code:

NR == FNR { 
  f1[$3] = $1 SUBSEP $2 
  next
  }

While reading the first file, NR == FNR is true (the record number across the entire input equals the record number within the current file; this is a common AWK idiom), so this rule builds the f1 associative array: the third field is the key, and the first and second fields, joined by SUBSEP, are the value (check the awk documentation for SUBSEP; you could also use FS here). Note that the first file on the command line is file2. The next statement:

[from Effective AWK Programming]

Quote:

forces awk to immediately stop processing the current record and go on to the next record. This means that no further rules are executed for the current record, and the rest of the current rule’s action isn’t executed.
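
A quick way to see both the NR == FNR idiom and the effect of next is to print the counters while reading two tiny files. This is only a sketch: the file names first.txt and second.txt and their contents are invented for the demonstration.

```shell
#!/bin/sh
# Hypothetical two-line inputs, created only for this demonstration.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/first.txt"
printf 'c\nd\n' > "$dir/second.txt"

# Rule 1 fires only while NR == FNR (first file) and ends with next,
# so rule 2 never sees the records of the first file.
demo=$(awk '
NR == FNR { print "first file:", $0, "NR=" NR, "FNR=" FNR; next }
          { print "second file:", $0, "NR=" NR, "FNR=" FNR }
' "$dir/first.txt" "$dir/second.txt")

printf '%s\n' "$demo"
rm -rf "$dir"
```

NR keeps counting across both files (1 to 4), while FNR restarts at 1 for the second file, which is why the comparison is true only for the first one. Without next, the second rule would also run for the records of the first file.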

Code:

NF { 
  print "key", 
  $3 in f1 ? $3 " records " \
  (f1[$3] == $1 SUBSEP $2 ? "" : "do not ") \
  "match" : $3 " is missing" 
  }

While reading the second file (file1), for every non-blank record (NF, the number of fields, is non-zero), print the string "key", followed by the result of the following expression:

Code:

$3 in f1 ? $3 " records " (f1[$3] == $1 SUBSEP $2 ? "" : "do not ") "match" : $3 " is missing"

The condition ? then : else construct is the ternary operator. The outer one tests $3 in f1, that is, whether the third field of the current record (from the second file) exists as a key in the f1 array. If it does, print the third field followed by the string " records ", then the result of the inner ternary operator, then the string "match". The inner ternary operator yields nothing ("") if the value stored under that key equals the first and second field pair of the current record, and the string "do not " otherwise. If the key is not in f1, the outer ternary operator yields the third field followed by the string " is missing".
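
If the nested ternary is hard to read, the same decision can be spelled out with plain if/else. The version below is a sketch that should behave the same way; the msg variable and the sample files are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data: col 1 and col 2 are values, col 3 is the key.
dir=$(mktemp -d)
printf 'a b 1234\nc x 3456\n' > "$dir/file2"
printf 'a b 1234\nc d 3456\ne f 5678\n' > "$dir/file1"

ifelse_out=$(awk '
NR == FNR { f1[$3] = $1 SUBSEP $2; next }
{
  if ($3 in f1) {
    # Outer "then" branch: the key exists, so compare the stored pair
    if (f1[$3] == $1 SUBSEP $2)
      msg = $3 " records match"
    else
      msg = $3 " records do not match"
  } else {
    # Outer "else" branch: the key was never seen in the first file
    msg = $3 " is missing"
  }
  print "key", msg
}' "$dir/file2" "$dir/file1")

printf '%s\n' "$ifelse_out"
rm -rf "$dir"
```

The ternary form is shorter, while this form is easier to extend, for example when each case needs more than one statement.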

That's all.
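
Note that the NF pattern above skips blank records silently. If an explicit message is wanted, as asked in the question, one possible variation (a sketch, not the code posted above) is to test for !NF before anything else. The exact message format, with the file name and line number appended, and the sample files are assumptions.

```shell
#!/bin/sh
# Hypothetical sample data with one blank record in each file.
dir=$(mktemp -d)
printf 'a b 1234\n\nc x 3456\n' > "$dir/file2"
printf 'a b 1234\n\nc d 3456\n' > "$dir/file1"

# Run inside the directory so FILENAME prints as file2 / file1.
blank_out=$(cd "$dir" && awk '
# Blank record in either file: report it and skip the other rules
!NF { print "blank record in the file: " FILENAME ", line " FNR; next }
NR == FNR { f1[$3] = $1 SUBSEP $2; next }
{
  print "key", ($3 in f1 ? $3 " records " \
  (f1[$3] == $1 SUBSEP $2 ? "" : "do not ") \
  "match" : $3 " is missing")
}' file2 file1)

printf '%s\n' "$blank_out"
rm -rf "$dir"
```

Placing the !NF rule first also keeps blank records of the first file out of the f1 array, where they would otherwise be stored under an empty key.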

Last edited by radoulov; 07-14-2008 at 05:18 PM..

radoulov

07-15-2008

Perfect! Thanks a lot, Radoulov. Thanks a ton for all the help!!

Ranjani
