Compare 2 huge files with respect to a key using awk


 
# 8  
Old 07-14-2008
Thanks a ton, Radoulov, the code is working perfectly. I am sorry for the typo; it was the key: 3456. Thanks a lot for your time.

Best Regards
Ranjani

Last edited by Ranjani; 07-14-2008 at 11:04 AM..
# 9  
Old 07-14-2008
Hi, just another quick question.

What do I need to do in case I want to compare the values of two fields (the value in column 1 and the value in column 2) for the corresponding records?

Your help will be much appreciated. Thanks a lot again.

Ranjani
# 10  
Old 07-14-2008
Like this?
Code:
awk 'NR == FNR { 
  f1[$3] = $1 SUBSEP $2 
  next
  }
{ 
  print "key", 
  $3 in f1 ? $3 " records " \
  (f1[$3] == $1 SUBSEP $2 ? "" : "do not ") \
  "match" : $3 " is missing" 
  }' file2 file1
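
For readers who want to try the command, here is a minimal sketch on made-up data. The file contents and the temporary directory are assumptions for illustration only; the thread does not show the real files. The layout assumed is value, value, key (columns 1, 2 and 3).

```shell
# Hypothetical sample data: column 3 is the key, columns 1 and 2 are compared.
tmp=$(mktemp -d)
printf '%s\n' 'a b 1001' 'c d 1002' 'e f 1003' > "$tmp/file1"
printf '%s\n' 'a b 1001' 'c x 1002' > "$tmp/file2"

result=$(awk 'NR == FNR {
  f1[$3] = $1 SUBSEP $2    # store columns 1 and 2 of file2 under the key
  next
  }
{
  print "key",
  $3 in f1 ? $3 " records " \
  (f1[$3] == $1 SUBSEP $2 ? "" : "do not ") \
  "match" : $3 " is missing"
  }' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this data, key 1001 is reported as matching, key 1002 as not matching (column 2 differs), and key 1003 as missing because it does not appear in file2.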

# 11  
Old 07-14-2008
Awesome! Works perfectly. Thanks a lot for all your help, it's much appreciated.
# 12  
Old 07-14-2008
Could I ask one more question? I hope you guys don't mind.

I would like to add a condition that checks for blank records in the file. Right now, if there are any blank records, the script outputs "key is missing", because it treats the key as a blank. I want it to display the error "blank record in the file" instead. I tried doing it but could not succeed, so I need your help with this as well. Thanks a lot.

I am a newbie to awk and find it difficult to make the modifications myself. It would be great if you could post an explanation of how this code works, so that I can understand it and take it further on my own.

Thanking you in advance
Ranjani
# 13  
Old 07-14-2008
Something like this?

Code:
awk 'NR == FNR { 
  f1[$3] = $1 SUBSEP $2 
  next
  }
NF { 
  print "key", 
  $3 in f1 ? $3 " records " \
  (f1[$3] == $1 SUBSEP $2 ? "" : "do not ") \
  "match" : $3 " is missing" 
  }' file2 file1
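
The NF pattern above skips blank records silently. If a message is wanted instead, as asked in the question, one possible variation is sketched below. The extra !NF rule, the exact message text and the sample data are assumptions, not part of the original answer.

```shell
# Hypothetical sample data; file1 contains one blank record on line 2.
tmp=$(mktemp -d)
printf '%s\n' 'a b 1001' '' 'c d 1002' > "$tmp/file1"
printf '%s\n' 'a b 1001' 'c d 1002' > "$tmp/file2"

report=$(cd "$tmp" && awk '!NF {
  # a record with no fields is blank: report it and skip the other rules
  print "blank record in the file " FILENAME " at line " FNR
  next
  }
NR == FNR {
  f1[$3] = $1 SUBSEP $2
  next
  }
{
  print "key",
  $3 in f1 ? $3 " records " \
  (f1[$3] == $1 SUBSEP $2 ? "" : "do not ") \
  "match" : $3 " is missing"
  }' file2 file1)

printf '%s\n' "$report"
rm -rf "$tmp"
```

Because the !NF rule comes first and ends with next, blank records in either file are reported and never reach the comparison. The message goes to standard output here; redirecting it to standard error would be a reasonable alternative.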


The above code works like this:


Code:
NR == FNR { 
  f1[$3] = $1 SUBSEP $2 
  next
  }

While reading the first file named on the command line (file2 here), the condition NR == FNR is true: the current record number across the entire input equals the current record number within the current file. This is a common AWK idiom. The action builds the f1 associative array: the third field is the key, and the first and second fields, joined by SUBSEP, are the value (check the awk documentation for SUBSEP; you can use FS here as well). The next statement:

[from Effective AWK Programming]
Quote:
forces awk to immediately stop processing the current record and go
on to the next record. This means that no further rules are executed for the current record,
and the rest of the current rule’s action isn’t executed.
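
To see the NR == FNR idiom and next in action, here is a small trace on two throwaway files. The file names and contents are made up for illustration.

```shell
# Two hypothetical input files: "first" has two records, "second" has one.
tmp=$(mktemp -d)
printf '%s\n' 'x' 'y' > "$tmp/first"
printf '%s\n' 'z' > "$tmp/second"

trace=$(cd "$tmp" && awk 'NR == FNR {
  # true only while reading the first file; next skips the rule below
  print FILENAME, "NR=" NR, "FNR=" FNR, "rule 1"
  next
  }
{
  # reached only for records of later files, where NR has outgrown FNR
  print FILENAME, "NR=" NR, "FNR=" FNR, "rule 2"
  }' first second)

printf '%s\n' "$trace"
rm -rf "$tmp"
```

The records of the first file are handled by rule 1 only, with NR and FNR equal. For the second file FNR restarts at 1 while NR keeps counting, so only rule 2 runs. Note that the idiom assumes the first file is not empty.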
Code:
{ 
  print "key", 
  $3 in f1 ? $3 " records " \
  (f1[$3] == $1 SUBSEP $2 ? "" : "do not ") \
  "match" : $3 " is missing" 
  }

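While reading the second file (file1 here), print the string "key", followed by the result of the expression below. In the full script this rule carries the NF pattern, so it runs only for records that have at least one field, which skips blank lines.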

Code:
$3 in f1 ? $3 " records " (f1[$3] == $1 SUBSEP $2 ? "" : "do not ") "match" : $3 " is missing"

The condition ? then : else construct is the ternary operator. Here it means: if the third field of the current record (from the second file) exists as a key in the f1 array ($3 in f1), then print the third field, followed by the string " records ", followed by the result of a second, embedded ternary operator: if the value stored under that key equals the first and second field pair of the current record, then nothing (""), else the string "do not ". After the embedded ternary comes the string "match". Otherwise (the else branch of the first ternary operator), print the third field and the string " is missing".
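
If the nested ternary is hard to read, the same logic can be spelled out with if/else. This is only an equivalent rewrite for illustration; the variable name verdict and the sample data are assumptions.

```shell
# Hypothetical sample data: column 3 is the key.
tmp=$(mktemp -d)
printf '%s\n' 'a b 1001' 'c d 1002' 'e f 1003' > "$tmp/file1"
printf '%s\n' 'a b 1001' 'c x 1002' > "$tmp/file2"

spelled_out=$(awk 'NR == FNR {
  f1[$3] = $1 SUBSEP $2
  next
  }
NF {
  if ($3 in f1) {                    # outer ternary: is the key known?
    if (f1[$3] == $1 SUBSEP $2)      # inner ternary: do the values agree?
      verdict = $3 " records match"
    else
      verdict = $3 " records do not match"
  } else
    verdict = $3 " is missing"
  print "key", verdict
  }' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$spelled_out"
rm -rf "$tmp"
```

The output is the same as with the ternary version; the if/else form is just longer and easier to extend with further conditions.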

That's all.

Last edited by radoulov; 07-14-2008 at 05:18 PM..
# 14  
Old 07-15-2008
Perfect! Thanks a lot, Radoulov. Thanks a ton for all the help!