Compare 2 huge files wrt to a key using awk


 
# 1  
Old 07-14-2008

Hi Folks,

I need to compare two very large files (each contains at least 70k records) using awk or sed. The comparison needs to be done with respect to a 'key'. For example:

File1
**********
1234|TONY|Y75634|20/07/2008
1235|TINA|XCVB56|30/07/2009
43456|PATS|U74454|12/04/2009
23456|DAPS|R4576|15/03/2008

File2
******
1235|TINA|XCVB56|30/07/2009
1234|TONY|Y75634|20/07/2008
23456|DAPS|R4576|15/03/2008

In this case, if I take '|' as the delimiter and the value in column 3 as the 'key', I need to look up each key from File1 in File2 and, once it is found, compare the values in column 2 of the corresponding records in both files.

Also, I need to report a message if the key is not present in File2.

PS: I have a Perl script running for this, but it takes far too long to perform the comparison. Any suggestion for an awk script that would do this much faster would be really appreciated.

Thanks in advance
Ranjani
# 2  
Old 07-14-2008
What should be the desired output?
# 3  
Old 07-14-2008
Sorry, I forgot to mention the desired output.

The output needs to be logged in file3, with one of these messages per record: "the corresponding records matched", "the corresponding records did not match", or "the key in file1 does not exist in file2".

Thanks
Ranjani
# 4  
Old 07-14-2008
Try something like this:
Code:
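awk -F"|" '
# First file on the command line (File2): remember column 2 for each column-3 key
NR==FNR{a[$3]=$2;next}
# Remaining records come from File1; each one gets exactly one message
!($3 in a){print $0 " <= Key in File1 does not exist in File2";next}
a[$3]==$2{print $0 " <= Corresponding records match";next}
{print $0 " <= Corresponding records did not match"}
' File2 File1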
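
For reference, a minimal end-to-end sketch of this two-pass approach, run against the sample data from post #1 and logging the messages requested in post #3 to file3. The scratch directory, the array name col2 and the exact message layout are assumptions for illustration. Each rule ends in next so that a record can never fall through to a later rule and be reported twice. With this sample no "did not match" line appears, because every key present in both files has the same column 2.

```shell
#!/bin/sh
# Scratch directory so the demo does not touch real files (assumption for illustration).
dir=$(mktemp -d) && cd "$dir" || exit 1

# Sample data copied from post #1.
cat > File1 <<'EOF'
1234|TONY|Y75634|20/07/2008
1235|TINA|XCVB56|30/07/2009
43456|PATS|U74454|12/04/2009
23456|DAPS|R4576|15/03/2008
EOF
cat > File2 <<'EOF'
1235|TINA|XCVB56|30/07/2009
1234|TONY|Y75634|20/07/2008
23456|DAPS|R4576|15/03/2008
EOF

# Pass 1 (File2): store column 2 under the column-3 key.
# Pass 2 (File1): look each key up and print exactly one message per record.
awk -F'|' '
NR == FNR      { col2[$3] = $2; next }
!($3 in col2)  { print $3 ": the key in file1 does not exist in file2"; next }
col2[$3] == $2 { print $3 ": the corresponding records matched"; next }
               { print $3 ": the corresponding records did not match" }
' File2 File1 > file3

cat file3
```

Only File2's key and column 2 are held in memory; File1 is streamed once, so the run time grows roughly linearly with the number of records.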

Regards

Last edited by Franklin52; 07-14-2008 at 09:21 AM.. Reason: add record separator
# 5  
Old 07-14-2008
Another one:
(use nawk or /usr/xpg4/bin/awk on Solaris)

Code:
awk>file3 -F\| 'NR==FNR{f1[$3]=$2;next}
{print "key",$3 in f1?$3" records "(f1[$3]==$2?"":"do not ")\
 "match":$3" is missing"}' file2 file1

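The nested ?: in the one-liner above packs three outcomes into a single print. A sketch of the same logic written out with if/else, which may be easier to adapt; it is run here on the sample data from post #1, and the scratch directory and the variable names compact, expanded and status are assumptions for illustration:

```shell
#!/bin/sh
# Scratch directory so the demo does not touch real files (assumption for illustration).
dir=$(mktemp -d) && cd "$dir" || exit 1

# Sample data copied from post #1.
cat > file1 <<'EOF'
1234|TONY|Y75634|20/07/2008
1235|TINA|XCVB56|30/07/2009
43456|PATS|U74454|12/04/2009
23456|DAPS|R4576|15/03/2008
EOF
cat > file2 <<'EOF'
1235|TINA|XCVB56|30/07/2009
1234|TONY|Y75634|20/07/2008
23456|DAPS|R4576|15/03/2008
EOF

# Compact form as posted, with the output captured instead of written to file3.
compact=$(awk -F\| 'NR==FNR{f1[$3]=$2;next}
{print "key",$3 in f1?$3" records "(f1[$3]==$2?"":"do not ")\
 "match":$3" is missing"}' file2 file1)

# Expanded form: the same three outcomes, one branch each.
expanded=$(awk -F'|' '
NR == FNR { f1[$3] = $2; next }     # first file on the command line: file2
{
    if (!($3 in f1))
        status = $3 " is missing"
    else if (f1[$3] == $2)
        status = $3 " records match"
    else
        status = $3 " records do not match"
    print "key", status
}' file2 file1)

printf '%s\n' "$expanded"
[ "$compact" = "$expanded" ] && echo "both forms agree"
```

Note that the array is filled from file2 (named first on the command line) and the report is driven by file1, which is what makes "is missing" mean "missing from file2".
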
# 6  
Old 07-14-2008
Hi Franklin and Radoulov, thanks a lot for your responses, but I am getting the following output when I run this script:
******************************************************
1234 RANJ 45678 y786 <= Corresponding records did not match
1234 RANJ 45678 y786 <= Key File1 not exist in File2
567 SREE 3457 xg456 <= Corresponding records did not match
567 SREE 3457 xg456 <= Key File1 not exist in File2
34567 TANGO 4567 H7694 <= Corresponding records did not match
34567 TANGO 4567 H7694 <= Key File1 not exist in File2
4567 qrea 3456 but731 <= Corresponding records did not match
4567 qrea 3456 but731 <= Key File1 not exist in File2
34567 TANGO 4567 H7694 <= Corresponding records did not match
34567 TANGO 4567 H7694 <= Key File1 not exist in File2
567 SREE 3457 xg456 <= Corresponding records did not match
567 SREE 3457 xg456 <= Key File1 not exist in File2
1234 RANJ 45678 y786 <= Corresponding records did not match
1234 RANJ 45678 y786 <= Key File1 not exist in File2

*******************************************************

This is not the desired output. Could you please help?

The input files are:
File1:
************************
1234 RANJ 45678 y786
567 SREE 3457 xg456
34567 TANGO 4567 H7694
4567 qrea 3456 but731

*****************************
File2:
******************
34567 XAXRO 4567 H7694
567 SREE 3457 xg456
1234 RANJ 45678 y786

The desired output file is :


For the key: 45678, the fields are matching
For the key: 3457, the fields are matching
For the key: 4567, the fields are not matching
For the key: 45678, record is not present in file2.

*******************************************

Please provide your useful inputs. Your help would be much appreciated.

Thanks a lot

Ranjani
# 7  
Old 07-14-2008
Perhaps I'm missing something,
this is what I get from the code I posted:
(you changed the field separator so the -F switch is removed)

Code:
$ head file[12]
==> file1 <==
1234 RANJ 45678 y786
567 SREE 3457 xg456
34567 TANGO 4567 H7694
4567 qrea 3456 but731

==> file2 <==
34567 XAXRO 4567 H7694
567 SREE 3457 xg456
1234 RANJ 45678 y786
$ nawk  'NR==FNR{f1[$3]=$2;next}
{print "key",$3 in f1?$3" records "(f1[$3]==$2?"":"do not ")\
 "match":$3" is missing"}' file2 file1
key 45678 records match
key 3457 records match
key 4567 records do not match
key 3456 is missing

You say you want to mark the key 45678 as not present, but it is ...
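
To get the wording asked for in post #6, the same logic only needs different print strings. A sketch, assuming whitespace-separated input as in the files above and a guess at the exact punctuation of the messages; the scratch directory and the names f2 and verdict are illustrative. Appending "" to both sides forces a string comparison, so numeric-looking values such as 007 and 7 are not treated as equal.

```shell
#!/bin/sh
# Scratch directory so the demo does not touch real files (assumption for illustration).
dir=$(mktemp -d) && cd "$dir" || exit 1

# Input files as shown in the head output above.
cat > file1 <<'EOF'
1234 RANJ 45678 y786
567 SREE 3457 xg456
34567 TANGO 4567 H7694
4567 qrea 3456 but731
EOF
cat > file2 <<'EOF'
34567 XAXRO 4567 H7694
567 SREE 3457 xg456
1234 RANJ 45678 y786
EOF

# No -F switch: awk splits on runs of blanks by default.
awk '
NR == FNR   { f2[$3] = $2; next }   # file2: column 2 stored per column-3 key
!($3 in f2) { printf "For the key: %s, record is not present in file2\n", $3; next }
{
    # Concatenating "" makes both operands strings before comparing.
    verdict = (f2[$3] "" == $2 "") ? "matching" : "not matching"
    printf "For the key: %s, the fields are %s\n", $3, verdict
}' file2 file1 > file3

cat file3
```

With these files the only key reported as not present is 3456, in line with the output shown above.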