Comparing two files and inserting new fields


 
# 1  
Old 11-17-2010

Hi all,

I searched the forum and tried to learn from similar posts. However, I am new to this and need some help. I hope an expert can kindly help me sort this out.

I need to compare fields 1 and 2 of the first file with the same fields of the second file and, if both fields match, insert fields 9 and 10 of the second file at positions 1 and 2 of the first file.

1st file:
Code:
F1      F2      F3      F4      F5      F6      F7      F8      F9      F10     F11     F12     F13     F14
180193  99999   15960   1       18      19      3       16      11      0       54      0.01    99999   99999
180193  99999   51255   1       19      16      3       5       10      0       55      0       0       7
999999  13721   12548   1       20      9       -12     7       -1      0       66      0       0       7
180193  99999   51255   1       19      16      3       5       10      0       55      0       0       7
999999  15264   11446   1       22      10      -9      9       1       0       64      99999   99999   99999
170265  99999   15960   1       26      17      3       12      8       0       54      0.01    99999   8

2nd file:
Code:
F1      F2       F3              F4             F5      F6      F7      F8      F9      F10     F11     
150546  99999   DFKMDNBL        MFDNDVHFD       25      MH      2       2       90260   258794  1296                    
152602  99999   GFMMBDFD        DFGDGDBGB       65      RF      3       6       30268   259761  907                     
160940  99999   DFGHDGTH        BBVCSDRRG       98      WD      5       5       65923   244552  720                     
165230  99999   HHDDHRTT        GTTHDTGBH       32      AS      4       6       25430   246695  1265                    
170265  99999   RTVDVRRE        EEWFCSDFF       65      CD      9       5       26980   265986  1069                    
180193  99999   VVDBFYHK        NCMKSOSUF       25      YG      1       8       65971   245695  1089                    
184021  99999   DVGNWEPE        POSUGBNCB       98      FF      7       3       15482   256589  1315            
189750  99999   DFGGGHPL        FJFFDKSJSQ      65      DR      5       3       45681   236659  1329            
999999  13721   FREREGHH        CVFKCJUPK       35      PW      2       3       54261   210546  1122                    
999999  15264   GTUKPBCS        HGFJFJZASS      14      PK      2       5       22976   236598  1225

I'd like to have this:
Code:
F1      F2      F3      F4      F5      F6      F7      F8      F9      F10     F11     F12     F13     F14     F15     F16
65971   245695  180193  99999   15960   1       18      19      3       16      11      0       54      0.01    99999   99999   
65971   245695  180193  99999   51255   1       19      16      3       5       10      0       55      0       0       7       
54261   210546  999999  13721   12548   1       20      9       -12     7       -1      0       66      0       0       7       
65971   245695  180193  99999   51255   1       19      16      3       5       10      0       55      0       0       7       
22976   236598  999999  15264   11446   1       22      10      -9      9       1       0       64      99999   99999   99999   
26980   265986  170265  99999   15960   1       26      17      3       12      8       0       54      0.01    99999   8

Thanks in advance.
# 2  
Old 11-17-2010
Here you go

(With Fn headings in input files and output)
Code:
awk 'NR == FNR { A[$1"|"$2]=$9; B[$1"|"$2]=$10; OFS="\t" ; next}
 FNR == 1 { printf "F1\tF2\tF3\tF4\tF5\tF6\tF7\tF8\tF9\tF10\tF11\tF12\tF13\tF14\tF15\tF16\n" }
 FNR > 1 { print A[$1"|"$2], B[$1"|"$2], $0; } ' fileB fileA

(Without Fn headings)
Code:
awk 'NR == FNR { A[$1"|"$2]=$9; B[$1"|"$2]=$10; OFS="\t" ; next}
 { print A[$1"|"$2], B[$1"|"$2], $0; } ' fileB fileA

# 3  
Old 11-17-2010
@Chubler_XL

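Your output is right only because every $1/$2 pair in FileA (1st file) also exists in FileB (2nd file). If a pair is missing from FileB, your code still prints the row, just with empty values where $9 and $10 should be.

I changed one number in the 1st file (180193 to 180191 in the second data row) and got this: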

Code:
F1      F2      F3      F4      F5      F6      F7      F8      F9      F10     F11     F12     F13     F14     F15     F16
65971   245695  180193  99999   15960   1       18      19      3       16      11      0       54      0.01    99999   99999
                180191  99999   51255   1       19      16      3       5       10      0       55      0       0       7
54261   210546  999999  13721   12548   1       20      9       -12     7       -1      0       66      0       0       7
65971   245695  180193  99999   51255   1       19      16      3       5       10      0       55      0       0       7
22976   236598  999999  15264   11446   1       22      10      -9      9       1       0       64      99999   99999   99999
26980   265986  170265  99999   15960   1       26      17      3       12      8       0       54      0.01    99999   8

Here is my code with headings.

Code:
awk 'NR==FNR {A[$1 FS $2]=$9 OFS $10; next}   # FileB: remember fields 9 and 10 for each key
FNR==1 {for (i=1;i<=NF+2;i++) printf "F%d%s", i, (i<NF+2 ? OFS : ORS)}   # heading row with 2 extra columns
FNR >1&&($1 FS $2 in A) { print A[$1 FS $2], $0; } '  OFS="\t"  FileB FileA

# 4  
Old 11-17-2010
Good point, but in my defense the original request wasn't specific about what should be output when a match wasn't found. It's probably slightly safer to put blank values for 9 and 10 rather than hiding the whole row.

BTW, I also considered a for loop to print the headings, but I suspect the real data has either no headings or data-specific headings that the OP renamed for ease of reference.

Great idea to use $9 OFS $10 as the hash array data.
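
A minimal sketch of how the two ideas could be combined: keep the single `$9 OFS $10` array, but still print unmatched rows with a placeholder instead of dropping them. The sample rows are a subset of the data posted above (with one key changed to 180191), and the `NA` placeholder is an assumption made for illustration; nothing in the original request specifies it.

```shell
# Demo data: two rows of the 1st file (one key altered to 180191) and one row of the 2nd file.
tmp=$(mktemp -d)
printf '%s\n' '180193 99999 15960 1' '180191 99999 51255 1' > "$tmp/fileA"
printf '%s\n' '180193 99999 VVDBFYHK NCMKSOSUF 25 YG 1 8 65971 245695 1089' > "$tmp/fileB"

out=$(awk -v OFS='\t' '
  NR == FNR { A[$1 FS $2] = $9 OFS $10; next }    # 2nd file: store fields 9 and 10 per key
  {
    key = $1 FS $2
    # "NA" is a placeholder chosen for this sketch, not something the thread specifies
    extra = (key in A) ? A[key] : "NA" OFS "NA"
    print extra, $0
  }
' "$tmp/fileB" "$tmp/fileA")

printf '%s\n' "$out"
rm -rf "$tmp"
```

The matched row gets 65971 and 245695 in front; the 180191 row gets two `NA` columns, so every output line carries the same number of added columns.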
# 5  
Old 11-17-2010
Chubler_XL and rdcwayx ,
I can't thank you enough for your time and help.

My data has no headings and I just put them to make my code clear.
I tried
Code:
awk 'NR == FNR { A[$1"|"$2]=$9; B[$1"|"$2]=$10; OFS="\t" ; next}
 { print A[$1"|"$2], B[$1"|"$2], $0; } ' fileB fileA

but the fields added to my fileA aren't correct, so something must be wrong. I'll test your code with headings and let you know if it works.
Thanks again!
# 6  
Old 11-17-2010
Be careful to put the files on the awk command line in the correct order (fileB = your 2nd file, fileA = your 1st file).
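
The order matters because `NR == FNR` is true only while awk is reading the first file named on the command line: NR counts lines across all files, while FNR restarts at 1 for each file. A small sketch that makes this visible (the file names `first` and `second` and their contents are made up for the demo):

```shell
tmp=$(mktemp -d)
printf '%s\n' a b > "$tmp/first"
printf '%s\n' c d e > "$tmp/second"

trace=$(awk '{
  role = (NR == FNR) ? "lookup" : "main"   # true only for lines of the first file named
  print $0, NR, FNR, role
}' "$tmp/first" "$tmp/second")

printf '%s\n' "$trace"
rm -rf "$tmp"
```

Lines a and b are tagged `lookup`, and c, d and e are tagged `main`. If the two files are swapped on the command line, the lookup arrays get built from the wrong file.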

Here is a transcript of my test (with a slight enhancement: using FS instead of "|" in the array index, thanks rdcwayx).
I kept the two arrays because this ensures the output has the same number of fields when the lookup of fields 1 and 2 doesn't find any data.
Code:
$ cat fileA
180193  99999   15960   1       18      19      3       16      11      0       54      0.01    99999   99999
180193  99999   51255   1       19      16      3       5       10      0       55      0       0       7
999999  13721   12548   1       20      9       -12     7       -1      0       66      0       0       7
180193  99999   51255   1       19      16      3       5       10      0       55      0       0       7
999999  15264   11446   1       22      10      -9      9       1       0       64      99999   99999   99999
170265  99999   15960   1       26      17      3       12      8       0       54      0.01    99999   8
$ cat fileB
150546  99999   DFKMDNBL        MFDNDVHFD       25      MH      2       2       90260   258794  1296                    
152602  99999   GFMMBDFD        DFGDGDBGB       65      RF      3       6       30268   259761  907                     
160940  99999   DFGHDGTH        BBVCSDRRG       98      WD      5       5       65923   244552  720                     
165230  99999   HHDDHRTT        GTTHDTGBH       32      AS      4       6       25430   246695  1265                    
170265  99999   RTVDVRRE        EEWFCSDFF       65      CD      9       5       26980   265986  1069                    
180193  99999   VVDBFYHK        NCMKSOSUF       25      YG      1       8       65971   245695  1089                    
184021  99999   DVGNWEPE        POSUGBNCB       98      FF      7       3       15482   256589  1315            
189750  99999   DFGGGHPL        FJFFDKSJSQ      65      DR      5       3       45681   236659  1329            
999999  13721   FREREGHH        CVFKCJUPK       35      PW      2       3       54261   210546  1122                    
999999  15264   GTUKPBCS        HGFJFJZASS      14      PK      2       5       22976   236598  1225
$ awk 'NR == FNR { A[$1 FS $2]=$9; B[$1 FS $2]=$10; OFS="\t" ; next}
   { print A[$1 FS $2], B[$1 FS $2], $0; } ' fileB fileA
65971   245695  180193  99999   15960   1       18      19      3       16      11      0       54      0.01    99999   99999
65971   245695  180193  99999   51255   1       19      16      3       5       10      0       55      0       0       7
54261   210546  999999  13721   12548   1       20      9       -12     7       -1      0       66      0       0       7
65971   245695  180193  99999   51255   1       19      16      3       5       10      0       55      0       0       7
22976   236598  999999  15264   11446   1       22      10      -9      9       1       0       64      99999   99999   99999
26980   265986  170265  99999   15960   1       26      17      3       12      8       0       54      0.01    99999   8
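
A rough check of the "same number of fields" point, assuming a cut-down fileA in which the second key (180191) has no match in fileB. The second awk only counts tab-separated columns per output line:

```shell
tmp=$(mktemp -d)
printf '%s\n' '180193 99999 15960' '180191 99999 51255' > "$tmp/fileA"
printf '%s\n' '180193 99999 VVDBFYHK NCMKSOSUF 25 YG 1 8 65971 245695 1089' > "$tmp/fileB"

counts=$(awk 'NR == FNR { A[$1 FS $2] = $9; B[$1 FS $2] = $10; next }
     { print A[$1 FS $2], B[$1 FS $2], $0 }' OFS='\t' "$tmp/fileB" "$tmp/fileA" |
  awk -F'\t' '{ print NF }')   # columns per line: two added fields plus the original line

printf '%s\n' "$counts"
rm -rf "$tmp"
```

Both lines report 3 tab-separated columns: the unmatched row simply has two empty ones at the front.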


Last edited by Chubler_XL; 11-17-2010 at 11:32 PM..
# 7  
Old 11-17-2010
Quote:
Originally Posted by GoldenFire
Chubler_XL and rdcwayx ,
I can't thank you enough for your time and help.

My data has no headings and I just put them to make my code clear.

If there are no headings, it is simpler:

Code:
awk 'NR==FNR {A[$1 FS $2]=$9 OFS $10; next}
$1 FS $2 in A { print A[$1 FS $2], $0; } '  OFS="\t"  FileB FileA
