Match two columns from two files and print output


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Match two columns from two files and print output
# 1  
Old 07-04-2013
Question Match two columns from two files and print output

Hello,
I have two files which are of the following format
Code:
File 1 which has two columns

Protein_ID         Substitution
NP_997239         T53R 
NP_060668         V267M                   
NP_058515         P856A                   
NP_001206         T55M
NP_006601         D371Y                   
NP_065831         A851V           
...

File 2 which has 7 columns
Code:
Protein_ID Substitution dbSNP Prediction Score Median_Info Number_of_Seqs
NP_997239 T53R rs4274008   TOLERATED 1    2.84 15                  
NP_060668 V267M rs12137794 TOLERATED 0.23 1.69 75                  
NP_058515 P856A rs228697   TOLERATED 0.36 2.17 37
NP_001206 T55M  rs2274327  TOLERATED 0.1  1.76 97                  
NP_005017 S312C rs61755420 TOLERATED 0.05 1.87 68                  
NP_005017 A697V rs1281018  TOLERATED 0.21 1.99 67                  
NP_060668 V533D rs1281016  TOLERATED 0.63 2.04 72

I need the output by matching the first two columns from both files (Protein_ID Substitution)
Code:
NP_997239 T53R  rs4274008  TOLERATED 1    2.84 15                  
NP_060668 V267M rs12137794 TOLERATED 0.23 1.69 75                  
NP_058515 P856A rs228697   TOLERATED 0.36 2.17 37
NP_001206 T55M  rs2274327  TOLERATED 0.1  1.76 97                  
NP_006601 D371Y NA         NA        NA   NA   NA
NP_065831 A851V NA         NA        NA   NA   NA

So far this is what I got:
Code:
awk 'FNR == NR { h[$1] = 1; next } h[$1]' file1 file2 > output

but this matches only the first column and does not includes the non matching values.....
Thank you

Moderator's Comments:
Mod Comment Use code tags, see PM.

Last edited by nans; 07-04-2013 at 11:12 AM.. Reason: code tags, see PM
# 2  
Old 07-04-2013
Quote:
Originally Posted by nans
Hello,
I have two files which are of the following format
File 1 which has two columns

Protein_ID Substitution
NP_997239 T53R
NP_060668 V267M
NP_058515 P856A
NP_001206 T55M
NP_006601 D371Y
NP_065831 A851V
...

File 2 which has 7 columns
Protein_ID Substitution dbSNP Prediction Score Median_Info Number_of_Seqs
NP_997239 T53R rs4274008 TOLERATED 1 2.84 15
NP_060668 V267M rs12137794 TOLERATED 0.23 1.69 75
NP_058515 P856A rs228697 TOLERATED 0.36 2.17 37
NP_001206 T55M rs2274327 TOLERATED 0.1 1.76 97
NP_005017 S312C rs61755420 TOLERATED 0.05 1.87 68
NP_005017 A697V rs1281018 TOLERATED 0.21 1.99 67
NP_060668 V533D rs1281016 TOLERATED 0.63 2.04 72

I need the output by matching the first two columns from both files (Protein_ID Substitution)
NP_997239 T53R rs4274008 TOLERATED 1 2.84 15
NP_060668 V267M rs12137794 TOLERATED 0.23 1.69 75
NP_058515 P856A rs228697 TOLERATED 0.36 2.17 37
NP_001206 T55M rs2274327 TOLERATED 0.1 1.76 97
NP_006601 D371Y NA NA NA NA NA
NP_065831 A851V NA NA NA NA NA

So far this is what I got:
awk 'FNR == NR { h[$1] = 1; next } h[$1]' file1 file2 > output
but this matches only the first column and does not includes the non matching values.....
Thank you
Please use code tags for code and data sample.

Code:
 awk 'FNR == NR { h[$1,$2] = 1; next } h[$1,$2]' file1 file2 > output

This User Gave Thanks to pamu For This Post:
# 3  
Old 07-04-2013
Starting off with and extending pamu's proposal:
Code:
awk 'FNR == NR { h[$1,$2] = $0; next } h[$1,$2] {print h[$1,$2];next} {print $1,$2,"N/A","N/A"}' file2 file1 
Protein_ID Substitution dbSNP Prediction Score Median_Info Number_of_Seqs
NP_997239 T53R rs4274008   TOLERATED 1    2.84 15                  
NP_060668 V267M rs12137794 TOLERATED 0.23 1.69 75                  
NP_058515 P856A rs228697   TOLERATED 0.36 2.17 37
NP_001206 T55M  rs2274327  TOLERATED 0.1  1.76 97                  
NP_006601 D371Y N/A N/A
NP_065831 A851V N/A N/A

You may have to add some "N/A"s...
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Data match 2 files based on first 2 columns matching only and join if match

Hi, i have 2 files , the data i need to match is in masterfile and i need to pull out column 3 from master if column 1 and 2 match and output entire row to new file I have tried with join and awk and i keep getting blank outputs or same file is there an easier way than what i am... (4 Replies)
Discussion started by: axis88
4 Replies

2. UNIX for Dummies Questions & Answers

Match the columns between two files and output

Hi Help, I have two files namely a.txt and b.txt a.txt looks like a.txt 1 2 2 1 3 3 2 4 4 4 5 6 6 7 7 b.txt looks like, b.txt 1 2 1 1 3 2 2 4 3 3 4 4 4 5 5 (2 Replies)
Discussion started by: Indra2011
2 Replies

3. Shell Programming and Scripting

Join two files combining multiple columns and produce mix and match output

I would like to join two files when two columns in each file matches with each other and then produce an output when taking multiple columns. Like I have file A 1234,ABCD,23,JOHN,NJ,USA 2345,ABCD,24,SAM,NY,USA 5678,GHIJ,24,TOM,NY,USA 5678,WXYZ,27,MAT,NJ,USA and file B ... (2 Replies)
Discussion started by: mady135
2 Replies

4. Shell Programming and Scripting

Match columns and print specific field

Hello, I have data in following format. ... (6 Replies)
Discussion started by: Pushpraj
6 Replies

5. UNIX for Advanced & Expert Users

Match and print based on columns

HI, I have 2 different questions in this thread. Consider 2 files as input (input file have different line count ) File 1 1 1 625 56 1 12 657 34 1 9 25 45 1 2 20 54 67 3 25 35 27 4 45 73 36 5 125 56 45 File2 1 1 878 76 1 9 83 67 2 20 73 78 4 47 22 17 3 25 67 99 (4 Replies)
Discussion started by: rossi
4 Replies

6. Shell Programming and Scripting

Compare columns of multiple files and print those unique string from File1 in an output file.

Hi, I have multiple files that each contain one column of strings: File1: 123abc 456def 789ghi File2: 123abc 456def 891jkl File3: 234mno 123abc 456def In total I have 25 of these type of file. (5 Replies)
Discussion started by: owwow14
5 Replies

7. Shell Programming and Scripting

Print output as columns

Normal grep is not working to get the output. Sample Input: newjob: abc command name: a=b+c newjob: bbc command name: c=r+v newjob:ddc newjob:kkc command name: c=l+g newjob:mdc newjob:ldc newjob:kjc command name: u=dl+g newjob:lkdc newjob:lksdc command name: o=udl+g (6 Replies)
Discussion started by: onesuri
6 Replies

8. Shell Programming and Scripting

match two key columns in two files and print output (awk)

I have two files... file1 and file2. Where columns 1 and 2 of file1 match columns 1 and 2 of file2 I want to create a new file that is all file1 + columns 3 and 4 of file2 :b: Many thanks if you know how to do this.... :b: file1 31-101 106 0 92 31-101 106 29 ... (2 Replies)
Discussion started by: pelhabuan
2 Replies

9. Shell Programming and Scripting

Match and print columns in second file

Hi All, I have to match each row in file 1 with 1st row in file 2 and print the corresponding column from file2. I am trying to use an awk script to do this. For example cat File1 X1 X3 X4 cat File2 ID X1 X2 X3 X4 A 1 6 2 1 B 2 7 3 3 C 3 8 4 1 D 4 9 1 1 (3 Replies)
Discussion started by: newpro
3 Replies

10. Shell Programming and Scripting

compare columns from seven files and print the output

Hi guys, I need some help to come out with a solution . I have seven such files but I am showing only three for convenience. filea a5 20 a8 16 fileb a3 42 a7 14 filec a5 23 a3 07 The output file shoud contain the data in table form showing first field of... (7 Replies)
Discussion started by: smriti_shridhar
7 Replies
Login or Register to Ask a Question