awk to compare two files using two fields


 
# 1  
Old 09-30-2016

I use two awk scripts to compare file1 and file2.

The first awk keys on column $3 of file1:

Code:
awk -v OFS="\t" 'NR==FNR{a[$3]=$4;next}{$2=$2 "\t"(a[$2]?a[$2]:"-")}1' file1 file2

The second awk keys on column $2 of file1:

Code:
awk -v OFS="\t" 'NR==FNR{a[$2]=$4;next}{$2=$2 "\t"(a[$2]?a[$2]:"-")}1' file1 file2

The only difference is the key used while reading file1:

NR==FNR{a[$3]=$4;next} versus NR==FNR{a[$2]=$4;next}

Basically, I just want to combine those two lookups and compare the result against file2. Thank you :).
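Both one-liners follow the same two-file pattern, so it may help to see one of them spelled out. The sketch below expands the first script with comments and runs it on a single row taken from each of the sample files in this post. The temporary directory and the one-row inputs are only for illustration, and the `in` test is swapped in for the original truthiness test `a[$2]?` so that a lookup does not create empty array entries.

```shell
#!/bin/sh
# Work in a throwaway directory (an assumption, just to keep the demo tidy).
dir=$(mktemp -d) && cd "$dir" || exit 1

# One row from each sample file, tab-separated.
printf 'chr1\t11796320\t11796321\tMTHFR\n' > file1
printf 'chr1\t11796321\tG\t0\tWILD\tADP=1026\n' > file2

out=$(awk -v OFS='\t' '
    # First file only (NR==FNR): remember the gene name under its $3 position.
    NR == FNR { a[$3] = $4; next }
    # Second file: append the gene (or "-") after $2, then print the line.
    { $2 = $2 OFS (($2 in a) ? a[$2] : "-") }
    1
' file1 file2)

printf '%s\n' "$out"
```

Changing `a[$3]` to `a[$2]` in the first rule gives the second script; everything else stays the same.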

file1
Code:
chr1    11796320    11796321    MTHFR 
chr1    169549810   169549811   F5 
chr1    173917077   173917078   SERPINC1 
chr2    48962781    48962782    FSHR 
chr4    121696961   121696962   ANXA5 
chr4    121697010   121697011   ANXA5 
chr4    121697036   121697037   ANXA5 
chr4    121697055   121697056   ANXA5 
chr11   46739504    46739505    F2
chr13   20189510    20189511    GJB2
chr13   20189546    20189547    GJB2

file2
Code:
chr1    11796321    G   0   WILD    ADP=1026 
chr1    169549811   C   0   WILD    ADP=940 
chr1    173917078   C   0   WILD    ADP=501 
chr2    48962782    C   T   HET ADP=1665 
chr4    121696962   C   T   HET ADP=212 
chr4    121697011   A   0   WILD    ADP=184 
chr4    121697037   T   0   WILD    ADP=111 
chr4    121697037   tccc    0   INDEL   AINDEL 
chr4    121697056   C   0   WILD    ADP=112 
chr11   46739505    G   0   WILD    ADP=202 
chr13   20189511    C   0   WILD    ADP=326 
chr13   20189546    AC  A   INDEL   ADP=164 
chr13   20189547    C   0   WILD    ADP=3

desired output
Code:
chr1    11796321    MTHFR   G   0   WILD    ADP=1026
chr1    169549811   F5  C   0   WILD    ADP=940
chr1    173917078   SERPINC1    C   0   WILD    ADP=501
chr2    48962782    FSHR    C   T   HET ADP=1665
chr4    121696962   ANXA5   C   T   HET ADP=212
chr4    121697011   ANXA5   A   0   WILD    ADP=184
chr4    121697037   ANXA5   T   0   WILD    ADP=111
chr4    121697037   ANXA5   tccc    0   INDEL   AINDEL
chr4    121697056   ANXA5   C   0   WILD    ADP=112
chr11   46739505    F2  G   0   WILD    ADP=202
chr13   20189511    GJB2    C   0   WILD    ADP=326
chr13   20189546    GJB2    AC  A   INDEL   ADP=164
chr13   20189547    GJB2    C   0   WILD    ADP=3

The awk I tried:

Code:
awk -F'\t' 'NR==FNR{c[$2$3]++;next};c[$2] > 0' file1 file2

Moderator's Comments:
Please edit post to correct the two input files!

Last edited by cmccabe; 09-30-2016 at 12:12 PM.. Reason: fixed file format
# 2  
Old 09-30-2016
Compare?
This User Gave Thanks to RudiC For This Post:
# 3  
Old 09-30-2016
Please explain your awk attempt.
This User Gave Thanks to RudiC For This Post:
# 4  
Old 09-30-2016
I corrected the input. By "compare" I mean: look in file2 for the fields from file1. The awk was an attempt at that; however, it seems to look at only one file. Thank you :).
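One likely reason the attempt in post #1 prints nothing: `c[$2$3]` stores a single key made by concatenating both positions (for example `1179632011796321`), so a lookup with `c[$2]` on a file2 line never finds it. A minimal sketch of the difference, using one row from each sample file and assuming tab-separated input (the variable names `concat` and `separate` are only for this demo):

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'chr1\t11796320\t11796321\tMTHFR\n' > file1
printf 'chr1\t11796321\tG\t0\tWILD\tADP=1026\n' > file2

# Original attempt: one concatenated key per file1 row, so nothing matches.
concat=$(awk -F'\t' 'NR==FNR{c[$2$3]++;next} c[$2] > 0' file1 file2)

# Storing $2 and $3 as two separate keys lets either position match.
separate=$(awk -F'\t' 'NR==FNR{c[$2]++;c[$3]++;next} $2 in c' file1 file2)

printf 'concatenated key: [%s]\n' "$concat"
printf 'separate keys:    [%s]\n' "$separate"
```

The second form only filters file2; it does not yet add the gene name, which needs the value from file1's $4 to be stored as well.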
# 5  
Old 09-30-2016
Quote:
Originally Posted by cmccabe
I corrected the input. By "compare" I mean: look in file2 for the fields from file1. The awk was an attempt at that; however, it seems to look at only one file. Thank you :).
Hello cmccabe,

Could you please answer RudiC's question? I would also like to ask one of my own. I see the following two lines in your post:
Code:
hr13   20189510    20189511    GJB2 
chr13   20189546    AC  A   INDEL   ADP=164

In the first line the first field is hr13, not chr13; is that a typo? For the second line, I don't see 20189546 as a third field in file1, only as a second field. So every other line is matched on file1's third field, and only this line is matched on its second field. Could you please clarify these two points as well?

Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
# 6  
Old 09-30-2016
Yes, that was a typo and it should be chr13. I am not sure what you are asking in question two, but 20189546 could be in a different position, and in that case only one field is being compared, as you pointed out. Thank you :).

I think I need something more like:
Code:
awk -v OFS="\t" 'NR==FNR{f2[$2]=f3[$3]=$4;next}{k=$2; suf=((k in f2)?f2[k]:((k in f3)?f3[k]:"-"));$2=k "\t" suf}1' file{1,2}


Last edited by cmccabe; 09-30-2016 at 02:57 PM.. Reason: added details
# 7  
Old 09-30-2016
Hello cmccabe,

Could you please try following and let me know if this helps.
Code:
awk 'FNR==NR{A[$3]=$1 FS $3 FS $4;next} ($2 in A){print A[$2],$3,$4,$5,$6}'  Input_file1   Input_file2

Output will be as follows.
Code:
chr1 11796321 MTHFR G 0 WILD ADP=1026
chr1 169549811 F5 C 0 WILD ADP=940
chr1 173917078 SERPINC1 C 0 WILD ADP=501
chr2 48962782 FSHR C T HET ADP=1665
chr4 121696962 ANXA5 C T HET ADP=212
chr4 121697011 ANXA5 A 0 WILD ADP=184
chr4 121697037 ANXA5 T 0 WILD ADP=111
chr4 121697037 ANXA5 tccc 0 INDEL AINDEL
chr4 121697056 ANXA5 C 0 WILD ADP=112
chr11 46739505 F2 G 0 WILD ADP=202
chr13 20189511 GJB2 C 0 WILD ADP=326
chr13 20189547 GJB2 C 0 WILD ADP=3

Thanks,
R. Singh
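The output above lacks the `chr13 20189546` row from the desired output, because that position appears in file1 only as a second field. One way to cover both cases is to index file1 by both positions and look up file2's $2 in that single table. The sketch below does this on the chr13 rows from the sample files. Keying on chromosome plus position is an assumption added here (the scripts in this thread key on position alone), as are the tab-separated inputs and the `gene` array name.

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1

# The chr13 rows from the sample files, tab-separated.
printf 'chr13\t20189510\t20189511\tGJB2\n'  > file1
printf 'chr13\t20189546\t20189547\tGJB2\n' >> file1
printf 'chr13\t20189511\tC\t0\tWILD\tADP=326\n'   > file2
printf 'chr13\t20189546\tAC\tA\tINDEL\tADP=164\n' >> file2
printf 'chr13\t20189547\tC\t0\tWILD\tADP=3\n'     >> file2

out=$(awk -F'\t' -v OFS='\t' '
    # file1: store the gene under (chromosome, start) and (chromosome, end).
    NR == FNR { gene[$1, $2] = gene[$1, $3] = $4; next }
    # file2: insert the gene (or "-") after the position and print.
    { $2 = $2 OFS ((($1, $2) in gene) ? gene[$1, $2] : "-") }
    1
' file1 file2)

printf '%s\n' "$out"
```

Unlike the script in this post, the sketch keeps unmatched file2 lines and marks them with "-", as the scripts in post #1 do. If such lines should be dropped instead, the second rule can be guarded with `(($1, $2) in gene)` as its pattern.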
This User Gave Thanks to RavinderSingh13 For This Post: