awk to compare two files using two fields


 
# 8  
Old 09-30-2016
Thank you, works great.
# 9  
Old 09-30-2016
Hi,

Can you try this and let me know how it goes?
Code:
$ awk 'NR==FNR{a[$3]=$4;next} { if ($2 in a)  $2=$2 "\t" a[$2]; } 1' OFS="\t" f1 f2

Got output as ( as per your input files ) :
Quote:
chr1 11796321 MTHFR G 0 WILD ADP=1026
chr1 169549811 F5 C 0 WILD ADP=940
chr1 173917078 SERPINC1 C 0 WILD ADP=501
chr2 48962782 FSHR C T HET ADP=1665
chr4 121696962 ANXA5 C T HET ADP=212
chr4 121697011 ANXA5 A 0 WILD ADP=184
chr4 121697037 ANXA5 T 0 WILD ADP=111
chr4 121697037 ANXA5 tccc 0 INDEL AINDEL
chr4 121697056 ANXA5 C 0 WILD ADP=112
chr11 46739505 F2 G 0 WILD ADP=202
chr13 20189511 GJB2 C 0 WILD ADP=326
chr13 20189546 AC A INDEL ADP=164
chr13 20189547 GJB2 C 0 WILD ADP=3
I'm not sure I understand your requirement for the penultimate line. Can you clarify a bit?
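
For anyone following along, the one-liner above can be unpacked into commented form. This is only a sketch: the temporary directory and the two-line contents of f1 and f2 are assumptions made for illustration (the f2 lines are reconstructed from the output shown in this thread), not the original poster's files.

```shell
# Hypothetical sample data; tmpdir and the file contents are illustrative only.
tmpdir=$(mktemp -d)
printf 'chr11\t46739504\t46739505\tF2\nchr13\t20189510\t20189511\tGJB2\n' > "$tmpdir/f1"
printf 'chr11\t46739505\tG\t0\tWILD\tADP=202\nchr13\t20189546\tAC\tA\tINDEL\tADP=164\n' > "$tmpdir/f2"

out=$(awk '
NR == FNR {                  # true only while reading the first file (f1)
    a[$3] = $4               # remember the name in $4, keyed by the position in $3
    next                     # skip the remaining rules for f1 lines
}
{
    if ($2 in a)             # the f2 position was seen as $3 in f1
        $2 = $2 "\t" a[$2]   # append the name right after field 2
}
1                            # print every f2 line, changed or not
' OFS="\t" "$tmpdir/f1" "$tmpdir/f2")

printf '%s\n' "$out"
rm -rf "$tmpdir"
```

Lines of f2 whose $2 is not a key in the array are printed unchanged, which is why such a line ends up one field shorter than the others.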
This User Gave Thanks to greet_sed For This Post:
# 10  
Old 09-30-2016
I still don't understand what you're trying to do.

When you compare files, what determines when there is a match? When there is a match what is supposed to happen? What should happen if there are two matches?

And most lines in your sample file1 have four fields; but the line:
Code:
chr11   46739504    46739505    F2 c

has five fields. What is supposed to be done with the extra field?
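
A quick way to spot lines like that is to let awk report every line whose field count differs from the expected one. A minimal sketch, assuming four fields is the intended layout; the temporary file below is a hypothetical stand-in for file1 and holds only two lines:

```shell
# Hypothetical two-line stand-in for file1; the first line has a stray 5th field.
tmpdir=$(mktemp -d)
printf 'chr11   46739504    46739505    F2 c\nchr13   20189510    20189511    GJB2\n' > "$tmpdir/file1"

# Report the line number, field count and content of every line without exactly 4 fields.
bad=$(awk 'NF != 4 { print FNR ": " NF " fields: " $0 }' "$tmpdir/file1")

printf '%s\n' "$bad"
rm -rf "$tmpdir"
```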
This User Gave Thanks to Don Cragun For This Post:
# 11  
Old 09-30-2016
How about
Code:
awk -v OFS="\t" 'NR==FNR{a[$3]=a[$2]=$4; next}{$2=$2 "\t"(a[$2]?a[$2]:"-")}1' file1 file2
chr1    11796321    MTHFR    G    0    WILD    ADP=1026
chr1    169549811    F5    C    0    WILD    ADP=940
chr1    173917078    SERPINC1    C    0    WILD    ADP=501
chr2    48962782    FSHR    C    T    HET    ADP=1665
chr4    121696962    ANXA5    C    T    HET    ADP=212
chr4    121697011    ANXA5    A    0    WILD    ADP=184
chr4    121697037    ANXA5    T    0    WILD    ADP=111
chr4    121697037    ANXA5    tccc    0    INDEL    AINDEL
chr4    121697056    ANXA5    C    0    WILD    ADP=112
chr11    46739505    F2    G    0    WILD    ADP=202
chr13    20189511    GJB2    C    0    WILD    ADP=326
chr13    20189546    GJB2    A    A    INDEL    ADP=164
chr13    20189547    GJB2    C    0    WILD    ADP=3
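
A side note on the `a[$2]?a[$2]:"-"` test, as a sketch with made-up keys "x" and "y" rather than data from this thread: testing the value treats a stored 0 (or empty string) as "no match", and merely referencing a missing element creates it. Neither matters when $4 is always a non-empty name, as it appears to be here, but a membership test with `in` avoids both effects.

```shell
out=$(awk 'BEGIN {
    a["x"] = 0
    print (a["x"] ? a["x"] : "-")        # value test: a stored 0 counts as false
    print (("x" in a) ? a["x"] : "-")    # membership test: the key exists
    n = 0; for (k in a) n++; print n     # number of elements so far
    if (a["y"]) { }                      # referencing a missing key creates it
    n = 0; for (k in a) n++; print n     # the array has grown by one
}')

printf '%s\n' "$out"
```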

This User Gave Thanks to RudiC For This Post:
# 12  
Old 09-30-2016
Thank you all.

A match is determined by $2 and $3 in file1 = $4 in file2. If they match, the line is printed, and the penultimate line is only printed. Thanks again.

Last edited by cmccabe; 09-30-2016 at 04:07 PM.. Reason: added details
# 13  
Old 09-30-2016
Quote:
Originally Posted by cmccabe
Thank you all.

A match is determined by $2 and $3 in file1 = $4 in file2. If they match, the line is printed, and the penultimate line is only printed. Thanks again.
I'm glad that what you have is working for you, but the code you have been given does not match the requirements you have stated.

There is never a case in your sample input where $2 in file1 matches $4 in file2 AND $3 in file1 matches $4 in file2 (because there is never a case in your sample input where $2 in file1 matches $3 in file1). There is also never a case where $2 or $3 in file1 matches $4 in file2.

The code that you have been given adds $4 from file1 as a new field between fields 2 and 3 in file2 if $2 in file1 matches $2 in file2 OR if $3 in file1 matches $2 in file2. Looking at your sample data, it appears that on all of the lines that are matched there are two fields that are matching ($1 in file1 matches $1 in file2 and either $2 or $3 in file1 matches $2 in file2), but none of the suggestions that have been made so far looks at $1 in either file (except when printing the output when $2 or $3 in file1 matches $2 in file2).
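
To illustrate why ignoring $1 can matter, here is a sketch with made-up data: the chr2 line in file2 is hypothetical and was invented to share a position with the chr11 entry in file1. With a position-only key, it picks up the chr11 name.

```shell
# Hypothetical data: the chr2 line is invented to collide with the chr11 position.
tmpdir=$(mktemp -d)
printf 'chr11\t46739504\t46739505\tF2\n' > "$tmpdir/file1"
printf 'chr11\t46739505\tG\t0\tWILD\tADP=202\nchr2\t46739505\tC\tT\tHET\tADP=10\n' > "$tmpdir/file2"

# Position-only lookup, as in the one-liner from post #11.
by_pos=$(awk -v OFS="\t" '
NR == FNR { a[$3] = a[$2] = $4; next }
{ $2 = $2 "\t" (a[$2] ? a[$2] : "-") }
1' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$by_pos"
rm -rf "$tmpdir"
```

Both output lines carry F2 in the third field, although only the chr11 line matches file1 on chromosome as well as position.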

If that is what you are trying to do AND if the two lines in your sample file1:
Code:
chr11   46739504    46739505    F2 c
hr13   20189510    20189511    GJB2

were intended to be:
Code:
chr11   46739504    46739505    F2
chr13   20189510    20189511    GJB2

instead, then the following code might be a more accurate method of doing what you want:
Code:
awk '
BEGIN {	OFS = "\t"
}
NR == FNR {
	d[$1,$2] = d[$1,$3] = $4
	next
}
{	$2 = $2 OFS ((($1,$2) in d) ? d[$1,$2] : "-")
}
1' file1 file2

which, with your sample file1 with the above correction and your sample file2, produces the output:
Code:
chr1	11796321	MTHFR	G	0	WILD	ADP=1026
chr1	169549811	F5	C	0	WILD	ADP=940
chr1	173917078	SERPINC1	C	0	WILD	ADP=501
chr2	48962782	FSHR	C	T	HET	ADP=1665
chr4	121696962	ANXA5	C	T	HET	ADP=212
chr4	121697011	ANXA5	A	0	WILD	ADP=184
chr4	121697037	ANXA5	T	0	WILD	ADP=111
chr4	121697037	ANXA5	tccc	0	INDEL	AINDEL
chr4	121697056	ANXA5	C	0	WILD	ADP=112
chr11	46739505	F2	G	0	WILD	ADP=202
chr13	20189511	GJB2	C	0	WILD	ADP=326
chr13	20189546	GJB2	AC	A	INDEL	ADP=164
chr13	20189547	GJB2	C	0	WILD	ADP=3
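
For readers unfamiliar with the `d[$1,$2]` notation: awk joins the comma-separated subscripts with the built-in SUBSEP string to form a single key. A small sketch, using one line from the corrected file1 above; the variable name probe is only illustrative:

```shell
out=$(printf 'chr13\t20189510\t20189511\tGJB2\n' | awk '
{
    d[$1, $2] = d[$1, $3] = $4           # each key is stored as $1 SUBSEP $2 (or $3)
}
END {
    probe = "chr13" SUBSEP "20189511"    # the same kind of key, built by hand
    if (probe in d)
        print "found " d[probe]
    if (!(("chr1", "20189511") in d))    # same position, different chromosome
        print "chr1 not found"
}')

printf '%s\n' "$out"
```

Because the chromosome is part of the key, an identical position on another chromosome does not match.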

I hope this helps.