awk to print match or non-match and select fields/patterns for non-matches

Posted 08-03-2017

In the awk below I am trying to output the lines that match between file1 and file2, those missing in file1, and those missing in file2. The key is $1,$2,$4,$5: if those 4 fields are found in both files, the line is a match; if they are found in only one file, the line is missing in the other. That part seems to work. What I can't seem to do is print, for the missing lines only, $6 and the values of the patterns FR=, HRUN=, LEN=, QD=, and STB= (see the desired output). The patterns are always in the second line, so I set it to FNR==2, and I am also trying to print $6 with FNR==1; my attempt to capture these is the part after print file2[k] and print file1[k] in the awk. Of the two blocks of tab-delimited data below, the first is file1 and the second is file2.

chr1	948846	.	T	TA	749.023	PASS	AF=1;AO=127;DP=135;FAO=132;FDP=132;FR=.;FRO=0;FSAF=79;FSAR=53;FSRF=0;FSRR=0;FWDB=-0.0360682;FXX=0.0222206;HRUN=1;LEN=1;MLLD=26.1513;OALT=A;OID=.;OMAPALT=TA;OPOS=948847;OREF=-;PB=.;PBP=.;QD=22.6977;RBI=0.0451251;REFB=0.0429344;REVB=0.0271174;RO=4;SAF=76;SAR=51;SRF=3;SRR=1;SSEN=0;SSEP=0;SSSB=-0.00916056;STB=0.5;STBP=1;TYPE=ins;VARB=-0.00252488	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:150:135:132:4:0:127:132:1:51:76:3:1:53:79:0:0:1
chr1	948870	.	C	G	1696.85	PASS	AF=0.973822;AO=186;DP=191;FAO=186;FDP=191;FR=.,REALIGNEDx0.9895;FRO=5;FSAF=110;FSAR=76;FSRF=3;FSRR=2;FWDB=0.0127239;FXX=0;HRUN=2;LEN=1;MLLD=64.7594;OALT=G;OID=.;OMAPALT=G;OPOS=948870;OREF=C;PB=.;PBP=.;QD=35.5361;RBI=0.0128295;REFB=-0.0404247;REVB=-0.00164304;RO=4;SAF=110;SAR=76;SRF=2;SRR=2;SSEN=0;SSEP=0;SSSB=0.00378587;STB=0.500233;STBP=0.994;TYPE=snp;VARB=0.00428411	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:46:191:191:4:5:186:186:0.973822:76:110:2:2:76:110:3:2:1
chr1	948921	.	T	C	3030.3	PASS	AF=1;AO=320;DP=322;FAO=320;FDP=320;FR=.;FRO=0;FSAF=182;FSAR=138;FSRF=0;FSRR=0;FWDB=-0.0161106;FXX=0.00621099;HRUN=2;LEN=1;MLLD=100.549;OALT=C;OID=.;OMAPALT=C;OPOS=948921;OREF=T;PB=.;PBP=.;QD=37.8788;RBI=0.0184392;REFB=-0.0214439;REVB=0.00896955;RO=1;SAF=182;SAR=138;SRF=1;SRR=0;SSEN=0;SSEP=0;SSSB=-0.00261388;STB=0.5;STBP=1;TYPE=snp;VARB=8.95112e-05	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:127:322:320:1:0:320:320:1:138:182:1:0:138:182:0:0:1
chr1	2522204	.	G	T	24.0805	PASS	AF=0.666667;AO=4;DP=6;FAO=4;FDP=6;FR=.;FRO=2;FSAF=4;FSAR=0;FSRF=2;FSRR=0;FWDB=0.0759261;FXX=0;HRUN=3;LEN=1;MLLD=55.4428;OALT=T;OID=.;OMAPALT=T;OPOS=2522204;OREF=G;PB=.;PBP=.;QD=16.0537;RBI=0.0759261;REFB=0.0417034;REVB=0;RO=2;SAF=4;SAR=0;SRF=2;SRR=0;SSEN=0;SSEP=0;SSSB=0;STB=0.5;STBP=1;TYPE=snp;VARB=-0.0143595	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:10:6:6:2:2:4:4:0.666667:0:4:2:0:0:4:2:0:1

chr1	948846	.	T	TA	734.654	PASS	AF=0.965035;AO=132;DP=147;FAO=138;FDP=143;FDVR=0;FR=.;FRO=5;FSAF=77;FSAR=61;FSRF=3;FSRR=2;FWDB=-0.0273468;FXX=0.027209;HRUN=1;LEN=1;MLLD=23.6556;OALT=A;OID=.;OMAPALT=TA;OPOS=948847;OREF=-;PB=.;PBP=.;QD=20.5498;RBI=0.0391463;REFB=0.0535759;REVB=0.0280105;RO=7;SAF=74;SAR=58;SRF=5;SRR=2;SSEN=0;SSEP=0;SSSB=-0.0149339;STB=0.50149;STBP=0.898;TYPE=ins;VARB=-0.00206507	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR	1/1:117:147:143:7:5:132:138:0.965035:58:74:5:2:61:77:3:2
chr1	948870	.	C	G	1748.84	PASS	AF=0.973545;AO=182;DP=188;FAO=184;FDP=189;FDVR=-1;FR=.,REALIGNEDx1;FRO=5;FSAF=98;FSAR=86;FSRF=5;FSRR=0;FWDB=0.00560489;FXX=0;HRUN=2;LEN=1;MLLD=62.9653;OALT=G;OID=.;OMAPALT=G;OPOS=948870;OREF=C;PB=.;PBP=.;QD=37.0124;RBI=0.0287713;REFB=0.0167989;REVB=0.02822;RO=4;SAF=96;SAR=86;SRF=4;SRR=0;SSEN=0;SSEP=0;SSSB=-0.0194202;STB=0.512436;STBP=0.042;TYPE=snp;VARB=-0.00274912	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR	1/1:40:188:189:4:5:182:184:0.973545:86:96:4:0:86:98:5:0
chr1	948921	.	T	C	2717.47	PASS	AF=0.982639;AO=283;DP=288;FAO=283;FDP=288;FDVR=5;FR=.;FRO=5;FSAF=142;FSAR=141;FSRF=4;FSRR=1;FWDB=-0.00608685;FXX=0;HRUN=2;LEN=1;MLLD=83.445;OALT=C;OID=.;OMAPALT=C;OPOS=948921;OREF=T;PB=.;PBP=.;QD=37.7426;RBI=0.00625429;REFB=-0.0135323;REVB=0.0014375;RO=4;SAF=142;SAR=141;SRF=4;SRR=0;SSEN=0;SSEP=0;SSSB=-0.0132404;STB=0.505178;STBP=0.214;TYPE=snp;VARB=0.000730702	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR	1/1:76:288:288:4:5:283:283:0.982639:141:142:4:0:141:142:4:1
chr1	985450	.	G	A	18.4743	PASS	AF=0.307692;AO=3;DP=12;FAO=4;FDP=13;FDVR=-1;FR=.;FRO=9;FSAF=2;FSAR=2;FSRF=6;FSRR=3;FWDB=-0.0226987;FXX=0;HRUN=15;LEN=1;MLLD=115.792;OALT=A;OID=.;OMAPALT=A;OPOS=985450;OREF=G;PB=.;PBP=.;QD=5.68439;RBI=0.0253049;REFB=0.0610951;REVB=-0.0111851;RO=9;SAF=2;SAR=1;SRF=6;SRR=3;SSEN=0;SSEP=0;SSSB=0;STB=0.614838;STBP=0.651;TYPE=snp;VARB=-0.00351513	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR	0/1:18:12:13:9:9:3:4:0.307692:1:2:6:3:2:2:6:3

desired output
chr1 948846 T TA
chr1 948921 T C
chr1 948870 C G
Missing in file1:
chr1 985450 G A 18.4743 FR=.;HRUN=15;LEN=1;QD=5.68439;STB=0.614838
Missing in file2:
chr1 2522204 G T 24.0805 FR=.;HRUN=3;LEN=1;QD=16.0537;STB=0.5

awk '
     FNR == NR { file1[$1,$2,$4,$5] = $1 " " $2 " " $4 " " $5 } 
     FNR != NR { file2[$1,$2,$4,$5] = $1 " " $2 " " $4 " " $5 }
     END { print "Match:"; for (k in file1) if (k in file2) print file1[k] # Or file2[k]
           print "Missing in file1:"; for (k in file2) if (!(k in file1)) print file2[k] && FNR==1 $6 {print} && && FNR==2 /FR=/ && /HRUN=/ && /LEN=/ && /QD=/ && /STB=/ {print}
           print "Missing in file2:"; for (k in file1) if (!(k in file2)) print file1[k] && FNR==1 $6 {print} && && FNR==2 /FR=/ && /HRUN=/ && /LEN=/ && /QD=/ && /STB=/ {print}
 }' file1 file2

current output
chr1 948846 T TA
chr1 948921 T C
chr1 948870 C G
Missing in file1:
chr1 985450 G A
Missing in file2:
chr1 2522204 G T
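
One possible way to get the desired output is sketched below; it is not a tested answer from the thread. Inside END only the last record read is still available, and pattern { action } rules cannot be nested inside an action, so the FNR==1 / FNR==2 tests in the attempt above cannot work there. The sketch instead saves $6 and the wanted entries of $8 while each line is read. Assumptions: the files are tab-delimited, the patterns live in $8, lines starting with # can be skipped, and the names compare_vcf and detail are made up for illustration. It prints in input order rather than in the arbitrary order of for (k in array).

```shell
# Hypothetical helper; usage: compare_vcf file1 file2
compare_vcf() {
  awk -F'\t' '
    # Build "$6 FR=..;HRUN=..;LEN=..;QD=..;STB=.." from the semicolon-separated $8.
    function detail(qual, info,    n, i, parts, tag, out) {
      n = split(info, parts, ";")
      out = ""
      for (i = 1; i <= n; i++) {
        tag = parts[i]
        sub(/=.*/, "", tag)        # keep the tag name only, so FR is not confused with FRO
        if (tag == "FR" || tag == "HRUN" || tag == "LEN" || tag == "QD" || tag == "STB")
          out = out (out == "" ? "" : ";") parts[i]
      }
      return qual " " out
    }

    /^#/ { next }                  # assumption: ignore header lines if there are any

    { key = $1 " " $2 " " $4 " " $5 }

    # First file: remember the key, its details, and the order it was seen in.
    FNR == NR {
      if (!(key in in1)) order1[++n1] = key
      in1[key] = detail($6, $8)
      next
    }

    # Second file: same bookkeeping in separate arrays.
    {
      if (!(key in in2)) order2[++n2] = key
      in2[key] = detail($6, $8)
    }

    END {
      print "Match:"
      for (i = 1; i <= n1; i++) if (order1[i] in in2) print order1[i]
      print "Missing in file1:"
      for (i = 1; i <= n2; i++) if (!(order2[i] in in1)) print order2[i], in2[order2[i]]
      print "Missing in file2:"
      for (i = 1; i <= n1; i++) if (!(order1[i] in in2)) print order1[i], in1[order1[i]]
    }
  ' "$1" "$2"
}
```

It would be called as compare_vcf file1 file2. The "Match:" header is kept because the original awk prints it; drop that print if the header is not wanted.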

Last edited by cmccabe; 08-03-2017 at 03:22 PM. Reason: fixed format