Print header and lines that meet both conditions in awk


 
Post #1, 08-23-2017

In the awk below I am trying to print only the header lines (those starting with # or ##) plus the lines where $7 is PASS and the AF= value is less than 5% (0.05). The awk runs without errors but produces an empty file, and I am not sure what I am doing wrong. Thank you.

file
Code:
##INFO=<ID=OPOS,Number=.,Type=Integer,Description="List of original allele positions">
##INFO=<ID=OREF,Number=.,Type=String,Description="List of original reference bases">
##INFO=<ID=OALT,Number=.,Type=String,Description="List of original variant bases">
##INFO=<ID=OMAPALT,Number=.,Type=String,Description="Maps OID,OPOS,OREF,OALT entries to specific ALT alleles">
##deamination_metric=0.178389
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	xxxxx
chr1	948846	.	T	TA	1121.35	PASS	AF=1;AO=190;DP=210;FAO=202;FDP=202;FR=.;FRO=0;FSAF=99;FSAR=103;FSRF=0;FSRR=0;FWDB=-0.0502834;FXX=0.0146334;HRUN=1;LEN=1;MLLD=22.6802;OALT=A;OID=.;OMAPALT=TA;OPOS=948847;OREF=-;PB=.;PBP=.;QD=22.205;RBI=0.0912754;REFB=0.0676884;REVB=0.076176;RO=14;SAF=93;SAR=97;SRF=10;SRR=4;SSEN=0;SSEP=0;SSSB=-0.0292947;STB=0.5;STBP=1;TYPE=ins;VARB=3.8368e-05	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:221:210:202:14:0:190:202:1:97:93:10:4:103:99:0:0:1
chr1	948870	.	C	G	2477.57	PASS	AF=0.971326;AO=272;DP=280;FAO=271;FDP=279;FR=.,REALIGNEDx1;FRO=8;FSAF=130;FSAR=141;FSRF=8;FSRR=0;FWDB=0.0152465;FXX=0.0035713;HRUN=2;LEN=1;MLLD=68.4362;OALT=G;OID=.;OMAPALT=G;OPOS=948870;OREF=C;PB=.;PBP=.;QD=35.5207;RBI=0.02597;REFB=-0.033231;REVB=0.0210235;RO=7;SAF=131;SAR=141;SRF=7;SRR=0;SSEN=0;SSEP=0;SSSB=-0.0247914;STB=0.514924;STBP=0.003;TYPE=snp;VARB=0.00476007	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:61:280:279:7:8:272:271:0.971326:141:131:7:0:141:130:8:0:1
chr1	948921	.	T	C	3825.34	PASS	AF=1;AO=447;DP=448;FAO=399;FDP=399;FR=.;FRO=0;FSAF=195;FSAR=204;FSRF=0;FSRR=0;FWDB=0.00321295;FXX=0.00249994;HRUN=2;LEN=1;MLLD=105.43;OALT=C;OID=.;OMAPALT=C;OPOS=948921;OREF=T;PB=.;PBP=.;QD=38.3492;RBI=0.00709428;REFB=-0.0465188;REVB=0.00632502;RO=0;SAF=222;SAR=225;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=0;STB=0.5;STBP=1;TYPE=snp;VARB=6.65831e-05	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:171:448:399:0:0:447:399:1:225:222:0:0:204:195:0:0:1
chr1	949608	.	G	A	764.935	PASS	AF=0.4775;AO=419;DP=866;FAO=191;FDP=400;FR=.,REALIGNEDx0.48;FRO=209;FSAF=101;FSAR=90;FSRF=116;FSRR=93;FWDB=0.0542831;FXX=0;HRUN=1;LEN=1;MLLD=147.514;OALT=A;OID=.;OMAPALT=A;OPOS=949608;OREF=G;PB=.;PBP=.;QD=7.64935;RBI=0.0545875;REFB=-0.00710673;REVB=0.00575684;RO=446;SAF=229;SAR=190;SRF=254;SRR=192;SSEN=0;SSEP=0;SSSB=-0.0218564;STB=0.51377;STBP=0.582;TYPE=snp;VARB=0.00764515	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:764:866:400:446:209:419:191:0.4775:190:229:254:192:90:101:116:93:1
chr1	949654	.	A	G	3797.25	PASS	AF=0.03;AO=779;DP=781;FAO=398;FDP=398;FR=.;FRO=0;FSAF=207;FSAR=191;FSRF=0;FSRR=0;FWDB=-0.00488004;FXX=0.00499988;HRUN=1;LEN=1;MLLD=124.312;OALT=G;OID=.;OMAPALT=G;OPOS=949654;OREF=A;PB=.;PBP=.;QD=38.1633;RBI=0.0260218;REFB=-0.0237472;REVB=-0.0255601;RO=2;SAF=425;SAR=354;SRF=1;SRR=1;SSEN=0;SSEP=0;SSSB=0.000224878;STB=0.5;STBP=1;TYPE=snp;VARB=-2.92274e-05	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:161:781:398:2:0:779:398:1:354:425:1:1:191:207:0:0:1

desired output
Code:
##INFO=<ID=OPOS,Number=.,Type=Integer,Description="List of original allele positions">
##INFO=<ID=OREF,Number=.,Type=String,Description="List of original reference bases">
##INFO=<ID=OALT,Number=.,Type=String,Description="List of original variant bases">
##INFO=<ID=OMAPALT,Number=.,Type=String,Description="Maps OID,OPOS,OREF,OALT entries to specific ALT alleles">
##deamination_metric=0.178389
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	xxxxx
chr1	949654	.	A	G	3797.25	PASS	AF=0.03;AO=779;DP=781;FAO=398;FDP=398;FR=.;FRO=0;FSAF=207;FSAR=191;FSRF=0;FSRR=0;FWDB=-0.00488004;FXX=0.00499988;HRUN=1;LEN=1;MLLD=124.312;OALT=G;OID=.;OMAPALT=G;OPOS=949654;OREF=A;PB=.;PBP=.;QD=38.1633;RBI=0.0260218;REFB=-0.0237472;REVB=-0.0255601;RO=2;SAF=425;SAR=354;SRF=1;SRR=1;SSEN=0;SSEP=0;SSSB=0.000224878;STB=0.5;STBP=1;TYPE=snp;VARB=-2.92274e-05	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:161:781:398:2:0:779:398:1:354:425:1:1:191:207:0:0:1


awk
Code:
awk -F'\t' '$7 == "PASS" && /AF=(<[1-9]\.0[5-9])/' file > out
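The empty output has two causes. The regex `AF=(<[1-9]\.0[5-9])` requires a literal `<` right after `AF=`, and no line contains `AF=<`, so no data line matches. Nothing in the program selects the `#`/`##` header lines either, so they are dropped as well. A regex is also an awkward way to say "less than 0.05"; pulling the number out and comparing it numerically is simpler. A minimal sketch, assuming the tab-separated layout above with INFO in `$8`; `sample.vcf` is a hypothetical, shortened copy of the question's data (INFO and sample columns truncated):

```shell
# Shortened copy of the question's data (hypothetical file name sample.vcf).
printf '%s\n' '##deamination_metric=0.178389' > sample.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\txxxxx\n' >> sample.vcf
printf 'chr1\t948870\t.\tC\tG\t2477.57\tPASS\tAF=0.971326;AO=272\tGT\t1/1\n' >> sample.vcf
printf 'chr1\t949654\t.\tA\tG\t3797.25\tPASS\tAF=0.03;AO=779\tGT\t1/1\n' >> sample.vcf

awk -F'\t' '
  /^#/ { print; next }                       # keep every header line (## meta lines and #CHROM)
  $7 == "PASS" && match(";" $8, /;AF=[^;]*/) {
    af = substr(";" $8, RSTART + 4, RLENGTH - 4)   # the text after ";AF="
    if (af + 0 < 0.05) print                 # numeric comparison instead of a regex
  }
' sample.vcf > out
cat out
```

Prepending `;` to the INFO field before matching keeps the regex from hitting keys that merely end in `AF`, such as `SAF=` or `FSAF=`, regardless of where `AF` sits in the INFO column.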

---------- Post updated at 08:13 AM ---------- Previous update was at 07:49 AM ----------

This awk is what I needed:

awk
Code:
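awk -F'\t' '
  {
    split(x, V)                 # x is never set, so this empties V on every line
    for (i = 1; i <= NF; i++) {
      split($i, F, /=/)         # "AF=0.03;AO=779;..." gives F[1]="AF", F[2]="0.03;AO"
      V[F[1]] = F[2]
    }
  }
  # "0.03;AO"+0 evaluates to 0.03 (leading numeric prefix). AF is only found
  # when it is the first INFO key. Header lines have no AF key, so V["AF"]+0
  # is 0 there and they are printed too. $7 (FILTER) is not tested here.
  (V["AF"]+0 < 0.05)
' file > out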
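Because the posted program splits every tab field on `=`, it only sees `AF` when it is the first key of the INFO column, it never checks `$7`, and it keeps the header lines only because they happen to have no `AF` key. A minimal sketch of a stricter variant, assuming INFO is `$8` as in the `#CHROM` line: headers are printed explicitly, INFO is split on `;`, and FILTER must be `PASS`. The data lines at positions 1000 and 2000 are hypothetical, added only to exercise the FILTER check and an INFO column where `AF` is not the first key:

```shell
# sample.vcf is a hypothetical test file; positions 1000 and 2000 are made up.
printf '%s\n' '##deamination_metric=0.178389' > sample.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\txxxxx\n' >> sample.vcf
printf 'chr1\t949654\t.\tA\tG\t3797.25\tPASS\tAF=0.03;AO=779\tGT\t1/1\n' >> sample.vcf
printf 'chr1\t1000\t.\tA\tG\t50\tLowQual\tAF=0.01;AO=5\tGT\t0/1\n' >> sample.vcf
printf 'chr1\t2000\t.\tC\tT\t50\tPASS\tDB;DP=100;AF=0.5\tGT\t0/1\n' >> sample.vcf

awk -F'\t' '
  /^#/ { print; next }                  # header lines: print and skip the rest
  {
    split("", V)                        # empty the key/value array for this line
    n = split($8, kv, ";")              # $8 is INFO: KEY=VAL;KEY=VAL;...
    for (i = 1; i <= n; i++) {
      eq = index(kv[i], "=")
      if (eq) { V[substr(kv[i], 1, eq - 1)] = substr(kv[i], eq + 1) }
      else    { V[kv[i]] = "" }         # flag keys such as DB carry no value
    }
  }
  $7 == "PASS" && ("AF" in V) && V["AF"] + 0 < 0.05
' sample.vcf > out
cat out
```

For multi-allelic sites, where AF holds a comma-separated list such as `AF=0.01,0.3`, `V["AF"] + 0` only looks at the first value; testing every allele would need another `split` on `,`.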

Thanks @Scrutinizer for your help.

Last edited by cmccabe; 08-23-2017 at 10:14 AM. Reason: fixed format