awk to print line based on two keywords


 
# 1  
Old 05-09-2017
awk to print line based on two keywords

I am starting to write a multi-line awk script. Using the tab-delimited
file below, I want to print only the lines that contain oncomineGeneClass,
oncomineVariantClass, and PASS. The script executes but
seems to print the entire file, not the desired lines. Thank you :).

file
Code:
##FORMAT=<ID=SRF,Number=1,Type=Integer,Description="Number of reference observations on the forward strand">
##FORMAT=<ID=SRR,Number=1,Type=Integer,Description="Number of reference observations on the reverse strand">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    file
SVTYPE=Fusion;READ_COUNT=1868;GENE_NAME=ETV6;EXON_NUM=4;RPM=1.5825e-09;NORM_COUNT=0.001582480886121524;ANNOTATION=COSF823;FUNC=[{'gene':'ETV6','exon':'4','oncomineGeneClass':'Gain-of-Function','oncomineVariantClass':'Fusion'}]    GT:GQ    ./.:.
chr15    88483984    ETV6-NTRK3.E4N15.COSF823.1_2    T    ]chr12:12006495]T    .    PASS
SVTYPE=Fusion;READ_COUNT=1868;GENE_NAME=NTRK3;EXON_NUM=15;RPM=1.5825e-09;NORM_COUNT=0.001582480886121524;ANNOTATION=COSF823;FUNC=[{'gene':'NTRK3','exon':'15','oncomineGeneClass':'Gain-of-Function','oncomineVariantClass':'Fusion'}]    GT:GQ    ./.:.
chr12    12022903    ETV6-NTRK3.E5N14_1    G    G]chr15:88576276]    .    FAIL
chr17    7577108    COSM10749;COSM43737    C    A,T    149.594    PASS    AF=0.0830415,0.0;AO=372,2;DP=4420;FAO=166,0;FDP=1999;FR=.,.,REALIGNEDx0.0865;FRO=1833;FSAF=82,0;FSAR=84,0;FSRF=952;FSRR=881;FWDB=0.0072184,-0.0207142;FXX=4.99998E-4;HRUN=1,1;LEN=1,1;MLLD=293.795,80.5366;OALT=A,T;OID=COSM10749,COSM43737;OMAPALT=A,T;OPOS=7577108,7577108;OREF=C,C;PB=.,.;PBP=.,.;QD=0.299338;RBI=0.00721997,0.02565;REFB=1.40155E-4,-7.81395E-4;REVB=1.50579E-4,0.0151276;RO=4043;SAF=187,1;SAR=185,1;SRF=2118;SRR=1925;SSEN=0,0;SSEP=0,0;SSSB=-0.0251826,-5.12306E-4;STB=0.52327,0.5;STBP=0.541,1.0;TYPE=snp,snp;VARB=-0.00153404,0.0;HS;FUNC=[{'origPos':'7577108','origRef':'C','normalizedRef':'C','gene':'TP53','normalizedPos':'7577108','normalizedAlt':'A','polyphen':'1.0','gt':'pos','codon':'TTT','coding':'c.830G>T','sift':'0.0','grantham':'205.0','transcript':'NM_000546.5','function':'missense','protein':'p.Cys277Phe','location':'exonic','origAlt':'A','exon':'8','oncomineGeneClass':'Loss-of-Function','oncomineVariantClass':'Hotspot'}]    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT    0/1:149:4420:1999:4043:1833:372,2:166,0:0.0830415,0.0:185,1:187,1:2118:1925:84,0:82,0:952:881:1
chr17    27400788    TIAF1    G    <CNV>    100.0    PASS    HS;FR=.;PRECISE=FALSE;SVTYPE=CNV;END=27495549;LEN=94761;NUMTILES=22;SD=0.33;CDF_MAPD=0.01:1.251248,0.025:1.295465,0.05:1.33475,0.1:1.381537,0.2:1.440411,0.25:1.46343,0.5:1.56,0.75:1.662943,0.8:1.689518,0.9:1.761516,0.95:1.823262,0.975:1.878554,0.99:1.944939;REF_CN=2;CI=0.05:1.33475,0.95:1.82326;RAW_CN=1.56;FUNC=[{'gene':'TIAF1'}]    GT:GQ:CN    ./.:0:1.56

awk
Code:
awk -F'\t' '{ # call awk and set FS as tab
        match($0,/oncomineGeneClass=[^:]*/ && /oncomineVariantClass=[^:]*/ && "PASS"); { # match lines on oncomineVariantClass and PASS
        print  # print line
 }
} 
' file   # define input

desired output
Code:
SVTYPE=Fusion;READ_COUNT=1868;GENE_NAME=ETV6;EXON_NUM=4;RPM=1.5825e-09;NORM_COUNT=0.001582480886121524;ANNOTATION=COSF823;FUNC=[{'gene':'ETV6','exon':'4','oncomineGeneClass':'Gain-of-Function','oncomineVariantClass':'Fusion'}]     GT:GQ    ./.:.
chr15    88483984    ETV6-NTRK3.E4N15.COSF823.1_2    T    ]chr12:12006495]T    .    PASS


Last edited by cmccabe; 05-09-2017 at 09:09 PM.. Reason: fixed format
# 2  
Old 05-10-2017
Hello,
The problem is the regular expression: in the input file the keys are enclosed in single quotes (') and followed by a colon, not an equals sign.
This is not very smart code, but it works:
Code:
awk -F'\t' '
BEGIN { a = "\047oncomineGeneClass\047:"      # \047 is the octal escape for a single quote
        b = "\047oncomineVariantClass\047:"
        c = "PASS"
}
{   # print the line only if all three patterns match
    if ( match($0, a) && match($0, b) && match($0, c) )
        print
}
' file   # define input

Greetings!
# 3  
Old 05-10-2017
So, in other words: match(s, r) takes a single regular expression, not multiple re's.


--
Another example:
Code:
awk '$7=="PASS" && /oncomineGeneClass/ && /oncomineVariantClass/' file
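
To see why the original script printed every line, it may help to look at its two problems separately. The following is only a rough sketch on a made-up three-line input (the sample lines are an assumption, not the posted file). Inside an action, `match(...);` is a complete statement, so the `{ print }` block after it runs unconditionally. In addition, `/re1/ && /re2/ && "PASS"` is evaluated to 0 or 1 before match() ever sees it.

```shell
# Made-up sample input, one line per record (an assumption for illustration).
sample=$(printf '%s\n' 'alpha' 'beta PASS' 'gamma')

# Same shape as the original script: the ";" ends the match() statement,
# so the { print } block runs for every line regardless of the match result.
printed_all=$(printf '%s\n' "$sample" | awk '{ match($0, /beta/ && /PASS/ && "PASS"); { print } }')

# Conditions used as a pattern: only lines satisfying all of them are printed.
printed_some=$(printf '%s\n' "$sample" | awk '/beta/ && /PASS/')

printf 'original shape:\n%s\n' "$printed_all"
printf 'pattern form:\n%s\n' "$printed_some"
```

The first command echoes the whole input back, while the second prints only the line that satisfies both conditions.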

# 4  
Old 05-10-2017
Quote:
Originally Posted by cmccabe
[...]
desired output
Code:
SVTYPE=Fusion;READ_COUNT=1868;GENE_NAME=ETV6;EXON_NUM=4;RPM=1.5825e-09;NORM_COUNT=0.001582480886121524;ANNOTATION=COSF823;FUNC=[{'gene':'ETV6','exon':'4','oncomineGeneClass':'Gain-of-Function','oncomineVariantClass':'Fusion'}]     GT:GQ    ./.:.
chr15    88483984    ETV6-NTRK3.E4N15.COSF823.1_2    T    ]chr12:12006495]T    .    PASS

Your desired output consists of two different records, so it cannot be handled by the logic you are using, which tests one record at a time.

Unfortunately, your single sample is ambiguous, which makes it hard to guess a possible alternative.
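
One possible reading, offered only as a guess: the annotation line always sits directly above the record line it belongs to. Under that assumption, a sketch could remember the previous line and print the pair when the current line has PASS in field 7. The input below is a shortened, made-up stand-in for the posted file, and the variable names `prev` and `pairs` are hypothetical.

```shell
# Shortened stand-in for the posted file (tab-separated; values abbreviated).
tmp=$(mktemp)
printf '%s\n' \
  "FUNC=[{'oncomineGeneClass':'Gain-of-Function','oncomineVariantClass':'Fusion'}]" \
  "$(printf 'chr15\t88483984\tid1\tT\tA\t.\tPASS')" \
  "FUNC=[{'oncomineGeneClass':'Gain-of-Function','oncomineVariantClass':'Fusion'}]" \
  "$(printf 'chr12\t12022903\tid2\tG\tC\t.\tFAIL')" > "$tmp"

# Remember the previous line; when the current line has PASS in field 7 and
# the previous line carried both keys, print the pair.
pairs=$(awk -F'\t' '
    $7 == "PASS" && prev ~ /oncomineGeneClass/ && prev ~ /oncomineVariantClass/ {
        print prev
        print
    }
    { prev = $0 }
' "$tmp")

printf '%s\n' "$pairs"
rm -f "$tmp"
```

If the real file keeps each record on a single line, this pairing logic is unnecessary and the one-record tests from the other posts apply.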
# 5  
Old 05-10-2017
Quote:
Originally Posted by Scrutinizer
So, in other words: match(s, r) takes a single regular expression, not multiple re's.


--
Another example:
Code:
awk '$7=="PASS" && /oncomineGeneClass/ && /oncomineVariantClass/' file

Thanks, @Scrutinizer.

If I want to match the re's with the single quotes and colon,

Code:
awk '$7=="PASS" && /\'oncomineGeneClass\':/ && /\'oncomineVariantClass\':/' file

It doesn't work that way. How should it be modified?

Regards.
# 6  
Old 05-10-2017
Quote:
Originally Posted by AbelLuis
Thanks, @Scrutinizer.

If I want to match the re's with the single quotes and colon,

Code:
awk '$7=="PASS" && /\'oncomineGeneClass\':/ && /\'oncomineVariantClass\':/' file

It doesn't work that way. How should it be modified?

Regards.
You could place the code in a file.awk to avoid the shell quoting.
You could also use the octal escape \47 for the single quote:
Code:
awk '$7=="PASS" && /\47oncomineGeneClass\47:/ && /\47oncomineVariantClass\47:/' file
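
The file-based suggestion could look roughly like this. It is a sketch only: the temporary file stands in for file.awk, and the one-line sample record is made up for illustration. The second command shows the common `'\''` shell idiom as another way to get a single quote into an inline program.

```shell
# Write the program to a file; the quoted here-document delimiter ('EOF')
# stops the shell from interpreting anything inside it.
prog=$(mktemp)
cat > "$prog" <<'EOF'
$7 == "PASS" && /'oncomineGeneClass':/ && /'oncomineVariantClass':/
EOF

# Made-up single-line record (an assumption): PASS in field 7, keys in field 8.
line=$(printf "chr17\t7577108\tid\tC\tA\t149.594\tPASS\tFUNC=[{'oncomineGeneClass':'Loss-of-Function','oncomineVariantClass':'Hotspot'}]")

# -f reads the awk program from the file instead of the command line.
from_file=$(printf '%s\n' "$line" | awk -F'\t' -f "$prog")

# Alternative on the command line: '\'' closes the quoted string, adds an
# escaped single quote, and reopens the quoted string.
inline=$(printf '%s\n' "$line" | awk -F'\t' '$7 == "PASS" && /'\''oncomineGeneClass'\'':/ && /'\''oncomineVariantClass'\'':/')

printf '%s\n' "$from_file"
rm -f "$prog"
```

Both forms hand awk the same program text; the file version is usually easier to read once the program grows beyond one line.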

# 7  
Old 05-10-2017
You could also use:
Code:
awk -v p1="PASS" -v p2="'oncomineGeneClass'" -v p3="'oncomineVariantClass':" '$0 ~ p1 && $0 ~ p2 && $0 ~ p3' file
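
Note that `$0 ~ p1` treats the variable as a regular expression and matches PASS anywhere on the line. If that is too loose, a variant along these lines might help. It is a sketch under assumptions: the two sample records are made up, field 7 is taken to be the FILTER column as in the posted header, and index() is used so the keys are compared as literal strings.

```shell
# Made-up records (assumptions): the first has PASS in field 7.
line_ok=$(printf "chr17\t7577108\tid\tC\tA\t149.594\tPASS\tFUNC=[{'oncomineGeneClass':'Loss-of-Function','oncomineVariantClass':'Hotspot'}]")
# Here PASS appears only inside the ID column, not in field 7.
line_bad=$(printf "chr1\t100\tBYPASS1\tC\tA\t10\tFAIL\tFUNC=[{'oncomineGeneClass':'x','oncomineVariantClass':'y'}]")

# $7 == p1 is an exact comparison on the FILTER column; index() returns a
# non-zero position when the literal string occurs in the line.
result=$(printf '%s\n%s\n' "$line_ok" "$line_bad" |
    awk -F'\t' -v p1="PASS" -v p2="'oncomineGeneClass':" -v p3="'oncomineVariantClass':" \
        '$7 == p1 && index($0, p2) && index($0, p3)')

printf '%s\n' "$result"
```

Only the first record is printed, because the second has FAIL in field 7 even though the text PASS occurs elsewhere on the line.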
