awk to print lines based on text in field and value in two additional fields


 
07-10-2017

In the awk below I am trying to print the entire line, along with the header row, if $2 is SNV, MNV, or INDEL and $3 is less than or equal to 0.05. When both conditions are true, the :GMAF= subpattern in $7 is found and the value after the = sign is checked. If that value is less than or equal to 0.01, the entire line is printed along with the header row.
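A note on the condition above: in awk, && binds more tightly than ||, so writing the type test and the p-value test without parentheses applies $3 <= 0.05 to INDEL lines only. The minimal sketch below evaluates both forms on a single hypothetical row (an SNV with a p-value of 0.9, not taken from file.tsv):

```shell
#!/bin/sh
# Sketch: && binds tighter than || in awk, so
#   $2=="SNV" || $2=="MNV" || $2=="INDEL" && $3<=0.05
# is read as
#   $2=="SNV" || $2=="MNV" || ($2=="INDEL" && $3<=0.05)
# Hypothetical test row: an SNV whose pvalue (0.9) should fail the filter.
row=$(printf 'chrX:1\tSNV\t0.9')

unparenthesized=$(printf '%s\n' "$row" | awk -F'\t' '{print ($2=="SNV" || $2=="MNV" || $2=="INDEL" && $3<=0.05)}')
parenthesized=$(printf '%s\n' "$row" | awk -F'\t' '{print (($2=="SNV" || $2=="MNV" || $2=="INDEL") && $3<=0.05)}')

echo "without parentheses: $unparenthesized"   # 1: the SNV row passes despite pvalue 0.9
echo "with parentheses:    $parenthesized"     # 0: the pvalue test now applies to every type
```

The parenthesized form is the one that matches the description: a line needs an accepted type and a p-value of at most 0.05.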

Since it is possible for $2 to be SNV, MNV, or INDEL while $7 is blank or null, I am not sure how to capture these lines as well; the first data line (chr4:153271308) is an example. The assumption is that if there is no value in $7, this is the same as zero, so the variant may be significant and should be extracted. I am also not sure how to print the header row without the leading #. The ----- header row markers are not part of the file; they are only there to indicate the header. I added comments to each line of the awk as well. Thank you :)
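On the blank-$7 question: with awk's default field separator, any run of spaces and tabs counts as a single separator, so an empty 5000Exomes column simply disappears and the line has 7 fields instead of 8 (which is what the NF!=7 test in the awk below relies on). Setting the separator to a tab keeps the empty field, so it can be tested directly with $7 == "". The sketch below assumes the file really is tab-delimited; it compares both settings on the first data line and also shows sub() stripping the leading # from a shortened header:

```shell
#!/bin/sh
# Sketch: a blank 5000Exomes field ($7) vanishes with the default FS,
# but survives as an empty field when FS is a tab.
row=$(printf 'chr4:153271308\tSNV\t1.30E-20\t2000\tFBXW7\tNM_033632.3\t\tintronic')

default=$(printf '%s\n' "$row" | awk '{print NF, "[" $7 "]"}')        # runs of blanks = one separator
tabbed=$(printf '%s\n' "$row" | awk -F'\t' '{print NF, "[" $7 "]"}')  # each tab is its own separator

# Header: remove the leading "#" and any spaces after it, keep the tabs.
header=$(printf '# locus\ttype\tpvalue\n' | awk -F'\t' '{sub(/^# */, ""); print}')

printf 'default FS: %s\n' "$default"   # 7 [intronic]
printf 'tab FS:     %s\n' "$tabbed"    # 8 []
printf 'header:     %s\n' "$header"
```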



file.tsv tab-delimited

Code:
##reference=hg19
##referenceURI=hg19
# locus    type    pvalue    coverage    gene    transcript    5000Exomes    function    ----- header row
chr4:153271308    SNV    1.30E-20    2000    FBXW7    NM_033632.3        intronic
chr1:123456    SNV    0    1800    APC    NM_0000    AMAF=0.0041:EMAF=0.0:GMAF=0.0014    exonic
chr2:78555    REF    0    1900    APC    NM_0000    
chr1:123456    MNV    0    2000    APC    NM_0000    AMAF=0.2195:EMAF=0.1378:GMAF=0.1655    exonic

current output

Code:
locus    type    pvalue    coverage    gene    transcript    5000Exomes    function    ----- header row
chr4:153271308    SNV    1.30E-20    2000    FBXW7    NM_033632.3        intronic
chr1:123456    MNV    0    2000    APC    NM_0000    AMAF=0.2195:EMAF=0.1378:GMAF=0.1655    exonic


desired output tab-delimited

Code:
locus    type    pvalue    coverage    gene    transcript    5000Exomes    function
chr4:153271308    SNV    1.30E-20    2000    FBXW7    NM_033632.3        intronic
chr1:123456    SNV    0    1800    APC    NM_0000    AMAF=0.0041:EMAF=0.0:GMAF=0.0014    exonic

awk

Code:
awk 'NR<3{next}                                                          # skip the two ## lines; start processing at row 3
     NR==3{print gensub(/^# /,"","1");next}                              # print the third line (header) after removing the leading "# "
     $2 == "SNV" || $2 == "MNV" || $2 == "INDEL" && $3 <=0.05 {           # if $2 and $3 meet the criteria
            if (NF!=7) {val=gensub(/.*GMAF=(.[^:]*).*/,"\\1","g",$7);   # isolate the GMAF value with a regex (skipped when $7 is blank, so NF is 7)
               if (val<=0.01) next} print }' file.tsv > out.txt         # compare GMAF to 0.01 and print
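One possible fix, offered as a sketch rather than as the thread's accepted answer. It changes three things in the awk above: it parenthesizes the type test so that $3 <= 0.05 applies to every type; it prints when GMAF is at most 0.01 instead of skipping those lines (the `next` is inverted, which is why the MNV line with GMAF=0.1655 appears in the current output); and it splits on tabs so a blank $7 can be tested with $7 == "". gensub is a GNU awk extension, so match()/substr() are used instead. Assumptions not covered by the post: the file is truly tab-delimited, and a non-blank $7 without a GMAF= entry is dropped. The script recreates file.tsv from the sample above, so run it in a scratch directory.

```shell
#!/bin/sh
# Sketch, assuming a POSIX awk and a tab-delimited file.tsv.
# Recreate the sample from the post (the "----- header row" note is omitted).
printf '%s\n' '##reference=hg19' '##referenceURI=hg19' > file.tsv
printf '# locus\ttype\tpvalue\tcoverage\tgene\ttranscript\t5000Exomes\tfunction\n' >> file.tsv
printf 'chr4:153271308\tSNV\t1.30E-20\t2000\tFBXW7\tNM_033632.3\t\tintronic\n' >> file.tsv
printf 'chr1:123456\tSNV\t0\t1800\tAPC\tNM_0000\tAMAF=0.0041:EMAF=0.0:GMAF=0.0014\texonic\n' >> file.tsv
printf 'chr2:78555\tREF\t0\t1900\tAPC\tNM_0000\t\n' >> file.tsv
printf 'chr1:123456\tMNV\t0\t2000\tAPC\tNM_0000\tAMAF=0.2195:EMAF=0.1378:GMAF=0.1655\texonic\n' >> file.tsv

awk -F'\t' '
NR < 3 { next }                                  # skip the two ## metadata lines
NR == 3 { sub(/^# */, ""); print; next }         # header: drop the leading "# " and print
($2 == "SNV" || $2 == "MNV" || $2 == "INDEL") && $3 <= 0.05 {   # parentheses group the type test
    if ($7 == "") { print; next }                # blank 5000Exomes is treated as 0: keep
    if (match($7, /GMAF=[^:]*/)) {               # locate the GMAF=value subfield
        gmaf = substr($7, RSTART + 5, RLENGTH - 5) + 0   # text after "GMAF="; + 0 forces a numeric comparison
        if (gmaf <= 0.01) print                  # keep only GMAF <= 0.01
    }
}' file.tsv > out.txt

cat out.txt
```

On the sample this keeps the header, the chr4:153271308 SNV line with the blank $7, and the chr1:123456 SNV line with GMAF=0.0014, which is the desired output above.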


Last edited by cmccabe; 07-11-2017 at 07:12 AM.

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Using awk to print output based on first field.

Hi Folks, I have one requirement, There is one file, which contains two fields. Based on first field, I need to print an output. Example will be more suitable. Input file like this. abc 5 abc 10 xyz 6 xyz 9 xyz 10 mnp 10 mnp 12 mnp 6 (2 Replies)
Discussion started by: Raza Ali
2 Replies

2. UNIX for Beginners Questions & Answers

Print lines based upon unique values in Nth field

For some reason I am having difficulty performing what should be a fairly easy task. I would like to print lines of a file that have a unique value in the first field. For example, I have a large data-set with the following excerpt: PS003,001 MZMWR/ L-DWD// * PS003,001... (4 Replies)
Discussion started by: jvoot
4 Replies

3. Shell Programming and Scripting

awk to remove lines where field count is greater than 1 in two fields

I am trying to remove all the lines and spaces where the count in $4 or $5 is greater than 1 (more than 1 letter). The file and the output are tab-delimited. Thank you :). file X 5811530 . G C NLGN4X 17 10544696 . GA G MYH3 9 96439004 . C ... (1 Reply)
Discussion started by: cmccabe
1 Replies

4. Shell Programming and Scripting

awk to combine all matching fields in input but only print line with largest value in specific field

In the below I am trying to use awk to match all the $13 values in input, which is tab-delimited, that are in $1 of gene which is just a single column of text. However only the line with the greatest $9 value in input needs to be printed. So in the example below all the MECP2 and LTBP1... (0 Replies)
Discussion started by: cmccabe
0 Replies

5. Shell Programming and Scripting

awk sort based on difference of fields and print all fields

Hi I have a file as below <field1> <field2> <field3> ... <field_num1> <field_num2> Trying to sort based on difference of <field_num1> and <field_num2> in descending order and print all fields. I tried this and it doesn't sort on the difference field. Appreciate your help. cat... (9 Replies)
Discussion started by: newstart
9 Replies

6. Shell Programming and Scripting

How to print 1st field and last 2 fields together and the rest of the fields after it using awk?

Hi experts, I need to print the first field first then last two fields should come next and then i need to print rest of the fields. Input : a1,abc,jsd,fhf,fkk,b1,b2 a2,acb,dfg,ghj,b3,c4 a3,djf,wdjg,fkg,dff,ggk,d4,d5 Expected output: a1,b1,b2,abc,jsd,fhf,fkk... (6 Replies)
Discussion started by: 100bees
6 Replies

7. Shell Programming and Scripting

How to Print from nth field to mth fields using awk

Hi, is there any short method to print from a particular field till another field using awk? Example File: File1 ==== 1|2|acv|vbc|......|100|342 2|3|afg|nhj|.......|100|346 Expected output: File2 ==== acv|vbc|.....|100 afg|nhj|.....|100 (8 Replies)
Discussion started by: machomaddy
8 Replies

8. Shell Programming and Scripting

awk - print all fields except for last field

How do I print all the fields of a record except for the $(NF) field? (4 Replies)
Discussion started by: locoroco
4 Replies

9. Shell Programming and Scripting

Compare Tab Separated Field with AWK to all and print lines of unique fields.

Hi. I have a tab separated file that has a couple nearly identical lines. When doing: sort file | uniq > file.new It passes through the nearly identical lines because, well, they still are unique. a) I want to look only at field x for uniqueness and if the content in field x is the... (1 Reply)
Discussion started by: rocket_dog
1 Replies

10. Shell Programming and Scripting

AWK : Add Fields of lines with matching field

Dear All, I would like to add values of a field, if the lines match in a certain field. Then I would like to divide the sum by the number of lines that have a matched field. This is the Input: Input: Test1 5 Test1 10 Test2 2 Test2 5 Test2 13 Test3 4 Output: Test1 7.5 Test1 7.5... (6 Replies)
Discussion started by: DerSeb
6 Replies