awk to filter file based on separate conditions


 
# 1  
Old 01-13-2017

The awk below filters a tab-delimited file of 30,000 lines. What I am having trouble with is adding two conditions, one per SVTYPE.

If SVTYPE=CNV, the line should print only if the CI= value is > .05.

If SVTYPE=Fusion, the line should print only if READ_COUNT is > 10. Thank you.

file
Code:
chr1    11184539    MTOR    A    <CNV>    100.0    PASS    FR=.;PRECISE=FALSE;SVTYPE=CNV;END=11217311;LEN=32772;NUMTILES=4;SD=0.18;CDF_MAPD=0.01:1.373797,0.025:1.472018,0.05:1.562112,0.1:1.67288,0.2:1.817619,0.25:1.875834,0.5:2.13,0.75:2.418604,0.8:2.496068,0.9:2.71203,0.95:2.904337,0.975:3.082096,0.99:3.302454;REF_CN=2;CI=0.05:1.56211,0.95:2.90434;RAW_CN=2.13;FUNC=[{'gene':'MTOR'}]    GT:GQ:CN    ./.:0:2.13
chr1    11810242    AGTRAP-BRAF.A5B8.COSF828.1_1    G    G]chr7:140494267]    .    FAIL    SVTYPE=Fusion;READ_COUNT=0;GENE_NAME=AGTRAP;EXON_NUM=5;RPM=0.0000;NORM_COUNT=0.0;ANNOTATION=COSF828;FAIL_REASON=READ_COUNT<=40|NORM_COUNT<=0.0;FUNC=[{'gene':'AGTRAP','exon':'5'}]    GT:GQ    ./.:.
chr7:140494267]    .    PASS     SVTYPE=Fusion;READ_COUNT=16;GENE_NAME=AGTRAP;EXON_NUM=5;RPM=0.0000;NORM_COUNT=0.0;ANNOTATION=COSF828;FAIL_REASON=|NORM_COUNT<=0.0;FUNC=[{'gene':'AGTRAP','exon':'5'}]     GT:GQ    ./.:.

desired output
Code:
chr7:140494267]    .    PASS     SVTYPE=Fusion;READ_COUNT=16;GENE_NAME=AGTRAP;EXON_NUM=5;RPM=0.0000;NORM_COUNT=0.0;ANNOTATION=COSF828;FAIL_REASON=|NORM_COUNT<=0.0;FUNC=[{'gene':'AGTRAP','exon':'5'}]     GT:GQ    ./.:.

awk
Code:
awk -F'\t' -v OFS='\t' '/SVTYPE=/{print}' file


Last edited by cmccabe; 01-13-2017 at 07:35 PM.. Reason: fixed format
# 2  
Old 01-14-2017
Hello cmccabe,

Could you please try the following? You can add tab delimiters with -F"\t" and -v OFS="\t" if needed.
Code:
awk '{match($0,/SVTYPE=[^;]*/);SVTYPE_VALUE=substr($0,RSTART+7,RLENGTH-7);match($0,/READ_COUNT[^;]*/);READ_COUNT_VALUE=substr($0,RSTART+11,RLENGTH-11);match($0,/CI=[^:]*/);CI_VALUE=substr($0,RSTART+3,RLENGTH-3);if(SVTYPE_VALUE == "CNV" && CI_VALUE+0 > 0.05){print};if(SVTYPE_VALUE == "Fusion" && READ_COUNT_VALUE+0 > 10){print}}'  Input_file

EDIT: Adding a multi-line form of the solution too.
Code:
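awk '{
        # Extract the SVTYPE value; 7 is the length of "SVTYPE=".
        match($0,/SVTYPE=[^;]*/);
        SVTYPE_VALUE=substr($0,RSTART+7,RLENGTH-7);
        # Extract the READ_COUNT value; 11 is the length of "READ_COUNT=".
        match($0,/READ_COUNT[^;]*/);
        READ_COUNT_VALUE=substr($0,RSTART+11,RLENGTH-11);
        # Extract the first number after "CI=", up to the first colon.
        match($0,/CI=[^:]*/);
        CI_VALUE=substr($0,RSTART+3,RLENGTH-3);
        # Adding 0 forces a numeric comparison.
        if(SVTYPE_VALUE == "CNV" && CI_VALUE+0 > 0.05){
                print
        }
        if(SVTYPE_VALUE == "Fusion" && READ_COUNT_VALUE+0 > 10){
                print
        }
     }
    '   Input_file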
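
If it helps, the match()/substr() step can be tried on its own before running it against the full file. This is only a minimal sketch: the info string is a shortened copy of the CNV line from the question, and the function name extract_ci is made up for illustration. match() sets RSTART and RLENGTH, and substr() then skips the 3 characters of "CI=".

```shell
# Shortened, hypothetical INFO string based on the CNV line in the question.
info='SVTYPE=CNV;END=11217311;CI=0.05:1.56211,0.95:2.90434;RAW_CN=2.13'

# extract_ci is an illustrative helper name, not part of the original answer.
extract_ci() {
  printf '%s\n' "$1" | awk '{
    # match() returns non-zero on success and sets RSTART and RLENGTH.
    if (match($0, /CI=[^:]*/))
      print substr($0, RSTART + 3, RLENGTH - 3)   # drop the leading "CI="
  }'
}

extract_ci "$info"
```

With this sample the extracted text is the number before the first colon, which is the value the CNV test compares against .05.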

Thanks,
R. Singh

Last edited by RavinderSingh13; 01-14-2017 at 12:07 AM.. Reason: Adding a non-one liner form of solution too now.
# 3  
Old 01-14-2017
Another option to try:
Code:
awk -F'[\t;]' '
  {
    split(x,V)
    for(i=1; i<=NF; i++) {
      split($i,F,/=/)
      V[F[1]]=F[2]
    }
  }
  (V["SVTYPE"]=="CNV"    && V["CI"]+0 > .05) || 
  (V["SVTYPE"]=="Fusion" && V["READ_COUNT"]+0 > 10)
' file
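
To see what ends up in the V array, a small sketch like the one below may help. It assumes two made-up input lines modelled on the sample file, and parse_info is a hypothetical name. It uses split("", V) to empty the array for each line, which is what split(x,V) does above when x is unset. Two things are worth noticing: adding 0 to a CI value such as 0.05:1.56211,... keeps only the leading number, and FAIL_REASON=READ_COUNT<=40 is stored under the key FAIL_REASON, so it does not overwrite READ_COUNT.

```shell
# parse_info is an illustrative helper; the input lines below are made up.
parse_info() {
  awk -F'[\t;]' '{
    split("", V)                  # clear values left over from the previous line
    for (i = 1; i <= NF; i++) {
      split($i, F, "=")           # F[1] is the key, F[2] the text after the first "="
      V[F[1]] = F[2]
    }
    # Print SVTYPE plus the numeric view of CI and READ_COUNT (0 when absent).
    print V["SVTYPE"], V["CI"] + 0, V["READ_COUNT"] + 0
  }' "$@"
}

printf 'chr1\tSVTYPE=CNV;CI=0.05:1.56211,0.95:2.90434\nchr1\tSVTYPE=Fusion;READ_COUNT=16;FAIL_REASON=READ_COUNT<=40\n' | parse_info
```

Because the CNV condition uses a strict greater-than, a CI that starts with exactly 0.05 does not pass, which matches the desired output in the first post.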


Last edited by Scrutinizer; 01-14-2017 at 05:33 AM..
# 4  
Old 01-14-2017
Thank you both very much.