awk to capture lines that meet either condition


 
# 1  
08-02-2017

I am trying to modify and understand an awk script written by @Scrutinizer.

The awk below filters a list of 30,000 lines in a tab-delimited file. What I am having trouble with is adding a condition for SVTYPE=CNV lines so that a line is only printed if the value after 0.95: in its CI= field is < 1.9.

The other condition (Fusion) works perfectly, and I added comments describing what I think is happening in each step. Thank you :)


file
Code:
chr16	68771250	CDH1	G	<CNV>	100.0	PASS	HS;FR=.;PRECISE=FALSE;SVTYPE=CNV;END=68867430;LEN=96180;NUMTILES=39;SD=0.47;CDF_MAPD=0.01:0.962265,0.025:0.985543,0.05:1.006017,0.1:1.030158,0.2:1.060175,0.25:1.071807,0.5:1.12,0.75:1.17036,0.8:1.183201,0.9:1.217678,0.95:1.246897,0.975:1.272801,0.99:1.303591;REF_CN=2;CI=0.05:0.895574,0.95:1.16322;RAW_CN=1.12;FUNC=[{'gene':'CDH1'}]	GT:GQ:CN	./.:0:1.02
chr15	90631824	IDH2	G	<CNV>	100.0	PASS	FR=.;PRECISE=FALSE;SVTYPE=CNV;END=90631954;LEN=130;NUMTILES=1;SD;CDF_MAPD=0.01:0.647181,0.025:0.751369,0.05:0.85432,0.1:0.99068,0.2:1.185313,0.25:1.268903,0.5:1.67,0.75:2.197882,0.8:2.352881,0.9:2.815138,0.95:3.264469,0.975:3.711758,0.99:4.309304;REF_CN=2;CI=0.05:0.727022,0.95:3.40497;RAW_CN=1.67;FUNC=[{'gene':'IDH2'}]	GT:GQ:CN	./.:0:1.63

awk
Code:
awk -F'[\t;]' '              # define FS as tab or ;
   {
     split(x,V)              # empty array V (x is unset, so this clears V portably)
     for(i=1; i<=NF; i++) {  # loop over every field i of the current line
       split($i,F,/=/)       # split field $i on = into array F (key in F[1], value in F[2])
       V[F[1]]=F[2]          # store the value in array V, indexed by its key
     }
   }
   (V["SVTYPE"]=="CNV"    && V["CI"]+0 < 1.9) ||     # condition for CNV - not sure if the entire CI is being used or maybe splitting on the , would work better
   (V["SVTYPE"]=="Fusion" && V["READ_COUNT"]+0 > 10) # condition for Fusion
' file > out
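A quick way to see why the CNV condition lets both sample lines through is to print what V["CI"] holds and what V["CI"]+0 turns it into. This is a minimal sketch that assumes trimmed copies of the two sample lines (INFO shortened to the keys that matter, FORMAT and sample columns dropped) written to a temporary file. awk converts a string to a number using only its leading numeric part, so the whole CI string becomes 0.05, the first label, on every line.

```shell
# Trimmed copies of the two sample lines (hypothetical shortening; only SVTYPE and CI matter here)
sample=$(mktemp)
printf 'chr16\t68771250\tCDH1\tG\t<CNV>\t100.0\tPASS\tHS;SVTYPE=CNV;CI=0.05:0.895574,0.95:1.16322;RAW_CN=1.12\n' >  "$sample"
printf 'chr15\t90631824\tIDH2\tG\t<CNV>\t100.0\tPASS\tFR=.;SVTYPE=CNV;CI=0.05:0.727022,0.95:3.40497;RAW_CN=1.67\n' >> "$sample"

# Same key=value parsing as above, then print gene, raw CI string, and CI+0
result=$(awk -F'[\t;]' '
  {
    split("", V)                 # empty V for this line
    for (i = 1; i <= NF; i++) {  # every tab- or ;-separated field
      split($i, F, /=/)          # KEY=VALUE -> F[1], F[2]
      V[F[1]] = F[2]
    }
    print $3, V["CI"], V["CI"]+0 # +0 keeps only the leading number
  }
' "$sample")
echo "$result"
rm -f "$sample"
```

Both lines report 0.05 as the numeric value, which is below 1.9, so the original CNV test is true for both of them.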

desired output - only this line has a 0.95 CI value < 1.9
Code:
chr16	68771250	CDH1	G	<CNV>	100.0	PASS	HS;FR=.;PRECISE=FALSE;SVTYPE=CNV;END=68867430;LEN=96180;NUMTILES=39;SD=0.47;CDF_MAPD=0.01:0.962265,0.025:0.985543,0.05:1.006017,0.1:1.030158,0.2:1.060175,0.25:1.071807,0.5:1.12,0.75:1.17036,0.8:1.183201,0.9:1.217678,0.95:1.246897,0.975:1.272801,0.99:1.303591;REF_CN=2;CI=0.05:0.895574,0.95:1.16322;RAW_CN=1.12;FUNC=[{'gene':'CDH1'}]	GT:GQ:CN	./.:0:1.02


Last edited by cmccabe; 08-02-2017 at 09:20 AM. Reason: added details
# 2  
08-02-2017
By adding print V["CI"] to the code, I discovered that it thinks CI is 0.05:0.895574,0.95:1.16322

That makes sense, as each field is only split on the equals sign.
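To see which piece of that CI string holds the 0.95 value, the string can be split on : by itself. In this small sketch the CI value is copied from the CDH1 line; the value after 0.95: lands in A[3], while A[1] is only the 0.05 label and A[2] still has a comma inside it.

```shell
# CI value copied from the CDH1 sample line
ci='0.05:0.895574,0.95:1.16322'

# Split on ":" and number each piece
result=$(printf '%s\n' "$ci" | awk '{
  n = split($0, A, ":")
  for (k = 1; k <= n; k++) print "A[" k "]=" A[k]
}')
echo "$result"
```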

So I've added code to split the CI value specially:

Code:
awk -F'[\t;]' '              # define FS as tab or ;
   {
     split(x,V)              # empty array V
     for(i=1; i<=NF; i++) {  # loop over every field i of the current line
       split($i,F,/=/)       # split field $i on = into array F
       V[F[1]]=F[2]          # store the value in array V, indexed by its key
     }

     split(V["CI"], A, ":"); # A[1]=0.05, A[2]=0.895574,0.95, A[3]=1.16322
     V["CI"]=A[3];           # V["CI"]=1.16322, the value after 0.95:
   }
   (V["SVTYPE"]=="CNV"    && V["CI"]+0 < 1.9) ||     # condition for CNV
   (V["SVTYPE"]=="Fusion" && V["READ_COUNT"]+0 > 10) # condition for Fusion
' inputfile
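Taking A[3] relies on CI always holding exactly the 0.05 and 0.95 entries in that order. A slightly more defensive sketch, assuming that layout might vary, splits CI on , and then each entry on :, and picks the value whose label is 0.95. The names ci95, P and Q are introduced here for illustration, and the sample lines are trimmed copies of the ones in the first post.

```shell
# Trimmed copies of the two sample lines (hypothetical shortening of INFO)
sample=$(mktemp)
printf 'chr16\t68771250\tCDH1\tG\t<CNV>\t100.0\tPASS\tHS;SVTYPE=CNV;CI=0.05:0.895574,0.95:1.16322;RAW_CN=1.12\n' >  "$sample"
printf 'chr15\t90631824\tIDH2\tG\t<CNV>\t100.0\tPASS\tFR=.;SVTYPE=CNV;CI=0.05:0.727022,0.95:3.40497;RAW_CN=1.67\n' >> "$sample"

result=$(awk -F'[\t;]' '
  {
    split("", V)                       # empty V for this line
    for (i = 1; i <= NF; i++) {        # every tab- or ;-separated field
      split($i, F, /=/)                # KEY=VALUE -> F[1], F[2]
      V[F[1]] = F[2]
    }
    ci95 = ""                          # upper bound, looked up by its 0.95 label
    n = split(V["CI"], P, ",")         # e.g. 0.05:0.895574 and 0.95:1.16322
    for (j = 1; j <= n; j++) {
      split(P[j], Q, ":")              # label in Q[1], value in Q[2]
      if (Q[1] == "0.95") ci95 = Q[2]
    }
  }
  (V["SVTYPE"] == "CNV"    && ci95 != "" && ci95 + 0 < 1.9) ||
  (V["SVTYPE"] == "Fusion" && V["READ_COUNT"] + 0 > 10)
' "$sample")
printf '%s\n' "$result" | cut -f3      # gene column of each printed line
rm -f "$sample"
```

Only the CDH1 line is printed; IDH2 is dropped because its 0.95 value is 3.40497. The ci95 != "" guard means a CNV line with no 0.95 entry is skipped rather than compared as 0.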

This User Gave Thanks to Corona688 For This Post:
# 3  
08-03-2017
Thank you very much :)