Remove line based on condition in awk


 
# 1  
Old 08-25-2016

In the following tab-delimited input, I am checking $7 for the keyword intronic. If that keyword is found, $2 is split at the .[ in each line, and if the number that follows the + or - after the first run of digits is >10, the line is deleted. This rule always applies to intronic lines. If $7 is exonic, nothing is done and the next line is processed.

For example, using the first line of the input:
$7 is intronic, so $2 (c.[433+79A>G]+[433+79A>G]) is split at the .[ and the number after the + that follows the first digits (79) is >10, so that line is removed.

My awk attempt (it does not work)
Code:
awk -F'\t' -v OFS='\t' FNR==NR 'if ($7 ==/intronic/) ; {split($2,f2,"[[digits]");a[f2[1]];next} $2 in a' or ; {split($2,f3,"[[digits]");a[f3[1]];next} $2 in a' input

input
Code:
Index    Mutation Call    Start    End    Ref    Alt    Func.refGene    Gene.refGene    ExonicFunc.refGene    Sanger
1    c.[433+79A>G]+[433+79A>G]    40556922    40556922    T    C    intronic    PPT1        
2    c.[362+8C>T]+[=]    40557656    40557656    G    A    intronic    PPT1        
3    c.276-31delG    43396570    43396570    C    -    intronic    SLC2A1    
20    c.[5109C>T]+[=]    166245425    166245425    C    T    exonic    SCN2A    synonymous SNV    
21    c.[5139C>T]+[=]    166848646    166848646    G    A    exonic    SCN1A    synonymous SNV

desired output
Code:
Index    Mutation Call    Start    End    Ref    Alt    Func.refGene    Gene.refGene    ExonicFunc.refGene    Sanger
2    c.[362+8C>T]+[=]    40557656    40557656    G    A    intronic    PPT1         
20    c.[5109C>T]+[=]    166245425    166245425    C    T    exonic    SCN2A    synonymous SNV    
21    c.[5139C>T]+[=]    166848646    166848646    G    A    exonic    SCN1A    synonymous SNV


Last edited by cmccabe; 08-25-2016 at 04:39 PM.. Reason: fixed format
# 2  
Old 08-25-2016
Try this:

Code:
awk -F'\t' '
$7=="intronic" {
   v=$2
   sub(/.*\.\[[^+-]+[+-]/,"",v)
   if(v + 0 > 10) next
}
1' input

output:
Code:
Index   Mutation        Call    Start   End     Ref     Alt     Func.refGene    Gene.refGene    ExonicFunc.refGene      Sanger
2       c.[362+8C>T]+[=]        40557656        40557656        G       A       intronic        PPT1
3       c.276-31delG    43396570        43396570        C       -       intronic        SLC2A1
20      c.[5109C>T]+[=] 166245425       166245425       C       T       exonic  SCN2A   synonymous      SNV
21      c.[5139C>T]+[=] 166848646       166848646       G       A       exonic  SCN1A   synonymous      SNV

Edit: Line 3 is not deleted because it doesn't contain .[, yet your sample output has it deleted. Is the .[ not important?
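
To see why line 3 survives the code in this post, here is a minimal sketch. The value is taken from line 3 of the input; the variable names result and n are only for illustration. When the pattern does not match, sub() reports zero replacements, v keeps the whole field, and a string that starts with a letter converts to 0:

```shell
result=$(awk 'BEGIN {
   v = "c.276-31delG"
   n = sub(/.*\.\[[^+-]+[+-]/, "", v)   # n is the number of replacements made
   print n, v, v + 0                    # v is unchanged, so v + 0 is 0
}')
echo "$result"
```

Since 0 is not greater than 10, the line is printed rather than skipped.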

Last edited by Chubler_XL; 08-25-2016 at 05:05 PM..
This User Gave Thanks to Chubler_XL For This Post:
# 3  
Old 08-25-2016
So what is to be done with those lines that have neither intronic nor exonic in them?
This User Gave Thanks to shamrock For This Post:
# 4  
Old 08-25-2016
It doesn't look like the .[ is important, as long as every line whose +/- value meets the condition gets removed. However, $7 could be intronic, UTR5, or UTR3; is it possible to handle all three in one awk?
@Shamrock the exonic lines will be used in another awk, but I'm not quite sure of the details yet. Thank you :).

Maybe
Code:
awk -F'\t' '
$7=="intronic || UTR3 || UTR5" {
   v=$2
   sub(/.*[^+-]+[+-]/,"",v)
   if(v + 0 > 10) next
}
1' input


Last edited by cmccabe; 08-25-2016 at 06:03 PM.. Reason: added details and awk
# 5  
Old 08-25-2016
Try:

Code:
awk -F'\t' '
$7 ~ "^(intronic|UTR3|UTR5)$" {
   v=$2
   sub(/^[^+-]+[+-]/,"",v)
   if(v + 0 > 10) next
}
1' input
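
For anyone comparing this with the attempt in post #4, here is a rough sketch of the difference. It assumes a small tab-delimited sample built with printf; rows 1, 2, 3 and 20 are shortened from the thread's input, while row 4 (UTR3, GENE1) is hypothetical and only there to exercise the extra keyword. The == test compares $7 with the single literal string "intronic || UTR3 || UTR5", whereas the ~ test matches any one of the three words:

```shell
# Build the sample; printf reuses the format for every group of 8 arguments.
sample=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
   1  'c.[433+79A>G]+[433+79A>G]' 40556922  40556922  T C intronic PPT1 \
   2  'c.[362+8C>T]+[=]'          40557656  40557656  G A intronic PPT1 \
   3  'c.276-31delG'              43396570  43396570  C - intronic SLC2A1 \
   4  'c.100+25A>T'               100       100       A T UTR3     GENE1 \
   20 'c.[5109C>T]+[=]'           166245425 166245425 C T exonic   SCN2A)

# How many rows does the literal string comparison select?
none=$(printf '%s\n' "$sample" |
   awk -F'\t' '$7 == "intronic || UTR3 || UTR5" { n++ } END { print n + 0 }')

# Index column of the rows kept by the regex version.
kept=$(printf '%s\n' "$sample" | awk -F'\t' '
$7 ~ "^(intronic|UTR3|UTR5)$" {
   v = $2
   sub(/^[^+-]+[+-]/, "", v)   # drop everything up to the first + or -
   if (v + 0 > 10) next        # skip the row when the leading number is > 10
}
{ print $1 }')

echo "rows selected by ==: $none"
echo "$kept"
```

With this sample the == version selects no rows at all, and the regex version keeps only rows 2 and 20.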

This User Gave Thanks to Chubler_XL For This Post:
# 6  
Old 08-26-2016
Quote:
Originally Posted by Chubler_XL
Try:

Code:
awk -F'\t' '
$7 ~ "^(intronic|UTR3|UTR5)$" {
   v=$2
   sub(/^[^+-]+[+-]/,"",v)
   if(v + 0 > 10) next
}
1' input

Thank you Chubler_XL for the nice code. Could you please help me with one point of confusion? When I print the value of the variable named v (which holds the modified value of the 2nd field) as follows:
Code:
awk '
$7 ~ "^(intronic|UTR3|UTR5)$" {
   v=$2
   sub(/^[^+-]+[+-]/,"",v)
   print v;if(v + 0 > 10) next
}
1' Input_file

Output will be as follows then.
Code:
Index    Mutation Call    Start    End    Ref    Alt    Func.refGene    Gene.refGene    ExonicFunc.refGene    Sanger
79A>G]+[433+79A>G]   #### Value of variable v
8C>T]+[=]                     #### Value of variable v
2    c.[362+8C>T]+[=]    40557656    40557656    G    A    intronic    PPT1        
8C>T]+[=]
2    c.[362+8C>T]+[=]    40557656    40557656    G    A    UTR3    PPT1
31delG
20    c.[5109C>T]+[=]    166245425    166245425    C    T    exonic    SCN2A    synonymous SNV    
21    c.[5139C>T]+[=]    166848646    166848646    G    A    exonic    SCN1A    synonymous SNV

As we can see above, the value of variable v is 79A>G]+[433+79A>G]. I understand that by doing v+0 we are telling awk to treat it as a number and then compare it with 10. My doubt is that the string contains letters as well as digits after 79, so will awk still use only 79 for the comparison? Could you please guide me here? I will be grateful to you, sir.


Hello cmccabe,

Could you please try the following and let me know if this helps you too. My solution tries to take exactly the digits between the -/+ and the </>, and then compares them with 10 after adding 0.
Code:
awk -F'\t' '($7 ~ /intronic|UTR3|UTR5/){v=$2;a=sub(/^[^+-]+[+-]/,X,v);if(a){sub(/[><].*/,X,v)};if((v+0)>10){next}} 1'  Input_file

Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
# 7  
Old 08-26-2016
Quote:
Originally Posted by RavinderSingh13
As we can see above, the value of variable v is 79A>G]+[433+79A>G]. I understand that by doing v+0 we are telling awk to treat it as a number and then compare it with 10. My doubt is that the string contains letters as well as digits after 79, so will awk still use only 79 for the comparison? Could you please guide me here? I will be grateful to you, sir.
Yes, that is how awk works when converting a string value to a number. It reads the leading part of the string that forms a valid number, stops at the first character that cannot be part of it, and ignores the rest of the string. Try this code for some examples:
Code:
awk '
function try(A) {
  print A "\t" A + 0
}
BEGIN {
  try("27.2A")
  try("3..1415")
  try(".23A27")
  try("009001A")
} '
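
If relying on the implicit conversion feels too subtle, the leading digits can also be pulled out explicitly. This is only a sketch, assuming the value of interest is always an unsigned integer at the very start of the string; the function name leading_int and the variable nums are made up for illustration:

```shell
nums=$(awk '
# Return the run of digits at the start of s as a number, or 0 if there is none.
function leading_int(s) {
   if (match(s, /^[0-9]+/)) return substr(s, RSTART, RLENGTH) + 0
   return 0
}
BEGIN {
   print leading_int("79A>G]+[433+79A>G]")
   print leading_int("8C>T]+[=]")
   print leading_int("31delG")
   print leading_int("delG")
}')
echo "$nums"
```

This prints 79, 8, 31 and 0, one per line, which are the same numbers the v + 0 comparison works with for these strings.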

These 2 Users Gave Thanks to Chubler_XL For This Post: