Remove lines from output in files using awk


 
# 1  
Old 07-01-2016

I have two large files (~250GB) from which I am trying to remove the lines where GT is 0/0, 1/1, or 2/2, in both files. I was going to use a bash script with the awk below, which I think will find each such line, but how do I remove a line if that condition is found? Thank you :).

Input
Code:
20      60055   .       A       .       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/0:25:25:99:PASS
20      60056   .       G      A.       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/1:12,13:25:99:PASS,PASS
20      60057   .       T       .       35      PASS    DP=26;PF=20;MF=6;MQ=60;SB=0.769 GT:AD:DP:GQ:FL  0/0:26:26:99:PASS
20      60058   .       C      T       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  1/1:25:25:99:PASS

Code:
awk '$9~"^[012]"{$0=$0($9~"^(0/0|1/1|2/2)"?" hom":" het")}1' input

Desired output
Code:
20      60056   .       G      A.       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/1:12,13:25:99:PASS,PASS


Last edited by RudiC; 07-01-2016 at 12:38 PM.. Reason: corrected icode tags.
# 2  
Old 07-01-2016
Code:
awk '$NF ~ /0\/1/'
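
Note that this pattern selects lines rather than excluding them: it keeps any line whose last field contains 0/1, which is not quite the same as dropping 0/0, 1/1, and 2/2. A minimal sketch of the difference, assuming shortened sample rows and a hypothetical 1/2 row that is not in the original input:

```shell
# Sample rows shortened from the thread; the 1/2 row is a hypothetical addition
sample=$(mktemp)
printf '%s\n' \
  '20 60055 . A . 35 PASS DP=25 GT:AD:DP:GQ:FL 0/0:25:25:99:PASS' \
  '20 60056 . G A 35 PASS DP=25 GT:AD:DP:GQ:FL 0/1:12,13:25:99:PASS,PASS' \
  '20 60059 . C A,T 35 PASS DP=25 GT:AD:DP:GQ:FL 1/2:12,13:25:99:PASS,PASS' > "$sample"

# Select: keep only lines whose last field contains 0/1 (prints the position column)
keep_01=$(awk '$NF ~ /0\/1/ { print $2 }' "$sample" | paste -sd, -)

# Exclude: drop lines whose last field starts with 0/0, 1/1, or 2/2
drop_hom=$(awk '$NF !~ /^(0\/0|1\/1|2\/2)/ { print $2 }' "$sample" | paste -sd, -)

echo "selecting 0/1 keeps positions:       $keep_01"
echo "excluding hom calls keeps positions: $drop_hom"
rm -f "$sample"
```

If the data only ever contains 0/0, 0/1, and 1/1, the two filters agree; they differ as soon as another heterozygous genotype such as 1/2 appears.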

# 3  
Old 07-01-2016
Your spec is (not for the first time) rather misleading. There's NO field that contains GT: 0/0 or 1/1 or 2/2. It is left to the reader's interpretation that field 9 is a sort of description for the next field, and field 10 seems to have the respective values. Your unfit code snippet doesn't help either. It doesn't remove any lines, nor will field 9 ever start with 0, 1, or 2.

And no logical connection between the TWO files is perceivable. You seem to be asking for one solution that can be applied to either of your two files.

Please be aware that a correct, detailed, and carefully tailored specification will save everybody's time, including yours!

For your problem, try
Code:
awk '$NF !~ /^(0\/0|1\/1|2\/2)/' file
20      60056   .       G      A.       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/1:12,13:25:99:PASS,PASS
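
On the "how do I remove that line" part: awk does not edit the file in place. A bare condition with no action prints only the lines for which it is true, so "removing" a line just means not printing it and redirecting the rest to a new file. A minimal sketch, assuming temporary files and shortened sample rows (the file names are placeholders, not from the thread):

```shell
# Placeholder temp files; rows are shortened from the thread's input
src=$(mktemp)
dst=$(mktemp)
printf '%s\n' \
  '20 60055 . A . 35 PASS DP=25 GT:AD:DP:GQ:FL 0/0:25:25:99:PASS' \
  '20 60056 . G A 35 PASS DP=25 GT:AD:DP:GQ:FL 0/1:12,13:25:99:PASS,PASS' \
  '20 60058 . C T 35 PASS DP=25 GT:AD:DP:GQ:FL 1/1:25:25:99:PASS' > "$src"

# Lines whose last field starts with 0/0, 1/1, or 2/2 fail the condition
# and are never written; everything else goes to the new file
awk '$NF !~ /^(0\/0|1\/1|2\/2)/' "$src" > "$dst"

kept=$(awk 'END { print NR }' "$dst")
positions=$(awk '{ print $2 }' "$dst")
echo "kept $kept line(s), position(s): $positions"
rm -f "$src" "$dst"
```

For files of this size the filter streams line by line, so memory use should stay small, but the output needs its own disk space until the original is deleted.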

# 4  
Old 07-01-2016
To check for values in field 10 of any subfield identified in field 9, try
Code:
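awk '
        {# Map each subfield name in field 9 (GT, AD, DP, ...) to its position
         for (n=split ($9, TMP, ":"); n>0; n--) TYPE[TMP[n]] = n
         # Split field 10 into the values found at those positions
         split ($10, VAL, ":")
         # Skip the line when the value of the chosen subfield matches PAT
         if (VAL[TYPE[SUB]] ~ PAT) next
        }
# Print every line that was not skipped
1
' SUB="GT" PAT="0/0|1/1|2/2" file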
20      60056   .       G      A.       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/1:12,13:25:99:PASS,PASS

or, running the same script with different variables for the last subfield "FL", it yields
Code:
 SUB="FL" PAT="PASS," file
20      60055   .       A       .       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/0:25:25:99:PASS
20      60057   .       T       .       35      PASS    DP=26;PF=20;MF=6;MQ=60;SB=0.769 GT:AD:DP:GQ:FL  0/0:26:26:99:PASS
20      60058   .       C      T       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  1/1:25:25:99:PASS
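
Since the question mentions two files, one possible way to handle both in a single run is to write each file's surviving lines to its own output, named after awk's built-in FILENAME. This is only a sketch under assumptions: the file names, the ".filtered" suffix, the shortened rows, and the 1/2 row are all hypothetical, and PAT is anchored here with ^ and $ so that it must match the whole subfield value.

```shell
# Hypothetical input files in a scratch directory
dir=$(mktemp -d)
printf '%s\n' \
  '20 60055 . A . 35 PASS DP=25 GT:AD:DP:GQ:FL 0/0:25:25:99:PASS' \
  '20 60056 . G A 35 PASS DP=25 GT:AD:DP:GQ:FL 0/1:12,13:25:99:PASS,PASS' > "$dir/file1.txt"
printf '%s\n' \
  '20 60057 . T . 35 PASS DP=26 GT:AD:DP:GQ:FL 0/0:26:26:99:PASS' \
  '20 60058 . C T 35 PASS DP=25 GT:AD:DP:GQ:FL 1/1:25:25:99:PASS' \
  '20 60059 . C A,T 35 PASS DP=25 GT:AD:DP:GQ:FL 1/2:12,13:25:99:PASS,PASS' > "$dir/file2.txt"

awk '
  { # Pair each subfield name in field 9 with its value in field 10
    n = split($9, NAME, ":"); split($10, VAL, ":")
    # Skip the line if the chosen subfield matches the pattern
    for (i = 1; i <= n; i++) if (NAME[i] == SUB && VAL[i] ~ PAT) next
    # Otherwise append it to the output that belongs to the current input file
    print > (FILENAME ".filtered")
  }
' SUB="GT" PAT='^(0/0|1/1|2/2)$' "$dir/file1.txt" "$dir/file2.txt"

out1=$(awk '{ print $2 }' "$dir/file1.txt.filtered")
out2=$(awk '{ print $2 }' "$dir/file2.txt.filtered")
echo "file1 kept position(s): $out1"
echo "file2 kept position(s): $out2"
rm -rf "$dir"
```

Running awk once per file with a plain redirect works just as well and lets the two files be processed in parallel.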

# 5  
Old 07-01-2016
Thank you very much :).