How to remove the lines with this pattern?


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting How to remove the lines with this pattern?
# 1  
Old 04-05-2017
How to remove the lines with this pattern?

Hello everyone,
I have a sample data like this:
Code:
Glyma.10G051100 Glyma.02G036000 89.91 228 23 0 1 228 1 228 1e-78 294
Glyma.10G051100 Glyma.09G023700 87.28 228 29 0 1 228 1 228 1e-68 261
Glyma.10G285200 Glyma.20G103800 96.33 1663 55 4 1 1657 1 1663 0.0 2728
Glyma.10G285200 Glyma.05G093700 95.02 321 16 0 406 726 1 321 8e-142 505
Glyma.10G212900 Glyma.17G186600 90.36 1338 129 0 1 1338 1 1338 0.0 1757
Glyma.10G212900 Glyma.05G089000 90.21 1338 131 0 1 1338 1 1338 0.0 1746
Glyma.10G212900 Glyma.16G068000 88.67 1341 146 5 1 1338 1 1338 0.0 1629
Glyma.10G212900 Glyma.19G052400 88.83 1325 148 0 1 1325 1 1325 0.0 1628
Glyma.10G212900 Glyma.05G114900 88.25 1328 156 0 1 1328 1 1328 0.0 1589
Glyma.10G212900 Glyma.19G078900 89.31 262 27 1 1074 1335 202 462 2e-88 327
Glyma.10G212900 Glyma.19G078900 89.71 204 21 0 790 993 1 204 2e-68 261
Glyma.10G296300 Glyma.20G246900 95.11 470 23 0 1 470 1 470 0.0 741
Glyma.10G296300 Glyma.20G246900 92.26 168 7 2 744 911 834 995 2e-60 233
Glyma.10G001700 Glyma.10G179600 83.45 701 113 1 44 741 50 750 0.0 649
Glyma.10G179600 Glyma.10G001700 83.45 701 113 1 50 750 44 741 0.0 649
Glyma.10G056500 Glyma.10G056300 89.27 261 24 2 41 300 61 318 4e-88 324
Glyma.10G056300 Glyma.10G056500 89.27 261 24 2 61 318 41 300 5e-88 324
Glyma.10G088600 Glyma.10G085100 97.13 522 15 0 1 522 1 522 0.0 881
Glyma.10G085100 Glyma.10G088600 97.13 522 15 0 1 522 1 522 0.0 881

This is only small part of my dataset. What I expected is that remove all the
A-B, B-A pattern comparisons based on first two column:

Code:
Glyma.10G051100 Glyma.02G036000 89.91 228 23 0 1 228 1 228 1e-78 294
Glyma.10G051100 Glyma.09G023700 87.28 228 29 0 1 228 1 228 1e-68 261
Glyma.10G285200 Glyma.20G103800 96.33 1663 55 4 1 1657 1 1663 0.0 2728
Glyma.10G285200 Glyma.05G093700 95.02 321 16 0 406 726 1 321 8e-142 505
Glyma.10G212900 Glyma.17G186600 90.36 1338 129 0 1 1338 1 1338 0.0 1757
Glyma.10G212900 Glyma.05G089000 90.21 1338 131 0 1 1338 1 1338 0.0 1746
Glyma.10G212900 Glyma.16G068000 88.67 1341 146 5 1 1338 1 1338 0.0 1629
Glyma.10G212900 Glyma.19G052400 88.83 1325 148 0 1 1325 1 1325 0.0 1628
Glyma.10G212900 Glyma.05G114900 88.25 1328 156 0 1 1328 1 1328 0.0 1589
Glyma.10G212900 Glyma.19G078900 89.31 262 27 1 1074 1335 202 462 2e-88 327
Glyma.10G212900 Glyma.19G078900 89.71 204 21 0 790 993 1 204 2e-68 261
Glyma.10G296300 Glyma.20G246900 95.11 470 23 0 1 470 1 470 0.0 741
Glyma.10G296300 Glyma.20G246900 92.26 168 7 2 744 911 834 995 2e-60 233

How can I achieve this ? Thank you in advance.
# 2  
Old 04-05-2017
Hi, try:
Code:
awk 'NR==FNR{A[$1,$2]; next} !(($2,$1) in A)' file file

--
Note: the file needs to be specified twice
This User Gave Thanks to Scrutinizer For This Post:
# 3  
Old 04-05-2017
A different approach using Perl:

Code:
$
$ cat data.txt
Glyma.10G051100 Glyma.02G036000 89.91 228 23 0 1 228 1 228 1e-78 294
Glyma.10G051100 Glyma.09G023700 87.28 228 29 0 1 228 1 228 1e-68 261
Glyma.10G285200 Glyma.20G103800 96.33 1663 55 4 1 1657 1 1663 0.0 2728
Glyma.10G285200 Glyma.05G093700 95.02 321 16 0 406 726 1 321 8e-142 505
Glyma.10G212900 Glyma.17G186600 90.36 1338 129 0 1 1338 1 1338 0.0 1757
Glyma.10G212900 Glyma.05G089000 90.21 1338 131 0 1 1338 1 1338 0.0 1746
Glyma.10G212900 Glyma.16G068000 88.67 1341 146 5 1 1338 1 1338 0.0 1629
Glyma.10G212900 Glyma.19G052400 88.83 1325 148 0 1 1325 1 1325 0.0 1628
Glyma.10G212900 Glyma.05G114900 88.25 1328 156 0 1 1328 1 1328 0.0 1589
Glyma.10G212900 Glyma.19G078900 89.31 262 27 1 1074 1335 202 462 2e-88 327
Glyma.10G212900 Glyma.19G078900 89.71 204 21 0 790 993 1 204 2e-68 261
Glyma.10G296300 Glyma.20G246900 95.11 470 23 0 1 470 1 470 0.0 741
Glyma.10G296300 Glyma.20G246900 92.26 168 7 2 744 911 834 995 2e-60 233
Glyma.10G001700 Glyma.10G179600 83.45 701 113 1 44 741 50 750 0.0 649
Glyma.10G179600 Glyma.10G001700 83.45 701 113 1 50 750 44 741 0.0 649
Glyma.10G056500 Glyma.10G056300 89.27 261 24 2 41 300 61 318 4e-88 324
Glyma.10G056300 Glyma.10G056500 89.27 261 24 2 61 318 41 300 5e-88 324
Glyma.10G088600 Glyma.10G085100 97.13 522 15 0 1 522 1 522 0.0 881
Glyma.10G085100 Glyma.10G088600 97.13 522 15 0 1 522 1 522 0.0 881
$
$ perl -lane '$key = ($F[0] le $F[1]) ? $F[0]." ".$F[1] : $F[1]." ".$F[0];
              if (defined $x{$key}) {
                  ($old_key = $x{$key}->[0]) =~ s/^\d+:(\S+ \S+).*/$1/;
                  if ($F[0]." ".$F[1] ne $old_key){ delete $x{$key} }
                  else { push @{$x{$key}}, $..":".$_ }
              } else {
                  $x{$key} = [ $..":".$_ ];
              } END {
                  foreach (sort {(split ":",$a)[0] <=> (split ":",$b)[0]} map{@$_} (values %x)){
                      print ((split ":")[1]);
                  }
              }
             ' data.txt
Glyma.10G051100 Glyma.02G036000 89.91 228 23 0 1 228 1 228 1e-78 294
Glyma.10G051100 Glyma.09G023700 87.28 228 29 0 1 228 1 228 1e-68 261
Glyma.10G285200 Glyma.20G103800 96.33 1663 55 4 1 1657 1 1663 0.0 2728
Glyma.10G285200 Glyma.05G093700 95.02 321 16 0 406 726 1 321 8e-142 505
Glyma.10G212900 Glyma.17G186600 90.36 1338 129 0 1 1338 1 1338 0.0 1757
Glyma.10G212900 Glyma.05G089000 90.21 1338 131 0 1 1338 1 1338 0.0 1746
Glyma.10G212900 Glyma.16G068000 88.67 1341 146 5 1 1338 1 1338 0.0 1629
Glyma.10G212900 Glyma.19G052400 88.83 1325 148 0 1 1325 1 1325 0.0 1628
Glyma.10G212900 Glyma.05G114900 88.25 1328 156 0 1 1328 1 1328 0.0 1589
Glyma.10G212900 Glyma.19G078900 89.31 262 27 1 1074 1335 202 462 2e-88 327
Glyma.10G212900 Glyma.19G078900 89.71 204 21 0 790 993 1 204 2e-68 261
Glyma.10G296300 Glyma.20G246900 95.11 470 23 0 1 470 1 470 0.0 741
Glyma.10G296300 Glyma.20G246900 92.26 168 7 2 744 911 834 995 2e-60 233
$
$

If preserving the original line numbers is not important, then this script could be simplified.
However, it will still store the hash in memory until the entire file is read, so it probably is not as efficient as Scrutinizer's approach.
This User Gave Thanks to durden_tyler For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

awk with sed to combine lines and remove specific odd # pattern from line

In the awk piped to sed below I am trying to format file by removing the odd xxxx_digits and whitespace after, then move the even xxxx_digit to the line above it and add a space between them. There may be multiple lines in file but they are in the same format. The Filename_ID line is the last line... (4 Replies)
Discussion started by: cmccabe
4 Replies

2. UNIX for Beginners Questions & Answers

awk to remove pattern and lines above pattern

In the awk below I am trying to remove all lines above and including the pattern Test or Test2. Each block is seperated by a newline and Test2 also appears in the lines to keep but it will always have additional text after it. The Test to remove will not. The awk executed until the || was added... (2 Replies)
Discussion started by: cmccabe
2 Replies

3. Shell Programming and Scripting

sed to remove next 15 lines after pattern in Solaris

Team, I am trying to use sed to delete 15 lines, after pattern patch, which includes the pattern as well in Solaris. I used the below command, as we do it Linux, but it's not working as expected in Solaris. I am getting the error as "garbled".sed '/\/table/,+15d' status.html sed: command... (8 Replies)
Discussion started by: Nagaraj R
8 Replies

4. Shell Programming and Scripting

awk remove/grab lines from file with pattern from other file

Sorry for the weird title but i have the following problem. We have several files which have between 10000 and about 500000 lines in them. From these files we want to remove lines which contain a pattern which is located in another file (around 20000 lines, all EAN codes). We also want to get... (28 Replies)
Discussion started by: SDohmen
28 Replies

5. Shell Programming and Scripting

Remove multiple lines that match pattern

Not sure how I can accomplish this. I would like to remove all interfaces that have the commands I would like to see: switchport port-security, spanning-tree portfast. One line is no problem. interface FastEthernet0/8 spanning-tree portfast interface FastEthernet0/9 spanning-tree... (4 Replies)
Discussion started by: mrlayance
4 Replies

6. Shell Programming and Scripting

Searching for pattern and remove the lines

Hi , I want to remove the specific pattern and remove those lines from file using shell script. i want to remove these lines <?xml version='1.0' encoding='UTF-8'?> <row_set> </row_set> my input file has content like this. file name: sample.xml <?xml version='1.0'... (4 Replies)
Discussion started by: nukala_2
4 Replies

7. Shell Programming and Scripting

NAWK to remove lines that matches a specific pattern

Hi, I have requirement that I need to split my input file into two files based on a search pattern "abc" For eg. my input file has below content abc defgh zyx I need file 1 with abc and file2 with defgh zyx I can use grep command to acheive this. But with grep I need... (8 Replies)
Discussion started by: sbhuvana20
8 Replies

8. Shell Programming and Scripting

Remove lines from batch of files which match pattern

I need to remove all lines which does not match the pattern from a text file (batch of text files). I also need to keep the header line which is the first line of the file. Please can you provide an example for that. I used this to solve half of my work. I was unable to keep the first line of... (3 Replies)
Discussion started by: talktobog
3 Replies

9. Shell Programming and Scripting

shell script to remove all lines from a file before a line starting with pattern

hi,, i hav a file with many lines.i need to remove all lines before a line begginning with a specific pattern from the file because these lines are not required. Can u help me out with either a perl script or shell script example:- if file initially contains lines: a b c d .1.2 d e f... (2 Replies)
Discussion started by: raksha.s
2 Replies

10. Shell Programming and Scripting

grep for a particular pattern and remove few lines above top and bottom of the patter

grep for a particular pattern and remove 5 lines above the pattern and 6 lines below the pattern root@server1 # cat filename Shell Programming and Scripting test1 Shell Programminsada asda dasd asd Shell Programming and Scripting Post New Thread Shell Programming and S sadsa ... (17 Replies)
Discussion started by: fed.linuxgossip
17 Replies
Login or Register to Ask a Question