Remove lines with duplicate pairs where AB is equal to BA


 
# 1  
10-27-2014

I have a file with four columns, like this:

Code:
dmn10003t1 PF00001 PF00022 dmn12390t1
dmn10008t1 PF00069 PF00027 dmn9781t1
dmn10008t1 PF00068 PF00027 dmn9781t1
dmn10008t1 PF00069 PF00069 dmn9781t1
dmn12390t1 PF00069 PF00076 dmn10003t1

I want to create a new file by comparing the word pairs in column 1 and column 4 and keeping one line per pair. The final file should look like this:

Code:
dmn10003t1 PF00001 PF00022 dmn12390t1
dmn10008t1 PF00069 PF00027 dmn9781t1

Columns 2 and 3 play no role here. In the input file, the pair on line 1 is dmn10003t1 dmn12390t1 and the pair on line 5 is dmn12390t1 dmn10003t1. These count as equivalent, because my condition is that if AB is equal to BA, only one occurrence is kept. So the output file should contain line 1 but not line 5.


awk '!seen[$1,$4]++' file does not work here, because it treats the reversed pair on line 5 as a different key from the pair on line 1.
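
To see why that attempt falls short, here is a minimal sketch that runs it on the sample data from the question. sample_data is a hypothetical helper that only prints that data. The subscript $1,$4 is an ordered pair, so line 5 gets a key of its own:

```shell
#!/bin/sh
# sample_data is a hypothetical helper that prints the input from the question.
sample_data() {
  cat <<'EOF'
dmn10003t1 PF00001 PF00022 dmn12390t1
dmn10008t1 PF00069 PF00027 dmn9781t1
dmn10008t1 PF00068 PF00027 dmn9781t1
dmn10008t1 PF00069 PF00069 dmn9781t1
dmn12390t1 PF00069 PF00076 dmn10003t1
EOF
}

# The key is the ordered pair ($1,$4). Lines 3 and 4 repeat the key of
# line 2 and are dropped, but line 5 has the key (dmn12390t1,dmn10003t1),
# which differs from the key of line 1, so it is printed as new.
sample_data | awk '!seen[$1,$4]++'

# Output:
# dmn10003t1 PF00001 PF00022 dmn12390t1
# dmn10008t1 PF00069 PF00027 dmn9781t1
# dmn12390t1 PF00069 PF00076 dmn10003t1
```
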
# 2  
10-27-2014
Code:
$ awk '!(SEEN[$1,$4]++) && !(SEEN[$4,$1])' <<EOF
dmn10003t1 PF00001 PF00022 dmn12390t1
dmn10008t1 PF00069 PF00027 dmn9781t1
dmn10008t1 PF00068 PF00027 dmn9781t1
dmn10008t1 PF00069 PF00069 dmn9781t1
dmn12390t1 PF00069 PF00076 dmn10003t1
EOF

dmn10003t1 PF00001 PF00022 dmn12390t1
dmn10008t1 PF00069 PF00027 dmn9781t1

$

This User Gave Thanks to Corona688 For This Post:
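
For readers new to this idiom, here is a minimal sketch of the same one-liner spread over several lines with comments. The wrapper name pairs_keep_first is hypothetical. One thing worth knowing: when $1 and $4 are the same word, both subscripts name the same key, which the first test has just incremented, so such lines are never printed. The sample data has no such line, so the output above is unaffected.

```shell
#!/bin/sh
# A commented, multi-line form of the one-liner above. The function
# name pairs_keep_first is hypothetical, used only for this sketch.
pairs_keep_first() {
  awk '
    # SEEN[$1,$4]++ returns the old count: 0 (so !0 is true) only the
    # first time this ordered pair appears, then 1, 2, ... afterwards.
    # SEEN[$4,$1] is nonzero if the reversed pair was already counted.
    # A pattern with no action prints the whole line.
    !(SEEN[$1, $4]++) && !(SEEN[$4, $1])
  ' "$@"
}

# Usage: the reversed pair "b ... a" is dropped, and so is "e ... e",
# because for it both subscripts are the same, just-incremented key.
printf '%s\n' 'a x y b' 'b x y a' 'c x y d' 'e x y e' | pairs_keep_first

# Output:
# a x y b
# c x y d
```
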
# 3  
10-27-2014
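A small modification to Corona's solution: test for the reversed pair with the in operator, which, unlike referencing SEEN[$4,$1], does not create an empty array element.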

Code:
awk '!(SEEN[$1,$4]++) && !(($4,$1) in SEEN)'  infile

This User Gave Thanks to Akshay Hegde For This Post:
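
Both one-liners record the pair before testing the reverse, so a line whose column 1 equals column 4 is always dropped. If such lines should be kept once, here is a minimal sketch under that assumption: check both orders with in first, and record the pair afterwards in a separate rule. dedup_unordered_pairs is a hypothetical name, not something from the thread.

```shell
#!/bin/sh
# dedup_unordered_pairs is a hypothetical name used only for this sketch.
# It prints the first line for each unordered pair of columns 1 and 4.
dedup_unordered_pairs() {
  awk '
    # Print unless the pair was already recorded in either order.
    # "in" only tests membership; it does not create elements.
    !(($1, $4) in seen) && !(($4, $1) in seen) { print }

    # Record the pair only after the test, so a line with $1 == $4
    # is compared against earlier lines, never against itself.
    { seen[$1, $4] = 1 }
  ' "$@"
}

# Usage with the sample data from the question:
dedup_unordered_pairs <<'EOF'
dmn10003t1 PF00001 PF00022 dmn12390t1
dmn10008t1 PF00069 PF00027 dmn9781t1
dmn10008t1 PF00068 PF00027 dmn9781t1
dmn10008t1 PF00069 PF00069 dmn9781t1
dmn12390t1 PF00069 PF00076 dmn10003t1
EOF

# Output:
# dmn10003t1 PF00001 PF00022 dmn12390t1
# dmn10008t1 PF00069 PF00027 dmn9781t1
```

On the sample data this gives the same result as the one-liners; it only differs for lines like "e ... e", which it prints the first time they appear.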
