Remove duplicate nodes


 
# 1  
Old 02-12-2013

Hi all,

I have a list of comma-separated node pairs, each followed by its associated value. For example:
Code:
b0015,b1224    1.1
b0015,b2576    1.4
b0015,b3162    2.5
b0528,b1086    1.7
b0528,b1269    5.4
b0528,b3602    2.1
b0948,b2581    3.2
b1224,b0015    1.1
b1086,b0528    1.7

Here, b0015,b1224 and b1224,b0015 should be treated as duplicates (likewise b0528,b1086 and b1086,b0528), and one line of each such pair needs to be removed from the list. So the desired output would be:
Code:
b0015,b1224    1.1
b0015,b2576    1.4
b0015,b3162    2.5
b0528,b1086    1.7
b0528,b1269    5.4
b0528,b3602    2.1
b0948,b2581    3.2

Any help would be highly appreciated.

Thanks in advance.
# 2  
Old 02-12-2013
Try:
Code:
awk '{  split($1, f, /,/)
        if((f[2]","f[1]) in o) next
        o[$1]
        print
}' input

As always, if you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.

Note that this skips a line only when its 1st field contains the two nodes of a previously printed line in reverse order; a line that repeats the same two nodes in the same order is still printed. The script also skips a reversed pair even if its second field contains a different value than the previously printed entry. If this isn't what you want, you need to give more complete requirements.
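If same-order repeats should be dropped as well, one option is to build a key that ignores node order by always putting the smaller node name first. This is only a minimal sketch. It assumes the first line seen for each pair is the one to keep. The function name dedup_pairs and the four sample lines written to input are made up for illustration, and running it overwrites any existing file named input.

```shell
# Illustrative sample only: one reversed pair and one same-order repeat.
cat > input <<'EOF'
b0015,b1224    1.1
b0015,b2576    1.4
b1224,b0015    1.1
b0015,b2576    1.4
EOF

# dedup_pairs is a hypothetical helper name, not part of any tool.
dedup_pairs() {
    awk '{
        split($1, f, /,/)                    # f[1], f[2] = the two nodes
        # Appending "" forces a string comparison, so the key is the same
        # whichever order the two nodes appear in.
        if ((f[1] "") < (f[2] ""))
            key = f[1] "," f[2]
        else
            key = f[2] "," f[1]
        if (!(key in seen)) {                # first time this pair is seen
            seen[key]                        # remember the pair
            print                            # print the whole line, value included
        }
    }' "$@"
}

dedup_pairs input
```

With this sample, only the first two lines are printed. Like the script above, it ignores the value column when deciding what counts as a duplicate.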
This User Gave Thanks to Don Cragun For This Post:
# 3  
Old 02-12-2013
This works for me (using gawk):
Code:
gawk -F',| ' '!(a[$1,$2]++ + a[$2,$1]++)'

These 5 Users Gave Thanks to user8 For This Post:
# 4  
Old 02-12-2013
Thanks for the help. Although it successfully removes the duplicates in column 1, it does not print the last (value) column along with them.
# 5  
Old 02-12-2013
Broken awk? See the hints posted by Don Cragun.
# 6  
Old 02-12-2013
Quote:
Originally Posted by user8
This works for me (using gawk):
Code:
gawk -F',| ' '!(a[$1,$2]++ + a[$2,$1]++)'

This should work with any recent awk (/usr/xpg4/bin/awk or nawk on Solaris systems); it doesn't use any non-standard gawk extensions.

Unlike the script I gave, this won't print any duplicated nodes when the nodes appear in the same order.
# 7  
Old 02-12-2013
Perhaps there are tabs present in the input file?
Code:
awk -F'[, \t]' ...
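To see why tabs would matter, here is a small sketch. It assumes a tab separates the node pair from its value. The file name input_tabs and its three lines are hypothetical, and the awk program is the one-liner quoted earlier in the thread.

```shell
# Hypothetical sample in which a tab separates the pair from its value.
printf 'b0015,b1224\t1.1\nb0528,b1086\t1.7\nb1224,b0015\t1.1\n' > input_tabs

# With -F',| ' the tab stays inside $2 ("b1224<TAB>1.1"), so the reversed
# pair on the third line is never recognized; this counts the lines printed.
awk -F',| ' '!(a[$1,$2]++ + a[$2,$1]++)' input_tabs | wc -l

# Treating comma, space and tab as separators makes $1 and $2 the bare node
# names, so the reversed pair on the third line is dropped.
awk -F'[, \t]' '!(a[$1,$2]++ + a[$2,$1]++)' input_tabs
```

On this sample the first command counts all three lines, while the second prints only the first two, each with its value.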


Last edited by Scrutinizer; 02-12-2013 at 02:45 PM..
This User Gave Thanks to Scrutinizer For This Post: