Remove duplicate nodes | Unix Linux Forums | Shell Programming and Scripting

    #1  
Old 02-12-2013
AshwaniSharma09 AshwaniSharma09 is offline
Registered User
 
Join Date: Nov 2009
Last Activity: 7 April 2014, 5:16 PM EDT
Posts: 25
Thanks: 14
Thanked 0 Times in 0 Posts
Remove duplicate nodes

Hi all,

I have a list of node pairs. The two nodes in each pair are separated by a comma, and each pair is followed by its associated value. For example:

Code:
b0015,b1224    1.1
b0015,b2576    1.4
b0015,b3162    2.5
b0528,b1086    1.7
b0528,b1269    5.4
b0528,b3602    2.1
b0948,b2581    3.2
b1224,b0015    1.1
b1086,b0528    1.7

Here, b0015,b1224 and b1224,b0015 should be treated as duplicates (likewise b0528,b1086 and b1086,b0528), and one entry of each such pair needs to be removed from the list. So the desired output would be:

Code:
b0015,b1224    1.1
b0015,b2576    1.4
b0015,b3162    2.5
b0528,b1086    1.7
b0528,b1269    5.4
b0528,b3602    2.1
b0948,b2581    3.2

Any help would be highly appreciated.

Thanks in advance.
    #2  
Old 02-12-2013
Don Cragun Don Cragun is online now Forum Staff  
Moderator
 
Join Date: Jul 2012
Last Activity: 2 September 2014, 2:34 PM EDT
Location: San Jose, CA, USA
Posts: 4,500
Thanks: 177
Thanked 1,511 Times in 1,283 Posts
Try:

Code:
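awk '{  split($1, f, /,/)               # f[1] and f[2] hold the two nodes from field 1
        if((f[2]","f[1]) in o) next     # skip the line if the reversed pair was already printed
        o[$1]                           # remember this pair, in the order it appeared
        print
}' input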

As always, if you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.

Note that this won't skip an input line if the 1st field contains the same two nodes in the same order; it will just skip the line if the 1st field contains the same two nodes in reverse order. This script will also skip lines even if the second field contains a different value than the previously printed entry. If this isn't what you want, you need to give more complete requirements.
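If repeats in the same order should be dropped as well, one possible variation is to build an order-independent key for each pair and keep only the first line seen for that key. This is only a sketch and not taken from the posts in this thread: the function name dedup_pairs and the array name seen are placeholders, and it assumes the first whitespace-separated field always holds exactly two comma-separated nodes. Like the script above, it ignores the value column when deciding what counts as a duplicate.

```shell
# dedup_pairs: hypothetical helper; reads the named files, or stdin if none are given.
dedup_pairs() {
    awk '{
        split($1, f, /,/)                       # f[1], f[2] = the two nodes
        # Build a key that is the same for "x,y" and "y,x" by putting the
        # smaller node first (appending "" forces a string comparison).
        if ((f[1] "") < (f[2] ""))
            key = f[1] "," f[2]
        else
            key = f[2] "," f[1]
        if (key in seen) next                   # pair already printed in either order
        seen[key]                               # referencing the element creates it
        print
    }' "$@"
}
```

It could then be called as dedup_pairs input, where input is the file holding the list.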
The Following User Says Thank You to Don Cragun For This Useful Post:
AshwaniSharma09 (02-12-2013)
    #3  
Old 02-12-2013
user8 user8 is offline
Registered User
 
Join Date: Feb 2013
Last Activity: 24 April 2013, 10:34 AM EDT
Posts: 36
Thanks: 0
Thanked 19 Times in 14 Posts
This works for me (using gawk):

Code:
gawk -F',| ' '!(a[$1,$2]++ + a[$2,$1]++)'
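For readers unfamiliar with the idiom, the one-liner can be spelled out step by step. The sketch below is my reading of it, not the poster's own explanation; the function name and the variables fwd and rev are placeholders. The field separator ',| ' splits on a comma or a single space, so $1 and $2 are the two nodes. Each line bumps a counter for the pair in both orders, and the line is printed only if both counters were still zero beforehand.

```shell
# dedup_expanded: hypothetical name; same logic as the one-liner, written out.
dedup_expanded() {
    awk -F',| ' '{
        fwd = a[$1, $2]++    # how often this ordered pair was counted before this line
        rev = a[$2, $1]++    # how often the reversed pair was counted before this line
        if (fwd + rev == 0)  # first time the pair shows up in either order
            print
    }' "$@"
}
```

One consequence of counting both orders on every line: a line whose two nodes are identical (for example x,x) increments the same counter twice, so fwd + rev is already 1 and such a line is never printed.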

The Following 5 Users Say Thank You to user8 For This Useful Post:
AshwaniSharma09 (02-12-2013), Don Cragun (02-12-2013), Scrutinizer (02-12-2013), vgersh99 (02-12-2013), Yoda (02-12-2013)
    #4  
Old 02-12-2013
AshwaniSharma09 AshwaniSharma09 is offline
Thanks for the help, but although it successfully removes the duplicates in column 1, it does not print the last (value) column.
    #5  
Old 02-12-2013
user8 user8 is offline
Broken awk? See the hints posted by Don Cragun.
    #6  
Old 02-12-2013
Don Cragun Don Cragun is online now Forum Staff  
Quote:
Originally Posted by user8 View Post
This works for me (using gawk):

Code:
gawk -F',| ' '!(a[$1,$2]++ + a[$2,$1]++)'

This should work with any recent awk (/usr/xpg4/bin/awk or nawk on Solaris systems); it doesn't use any non-standard gawk extensions.

Unlike the script I gave, this won't print duplicated nodes when the nodes appear in the same order.
    #7  
Old 02-12-2013
Scrutinizer Scrutinizer is offline Forum Staff  
Moderator
 
Join Date: Nov 2008
Last Activity: 2 September 2014, 10:11 AM EDT
Location: Amsterdam
Posts: 9,387
Thanks: 273
Thanked 2,349 Times in 2,108 Posts
Perhaps there are tabs present in the input file?

Code:
awk -F'[, \t]' ...
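To make the tab suggestion concrete, here is a sketch that pairs this field separator with the one-liner quoted earlier in the thread. The combination and the function name are my assumption, since the post leaves the rest of the command elided. The reasoning: if the columns are tab-separated, the separator ',| ' leaves the value attached to the second node, so a reversed pair no longer compares equal to the original; adding \t to the bracket expression splits the value off into its own field.

```shell
# dedup_pairs_tabs: hypothetical name; accepts a comma, space, or tab as separator.
dedup_pairs_tabs() {
    awk -F'[, \t]' '!(a[$1,$2]++ + a[$2,$1]++)' "$@"
}

# Example with tab-separated values; the second line is the reverse of the first.
printf 'b0015,b1224\t1.1\nb1224,b0015\t1.1\nb0948,b2581\t3.2\n' | dedup_pairs_tabs
```

The whole input line is printed unchanged, so the original tabs are preserved in the output.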


Last edited by Scrutinizer; 02-12-2013 at 01:45 PM..
The Following User Says Thank You to Scrutinizer For This Useful Post:
AshwaniSharma09 (02-14-2013)

All times are GMT -4.