    #1  
Old 02-12-2013
Registered User
 
Join Date: Nov 2009
Posts: 25
Thanks: 14
Thanked 0 Times in 0 Posts
Remove duplicate nodes

Hi all,

I have a list of node pairs. The two nodes in each pair are separated by a comma, and each pair has an associated value. For example:

Code:
b0015,b1224    1.1
b0015,b2576    1.4
b0015,b3162    2.5
b0528,b1086    1.7
b0528,b1269    5.4
b0528,b3602    2.1
b0948,b2581    3.2
b1224,b0015    1.1
b1086,b0528    1.7

Here, b0015,b1224 and b1224,b0015 should be considered the same (duplicates), as should b0528,b1086 and b1086,b0528, and one entry of each such pair needs to be removed from the list. So the desired output would be:

Code:
b0015,b1224    1.1
b0015,b2576    1.4
b0015,b3162    2.5
b0528,b1086    1.7
b0528,b1269    5.4
b0528,b3602    2.1
b0948,b2581    3.2

Any help would be highly appreciated.

Thanks in advance.
    #2  
Old 02-12-2013
Registered User
 
Join Date: Jul 2012
Location: San Jose, CA
Posts: 1,490
Thanks: 62
Thanked 538 Times in 471 Posts
Try:

Code:
awk '{  split($1, f, /,/)
        if((f[2]","f[1]) in o) next
        o[$1]
        print
}' input

As always, if you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.

Note that this won't skip an input line if the 1st field contains the same two nodes in the same order as an earlier line; it only skips a line if the 1st field contains the same two nodes in reverse order. This script will also skip such a line even if the second field contains a different value than the previously printed entry. If this isn't what you want, you need to give more complete requirements.
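If repeats in the same order should be dropped as well, one possible variation is to build an order-independent key for each pair and print only the first line seen for each key. This is a minimal sketch and not part of the script above: the function name dedup_pairs and the array name seen are made up for illustration, and it assumes the first field always has the form node,node.

```shell
# dedup_pairs: hypothetical helper; reads "node,node value" lines on stdin.
dedup_pairs() {
    awk '{
        split($1, f, /,/)                 # f[1] and f[2] are the two nodes
        # Order-independent key: put the node that sorts first in front.
        key = (f[1] < f[2]) ? f[1] "," f[2] : f[2] "," f[1]
        if (!(key in seen)) {             # first line for this pair, in any order
            seen[key] = 1
            print
        }
    }'
}

printf '%s\n' 'b0015,b1224    1.1' 'b0528,b1086    1.7' \
              'b1224,b0015    1.1' 'b0015,b1224    1.1' | dedup_pairs
# prints:
# b0015,b1224    1.1
# b0528,b1086    1.7
```

Like the original script, this keeps the first line for a pair and ignores the value column when deciding what counts as a duplicate.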
The Following User Says Thank You to Don Cragun For This Useful Post:
AshwaniSharma09 (02-12-2013)
    #3  
Old 02-12-2013
Registered User
 
Join Date: Feb 2013
Posts: 36
Thanks: 0
Thanked 19 Times in 14 Posts
This works for me (using gawk):

Code:
gawk -F',| ' '!(a[$1,$2]++ + a[$2,$1]++)'
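For readers who find the one-liner dense, here is a longer form that should behave the same way. It is only a sketch: the function name dedup_either_order and the variable names are invented for illustration, and the separator is written as the bracket expression [, ], which matches the same single characters as ',| '.

```shell
# dedup_either_order: hypothetical helper; reads pair lines on stdin.
dedup_either_order() {
    awk -F'[, ]' '{
        fwd = seen[$1, $2]++   # counter for this order: value before this line, then bumped
        rev = seen[$2, $1]++   # counter for the reversed order: value before, then bumped
        if (fwd + rev == 0)    # neither order has appeared on an earlier line
            print
    }'
}

printf '%s\n' 'b0015,b1224    1.1' 'b1224,b0015    1.1' \
              'b0948,b2581    3.2' | dedup_either_order
# prints:
# b0015,b1224    1.1
# b0948,b2581    3.2
```

One edge case to be aware of: if both nodes on a line are identical, the two subscripts refer to the same array element, so the second counter already reads 1 and that line is never printed.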

The Following 5 Users Say Thank You to user8 For This Useful Post:
AshwaniSharma09 (02-12-2013), Don Cragun (02-12-2013), Scrutinizer (02-12-2013), vgersh99 (02-12-2013), Yoda (02-12-2013)
    #4  
Old 02-12-2013
Registered User
 
Join Date: Nov 2009
Posts: 25
Thanks: 14
Thanked 0 Times in 0 Posts
Thanks for the help. Although it successfully removes the duplicates in column 1, it does not print the last (value) column along with them.
    #5  
Old 02-12-2013
Registered User
 
Join Date: Feb 2013
Posts: 36
Thanks: 0
Thanked 19 Times in 14 Posts
Broken awk? See the hints posted by Don Cragun.
    #6  
Old 02-12-2013
Registered User
 
Join Date: Jul 2012
Location: San Jose, CA
Posts: 1,490
Thanks: 62
Thanked 538 Times in 471 Posts
Quote:
Originally Posted by user8
This works for me (using gawk):

Code:
gawk -F',| ' '!(a[$1,$2]++ + a[$2,$1]++)'

This should work with any recent awk (/usr/xpg4/bin/awk or nawk on Solaris systems); it doesn't use any non-standard gawk extensions.

Unlike the script I gave, this won't print any duplicated nodes when the nodes appear in the same order.
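A small side-by-side run can make the difference between the two scripts concrete. This is a sketch only: the three-line sample and the function names sample, reverse_only and either_order are made up for illustration, and plain awk is used for both scripts.

```shell
# Sample with a same-order repeat (line 2) and a reversed repeat (line 3).
sample() {
    printf '%s\n' 'b0015,b1224 1.1' 'b0015,b1224 1.1' 'b1224,b0015 1.1'
}

# Script from post #2: skips a line only if the reversed pair was seen earlier.
reverse_only() {
    sample | awk '{ split($1, f, /,/)
                    if ((f[2] "," f[1]) in o) next
                    o[$1]
                    print }'
}

# One-liner from post #3: skips a line if the pair was seen earlier in either order.
either_order() {
    sample | awk -F',| ' '!(a[$1,$2]++ + a[$2,$1]++)'
}

reverse_only   # prints the first two lines (the same-order repeat survives)
either_order   # prints only the first line
```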
    #7  
Old 02-12-2013
Scrutinizer's Avatar
Moderator
 
Join Date: Nov 2008
Location: Amsterdam
Posts: 7,353
Thanks: 144
Thanked 1,756 Times in 1,593 Posts
Perhaps there are tabs present in the input file?

Code:
awk -F'[, \t]' ...
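To see why tabs matter: with the separator ',| ', a tab between the pair and the value stays inside the second field, so b0015,b1224 and b1224,b0015 no longer produce matching keys and both lines are printed. The sketch below assumes the elided part of the command above stands for the program from post #3; the function name dedup_tabs is invented for illustration.

```shell
# dedup_tabs: hypothetical helper; treats comma, space and tab as field separators.
dedup_tabs() {
    awk -F'[, \t]' '!(a[$1,$2]++ + a[$2,$1]++)'
}

# Tab-separated sample: the second line is the reverse of the first.
printf 'b0015,b1224\t1.1\nb1224,b0015\t1.1\nb0948,b2581\t3.2\n' | dedup_tabs
# prints the first and third lines, with the value column intact
```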


Last edited by Scrutinizer; 02-12-2013 at 01:45 PM..
The Following User Says Thank You to Scrutinizer For This Useful Post:
AshwaniSharma09 (02-14-2013)
All times are GMT -4.