Eliminating duplicate lines


 
# 1  
Old 05-17-2013

Hello,
I am trying to eliminate lines where column #1 is duplicated. If it is a duplicate, the line with the greater value in column #2 should be deleted:

file.dat

Code:
123 45.34
345 67.22
949 36.55
123 94.23
888 22.33
345 32.56

Desired output:

Code:
123 45.34
949 36.55
888 22.33
345 32.56

Thanks!
# 2  
Old 05-17-2013
Here is an awk approach:
Code:
awk '
        {
                if ( A[$1] > $2 || !(A[$1]) )
                        A[$1] = $2
        }
        END {
                for ( k in A )
                        print k, A[k]
        }
' file

# 3  
Old 05-18-2013
@Yoda:
Code:
( A[$1] > $2 || !(A[$1]) )

must become
Code:
( !($1 in A) || A[$1] > $2 )

in order to also cover negative numbers.
Merely reading A[$1] already creates the element and sets it to 0; the ($1 in A) test checks for existence without creating it.
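For reference, post #2's script with that correction applied (the comments are mine; the output-order caveat below applies here too):

```shell
awk '
        {
                # keep the smaller column-2 value per key;
                # ($1 in A) tests existence without creating the element
                if ( !($1 in A) || A[$1] > $2 )
                        A[$1] = $2
        }
        END {
                for ( k in A )
                        print k, A[k]
        }
' file
```

Note that a for (k in A) loop visits keys in an unspecified order, so the lines may not come out in the original input order.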
# 4  
Old 05-18-2013
Another approach:
Code:
sort -k1,1 -k2,2n file | awk '!A[$1]++'
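Run against the sample data, the pipeline sorts each key's lines by the numeric value in column 2 and then keeps only the first line per key. Note the result comes out sorted by column 1, not in the original line order:

```shell
printf '%s\n' '123 45.34' '345 67.22' '949 36.55' \
              '123 94.23' '888 22.33' '345 32.56' |
sort -k1,1 -k2,2n | awk '!A[$1]++'
# 123 45.34
# 345 32.56
# 888 22.33
# 949 36.55
```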

# 5  
Old 05-20-2013
A Perl approach:

Code:
use strict;
use warnings;

my %hash;
while (<DATA>) {
	chomp;
	my @arr = split " ", $_;
	if ( exists $hash{$arr[0]} && $hash{$arr[0]}->{'VAL'} > $arr[1] ) {
		# smaller value seen: keep it and remember this line number
		$hash{$arr[0]}->{'VAL'} = $arr[1];
		$hash{$arr[0]}->{'CNT'} = $.;
	}
	elsif ( not exists $hash{$arr[0]} ) {
		$hash{$arr[0]}->{'VAL'} = $arr[1];
		$hash{$arr[0]}->{'CNT'} = $.;
	}
}
# print in the order of the surviving input lines
for my $key ( sort { $hash{$a}->{'CNT'} <=> $hash{$b}->{'CNT'} } keys %hash ) {
	print $key, " ", $hash{$key}->{'VAL'}, "\n";
}
__DATA__
123 45.34
345 67.22
949 36.55
123 94.23
888 22.33
345 32.56


or a Python approach:

Code:
dic = {}
cnt = 0
with open("a.txt") as f:
    for line in f:
        cnt += 1
        words = line.split()
        key = words[0]
        val = float(words[1])  # compare numerically, not as strings
        if key in dic:
            if dic[key]['VAL'] > val:
                dic[key]['VAL'] = val
                dic[key]['CNT'] = cnt
        else:
            dic[key] = {'VAL': val, 'CNT': cnt}
# print in the order of the surviving input lines
for key in sorted(dic, key=lambda k: dic[k]['CNT']):
    print(key, dic[key]['VAL'])


Last edited by summer_cherry; 05-20-2013 at 04:40 AM..