Eliminating duplicate lines


 
# 1  
Old 05-17-2013

Hello,
I am trying to eliminate lines where column #1 is duplicated. If it is a duplicate, the line with the greater value in column #2 should be deleted:

file.dat

Code:
123 45.34
345 67.22
949 36.55
123 94.23
888 22.33
345 32.56

Desired output:

Code:
123 45.34
949 36.55
888 22.33
345 32.56

Thanks!
# 2  
Old 05-17-2013
Here is an awk approach:
Code:
awk '
        {
                if ( A[$1] > $2 || !(A[$1]) )
                        A[$1] = $2
        }
        END {
                for ( k in A )
                        print k, A[k]
        }
' file

# 3  
Old 05-18-2013
@Yoda:
Code:
( A[$1] > $2 || !(A[$1]) )

must become
Code:
( !($1 in A) || A[$1] > $2 )

in order to also cover negative numbers.
Merely reading A[$1] already creates the element and sets it to 0; the ($1 in A) test checks for existence without creating it.
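For reference, post #2's script with that correction applied (the comments are mine; the output-order caveat below applies here too):

```shell
awk '
        {
                # keep the smaller column-2 value per key;
                # ($1 in A) tests existence without creating the element
                if ( !($1 in A) || A[$1] > $2 )
                        A[$1] = $2
        }
        END {
                for ( k in A )
                        print k, A[k]
        }
' file
```

Note that a for (k in A) loop visits keys in an unspecified order, so the lines may not come out in the original input order.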
# 4  
Old 05-18-2013
Another approach:
Code:
sort -k1,1 -k2,2n file | awk '!A[$1]++'
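Run against the sample data, the pipeline sorts each key's lines by the numeric value in column 2 and then keeps only the first line per key. Note the result comes out sorted by column 1, not in the original line order:

```shell
printf '%s\n' '123 45.34' '345 67.22' '949 36.55' \
              '123 94.23' '888 22.33' '345 32.56' |
sort -k1,1 -k2,2n | awk '!A[$1]++'
# 123 45.34
# 345 32.56
# 888 22.33
# 949 36.55
```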

# 5  
Old 05-20-2013
A Perl approach:

Code:
use strict;
use warnings;

my %hash;
while (<DATA>) {
	chomp;
	my @arr = split " ", $_;
	if ( exists $hash{$arr[0]} && $hash{$arr[0]}->{'VAL'} > $arr[1] ) {
		# smaller value seen: keep it and remember this line number
		$hash{$arr[0]}->{'VAL'} = $arr[1];
		$hash{$arr[0]}->{'CNT'} = $.;
	}
	elsif ( not exists $hash{$arr[0]} ) {
		$hash{$arr[0]}->{'VAL'} = $arr[1];
		$hash{$arr[0]}->{'CNT'} = $.;
	}
}
# print in the order of the surviving input lines
for my $key ( sort { $hash{$a}->{'CNT'} <=> $hash{$b}->{'CNT'} } keys %hash ) {
	print $key, " ", $hash{$key}->{'VAL'}, "\n";
}
__DATA__
123 45.34
345 67.22
949 36.55
123 94.23
888 22.33
345 32.56


or a Python approach:

Code:
dic = {}
cnt = 0
with open("a.txt") as f:
    for line in f:
        cnt += 1
        words = line.split()
        key = words[0]
        val = float(words[1])  # compare numerically, not as strings
        if key in dic:
            if dic[key]['VAL'] > val:
                dic[key]['VAL'] = val
                dic[key]['CNT'] = cnt
        else:
            dic[key] = {'VAL': val, 'CNT': cnt}
# print in the order of the surviving input lines
for key in sorted(dic, key=lambda k: dic[k]['CNT']):
    print(key, dic[key]['VAL'])


Last edited by summer_cherry; 05-20-2013 at 04:40 AM..