Eliminating duplicate lines via specified number of digits


 
# 1  
Old 06-20-2013

Hello,
This is similar to a previous post, where I was trying to eliminate lines where column #1 is duplicated. If it is a duplicate, the line with the greater value in column #2 should be deleted. In this new case, I need to test duplication with the first three digits in column #1 (ignoring the -xx below):


file.dat

Code:
123 45.34
345 67.22
949-xx 36.55
123-xx 94.23
888 22.33
345-xx 32.56

Desired output:

Code:
123 45.34
949 36.55
888 22.33
345 32.56

Thanks!
# 2  
Old 06-20-2013
An awk solution:
Code:
awk '
        {
                F = $1
                sub (/-.*/, "", F)              # key: column 1 without the -xx suffix
                T = $2 + 0                      # force a numeric comparison on the full value
                if ( !(F in A) || T < A[F] )
                {
                        A[F] = T
                        R[F] = F " " $2
                }
        }
        END {
                for ( k in R  )                 # output order of for-in is unspecified
                        print R[k]
        }
' file
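
If the output should also follow the order shown in the desired output of post #1 (each surviving line at the position where it appeared in the input), one possible sketch is below. It assumes the key is always the first three characters of column 1 and that both columns are separated by whitespace; the function name `dedup_keep_min` and the array names are made up for illustration.

```shell
# Hypothetical helper: keep, per three-character prefix of column 1,
# the record with the smallest column 2, printed in input order.
dedup_keep_min() {
    awk '
        NF < 2 { next }                       # skip blank or incomplete lines
        {
            key = substr($1, 1, 3)            # first three characters of column 1
            if (!(key in min) || $2 + 0 < min[key]) {
                min[key]  = $2 + 0            # numeric value used for comparison
                line[key] = NR                # input position of the current winner
                out[key]  = key " " $2        # record printed without the -xx suffix
            }
        }
        END {
            for (key in out)
                print line[key], out[key]     # prefix each record with its position
        }
    ' "$@" | sort -k1,1n | cut -d" " -f2-     # restore input order, drop the position
}
```

It can be called as `dedup_keep_min file.dat` or fed from a pipe. The `sort`/`cut` step is only there because awk does not guarantee any order for `for (key in out)`.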

# 3  
Old 06-21-2013
Try

Code:
awk -F "[- ]" '{A[$1]=A[$1] && A[$1] < $NF ? A[$1] : $NF}
                END{for(i in A){print i,A[i]}}'  file

# 4  
Old 06-21-2013
A Perl solution using two hash tables:
Code:
#!/usr/bin/perl -w
use strict;

my $cur_dir = $ENV{PWD};
my $filename = "$cur_dir/$ARGV[0]";
my ($record,@fields,$prefix,%peers,$key,%records);

open(FILEIN,"<$filename") or die"open: $!";
while( defined( $record = <FILEIN> ) ) {
  chomp $record;

  @fields=split(/ /,$record);
  $prefix=substr($fields[0],0,3);

  if(! exists( $peers{$prefix} ) || $fields[1] < $peers{$prefix} ) {
    $peers{$prefix} = $fields[1];
    $records{$prefix} = $record;
  }
}
close(FILEIN);

foreach my $key (sort keys(%records)) {
   print "$records{$key}\n";
}

Output:
Code:
%./file031.pl file031
123 45.34
345-xx 32.56
888 22.33
949-xx 36.55


Last edited by Fundix; 06-21-2013 at 06:03 AM.. Reason: Missing }
# 5  
Old 06-21-2013
Thanks for the replies... I have gotten all of them to work.

It turns out that the output Fundix's Perl script produces (WITH the -xx in the first column) will actually be of use in my project.

Fundix - I am actually getting a warning with your script. I changed one line to:
Code:
my $filename = "$cur_dir/zzz";

The warning is:

Use of uninitialized value in substr at ./dup line 13, <FILEIN> line 7.


pamu - I very much like the brevity of your solution. Is there a quick way to modify it to produce this output?

Code:
123 45.34
345-xx 32.56
888 22.33
949-xx 36.55

Also, will these solutions work if more than two records match in the first column (selecting the lowest value in column 2 out of the three or more rows)?

Thanks again!

Last edited by Scrutinizer; 06-22-2013 at 02:32 AM.. Reason: extra code tags
# 6  
Old 06-22-2013
Try:
Code:
awk '{i=$1+0} !(i in A) || $2<A[i] {A[i]=$2; S[i]=$0} END{for(i in S){print S[i]}}' file

output:
Code:
888 22.33
345-xx 32.56
123 45.34
949-xx 36.55
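
The order above is arbitrary because awk does not define the order of `for (i in S)`. A minimal sketch that wraps the same idea, sorts the result by the numeric key, and skips blank lines is below; the function name `keep_min_lines` is only illustrative. Note that `$1+0` keys on the whole leading number, so it matches "the first three digits" only when column 1 starts with exactly three digits, as in the sample data.

```shell
# Hypothetical wrapper: keep the whole original line (including any -xx)
# with the smallest column 2 per numeric key, sorted by that key.
keep_min_lines() {
    awk '
        NF < 2 { next }                               # ignore blank or short lines
        { i = $1 + 0 }                                # "345-xx" becomes 345
        !(i in A) || $2 + 0 < A[i] { A[i] = $2 + 0; S[i] = $0 }
        END { for (i in S) print S[i] }
    ' "$@" | sort -k1,1n                              # deterministic, key-ordered output
}
```

Because every record is compared against the current minimum for its key, it makes no difference whether a key appears twice or ten times in the input.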


Last edited by Scrutinizer; 06-22-2013 at 02:58 AM..
# 7  
Old 06-22-2013
Hi Palex,

My native language is French, and I hope I have not misunderstood your request.
I'm now at home, testing my solution on an Apple laptop.

The input file looks like this:
Code:
123 45.34
345 67.22
345 27.01
949-xx 36.55
123-xx 94.23
888 22.33
345-xx 32.56

The program is still:
Code:
#!/usr/bin/perl -w
use strict;

my $cur_dir = $ENV{PWD};
my $filename = "$cur_dir/$ARGV[0]";
my ($record,@fields,$prefix,%peers,$key,%records);

open(FILEIN,"<$filename") or die"open: $!";
while( defined( $record = <FILEIN> ) ) {
  chomp $record;

  @fields=split(/ /,$record);
  $prefix=substr($fields[0],0,3);

  if(! exists( $peers{$prefix} ) || $fields[1] < $peers{$prefix} ) {
    $peers{$prefix} = $fields[1];
    $records{$prefix} = $record;
  }
}
close(FILEIN);

foreach my $key (sort keys(%records)) {
   print "$records{$key}\n";
}

When I execute it, it works fine even when more than two lines start with the same three digits:
Code:
$ ./test.pl file
123 45.34
345 27.01
888 22.33
949-xx 36.55

FYI, the file is located in the same directory as the program.

Tell me if you still encounter issues.
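
About the "Use of uninitialized value in substr ... <FILEIN> line 7" warning from post #5: this is only a guess, but the sample data has six records, so a blank seventh line in the input would explain it. On an empty or whitespace-only line, `split(/ /,$record)` returns nothing, `$fields[0]` is undefined, and `substr` warns. A quick way to check for such lines, as a sketch (the function name `blank_line_numbers` is made up):

```shell
# Hypothetical check: print the line number of every empty or
# whitespace-only line in the given file(s) or on standard input.
blank_line_numbers() {
    awk '!NF { print NR }' "$@"    # NF is 0 when a line has no fields
}
```

For example, `blank_line_numbers zzz` prints nothing if the file has no blank lines. If it prints 7, removing that line should make the warning go away.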

Last edited by Fundix; 06-24-2013 at 04:33 AM..