Removing duplicates


 
# 1  
Old 09-13-2005
Removing duplicates

Hi, I've been trying to remove duplicate lines with matching columns from a fixed-width file and it's not working.
I've searched the forum but nothing comes close.

I have a sample file:

27147140631203RA   CCD *
27147140631203RA   PPN *
37147140631207RD   AAA
47147140631203RD   JNA
47147140631204DC   ADK *
47147140631204DC   ALK *
67147140631203DA   ALM *
67147140631203DA   CCD *
77147140631209QC   RRP
87147140631203QA   RRN

There are 3 spaces between the first set of alphanumerics and the trailing three-letter codes.

I want to remove lines that match only up to the 3 blanks, ignoring the 3-letter codes and whatever else comes after them on the line.

Does anyone know how I can do this? I want to keep at least one instance of any duplicates; it doesn't matter which.
I put asterisks where I need to keep one of any two.

Thanks.
Gianni
# 2  
Old 09-13-2005
Assuming the first field is always 16 chars, you can:

uniq -w16
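For example (a sketch, assuming a uniq that supports -w, e.g. GNU coreutils, and sorting first so duplicate keys end up on adjacent lines; infile and outfile are placeholder names):

Code:
# keep one line per unique 16-character prefix
sort infile | uniq -w16 > outfile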
# 3  
Old 09-13-2005
I tried different combinations of sort and uniq, etc., but none worked.
Also, I am on AIX with the Korn shell. When I ran uniq -?, I got:

uniq: Not a recognized flag: ?
Usage: uniq [-c | -d | -u] [-f Fields] [-s Chars] [-Fields] [+Chars] [InFile [OutFile]]

I have no -w switch...

Thanks.
# 4  
Old 09-13-2005
Right, so your uniq can only skip fields or chars.
How about swapping the fields using sed, like:

sed 's/\([^ ]*\) *\(.*\)$/\2 \1/' |
uniq -f1 |
sed 's/\([^ ]*\) *\(.*\)$/\2 \1/'
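Strung together (a sketch; infile and outfile are placeholder names, and this assumes each line is just the 16-char key plus one code field):

Code:
sed 's/\([^ ]*\) *\(.*\)$/\2 \1/' infile |
uniq -f1 |
sed 's/\([^ ]*\) *\(.*\)$/\2 \1/' > outfile

Note that the round trip rejoins the two fields with a single space, so the original 3-space padding is lost.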
# 5  
Old 09-13-2005
Try:
sort -mu -k1,1 < datafile
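Note that -m only merges input that is already sorted, so if datafile hasn't been sorted on the key yet, a plain sort -u may be the safer variant (a sketch; datafile and newfile are placeholder names):

Code:
# sort on the first field and keep one line per distinct key
sort -u -k1,1 datafile > newfile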
# 6  
Old 09-13-2005
Code:
awk '!($1 in a);{a[$1]}' infile
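Spelled out over several lines (a sketch; infile and newfile are placeholder names):

Code:
awk '
  !($1 in a)   # pattern with no action: print the line if $1 has not been seen yet
  { a[$1] }    # runs for every line: mark $1 as seen
' infile > newfile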

# 7  
Old 09-14-2005
Or an even more cryptic version:
Code:
awk '!x[$1]++' filename > newfile

All this does is create an associative array keyed on $1, the first field in the record. The first time a given key is encountered, x[$1] is zero, so !x[$1] is true and the whole record is printed; the ++ then increments the element. When the same key turns up again, the element is non-zero, so the line is not printed.
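Run against the sample in the first post (keeping the * markers as they appear there), either awk version keeps the first line seen for each key, so the output would be something like:

Code:
27147140631203RA   CCD *
37147140631207RD   AAA
47147140631203RD   JNA
47147140631204DC   ADK *
67147140631203DA   ALM *
77147140631209QC   RRP
87147140631203QA   RRN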