Removing duplicates


 
# 1  
Old 09-13-2005
Removing duplicates

Hi, I've been trying to remove duplicate lines with matching columns from a fixed-width file and it's not working.
I've searched the forum but nothing comes close.

I have a sample file:

27147140631203RA   CCD *
27147140631203RA   PPN *
37147140631207RD   AAA
47147140631203RD   JNA
47147140631204DC   ADK *
47147140631204DC   ALK *
67147140631203DA   ALM *
67147140631203DA   CCD *
77147140631209QC   RRP
87147140631203QA   RRN

There are 3 spaces between the first set of alphanumerics and the trailing three-letter codes.

I want to remove lines that match only up to the 3 blanks, ignoring the 3-letter codes and whatever else comes after them on the line.

Does anyone know how I can do this? I want to keep at least one instance of any duplicates; it doesn't matter which.
I put asterisks where I need to keep one of any two.

Thanks.
Gianni
# 2  
Old 09-13-2005
Assuming the first field is always 16 chars, you can:

uniq -w16
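For example (a sketch, assuming a uniq that supports -w, e.g. GNU coreutils, and sorting first so duplicate keys end up on adjacent lines; infile and outfile are placeholder names):

Code:
# keep one line per unique 16-character prefix
sort infile | uniq -w16 > outfile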
# 3  
Old 09-13-2005
I tried different combinations of sort and uniq, etc., but none worked.
Also, I am on AIX with the Korn shell. When I ran uniq -?, I got:

uniq: Not a recognized flag: ?
Usage: uniq [-c | -d | -u] [-f Fields] [-s Chars] [-Fields] [+Chars] [InFile [OutFile]]

I have no -w switch...

Thanks.
# 4  
Old 09-13-2005
Right, so your uniq can only skip fields or chars.
How about swapping the fields using sed, like:

sed 's/\([^ ]*\) *\(.*\)$/\2 \1/' |
uniq -f1 |
sed 's/\([^ ]*\) *\(.*\)$/\2 \1/'
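Strung together (a sketch; infile and outfile are placeholder names, and this assumes each line is just the 16-char key plus one code field):

Code:
sed 's/\([^ ]*\) *\(.*\)$/\2 \1/' infile |
uniq -f1 |
sed 's/\([^ ]*\) *\(.*\)$/\2 \1/' > outfile

Note that the round trip rejoins the two fields with a single space, so the original 3-space padding is lost.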
# 5  
Old 09-13-2005
Try:
sort -mu -k1,1 < datafile
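Note that -m only merges input that is already sorted, so if datafile hasn't been sorted on the key yet, a plain sort -u may be the safer variant (a sketch; datafile and newfile are placeholder names):

Code:
# sort on the first field and keep one line per distinct key
sort -u -k1,1 datafile > newfile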
# 6  
Old 09-13-2005
Code:
awk '!($1 in a);{a[$1]}' infile
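Spelled out over several lines (a sketch; infile and newfile are placeholder names):

Code:
awk '
  !($1 in a)   # pattern with no action: print the line if $1 has not been seen yet
  { a[$1] }    # runs for every line: mark $1 as seen
' infile > newfile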

# 7  
Old 09-14-2005
Or an even more cryptic version:
Code:
awk '!x[$1]++' filename > newfile

All this does is create an associative array keyed on $1, the first field in the record. The first time a given key is encountered, x[$1] is zero, so !x[$1] is true and the whole record is printed; the ++ then increments the element. When the same key turns up again, the element is non-zero, so the line is not printed.
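Run against the sample in the first post (keeping the * markers as they appear there), either awk version keeps the first line seen for each key, so the output would be something like:

Code:
27147140631203RA   CCD *
37147140631207RD   AAA
47147140631203RD   JNA
47147140631204DC   ADK *
67147140631203DA   ALM *
77147140631209QC   RRP
87147140631203QA   RRN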