Removing duplicates based on key


 
# 1  
Old 08-15-2011

Hi,

I have an input file with the following data:
Code:
12345|12|34
12345|13|23
3456|12|90
15670|12|13
12345|10|14
3456|12|13

I need to remove duplicates based on the first field only, keeping the first line seen for each value.
The output should look like this:
Code:
12345|12|34
3456|12|90
15670|12|13

The first field needs to be unique.

How can I achieve this?
I can't use a plain sort -u, since that compares whole lines and I need to remove duplicates based only on the first field.
Thanks
# 2  
Old 08-15-2011
Code:
$ sort -t"|" -u -k1,1 test
12345|10|14
15670|12|13
3456|12|13

# 3  
Old 08-15-2011
Stable (the first lines are printed):
Code:
awk -F'|' '!($1 in a) {a[$1]++; print}' INPUTFILE

# 4  
Old 08-15-2011
Quote:
Originally Posted by yazu
Stable (the first lines are printed):
Code:
awk -F'|' '!($1 in a) {a[$1]++; print}' INPUTFILE

Distilling until nothing but the essence remains:
Code:
awk -F\| '!a[$1]++' INPUTFILE
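
For anyone puzzled by the short form, here is a commented sketch of the same idiom. The function name `dedupe_first_field` and the array name `seen` are only illustrative, and the sample data is the input from the first post:

```shell
# Keep only the first line seen for each value of the first field.
dedupe_first_field() {
    awk -F'|' '
        # seen[$1] is unset (treated as 0) the first time a key appears,
        # so the negation is true and awk prints the line by default.
        # The post-increment then marks the key as seen, so later lines
        # with the same key evaluate to false and are skipped.
        !seen[$1]++
    ' "$@"
}

printf '%s\n' '12345|12|34' '12345|13|23' '3456|12|90' \
              '15670|12|13' '12345|10|14' '3456|12|13' | dedupe_first_field
```

Because nothing is sorted, the surviving lines stay in their original input order.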


Regards,
Alister
# 5  
Old 08-16-2011
Thanks for your replies.

At the same time, I would like to keep the records that have more than one entry for the first field in a separate file before removing the duplicates. How can I achieve this?
The file should contain:

Code:
12345|12|34
12345|13|23
12345|10|14
3456|12|90
3456|12|13

thanks

---------- Post updated at 09:44 AM ---------- Previous update was at 08:51 AM ----------

I guess

sort filename | uniq -u will give only the lines that occur exactly once, and

sort filename | uniq -d will give only the duplicated lines.

Correct me if I am wrong.

Thanks

But I want to find duplicates specifically in the first field.
How can we modify these commands to do that?
Code:
 
sort filename | uniq -u
sort filename | uniq -d
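
One possible way to adapt these two commands, as a sketch: `uniq` compares whole lines, so the first field can be cut out before sorting. This reports the keys only, not the full records. The function names `dup_keys` and `unique_keys` are made up for this example:

```shell
# Print each first-field value that occurs more than once.
dup_keys() {
    cut -d'|' -f1 "$@" | sort | uniq -d
}

# Print each first-field value that occurs exactly once.
unique_keys() {
    cut -d'|' -f1 "$@" | sort | uniq -u
}

# Sample data from the first post, read from standard input.
printf '%s\n' '12345|12|34' '12345|13|23' '3456|12|90' \
              '15670|12|13' '12345|10|14' '3456|12|13' | dup_keys
```

Getting the complete records back still needs a second step that matches these keys against the original file.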

---------- Post updated at 11:18 AM ---------- Previous update was at 09:44 AM ----------

For the below input:

Code:
12345|12|34
12345|13|23
3456|12|90
15670|12|13
12345|10|14
3456|12|13

I need the below result in one file (only unique records):
Code:
  
15670|12|13

In another file, I need the below data:

Code:
12345|12|34
12345|13|23
12345|10|14
3456|12|90
3456|12|13

Please help me. Thanks.
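
A minimal sketch of one way to get both files, assuming the input is a regular file that can be read twice. The function name `split_by_key_count` and the file names `input.txt`, `unique.txt`, and `dups.txt` are hypothetical. awk reads the file once to count each first-field value, then again to route every line by that count:

```shell
# Split a pipe-delimited file by how often each first-field key occurs.
# Usage: split_by_key_count INPUT UNIQUE_OUT DUP_OUT
split_by_key_count() {
    # Create or empty both outputs, so each exists even if no line lands in it.
    : > "$2"
    : > "$3"
    awk -F'|' -v unique_out="$2" -v dup_out="$3" '
        # First pass over the file: count occurrences of each key.
        NR == FNR { count[$1]++; next }
        # Second pass: keys seen once go to one file, the rest to the other.
        count[$1] == 1 { print > unique_out; next }
        { print > dup_out }
    ' "$1" "$1"
}

# Demo with the sample data, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '12345|12|34' '12345|13|23' '3456|12|90' \
              '15670|12|13' '12345|10|14' '3456|12|13' > "$workdir/input.txt"
split_by_key_count "$workdir/input.txt" "$workdir/unique.txt" "$workdir/dups.txt"
cat "$workdir/unique.txt"
cat "$workdir/dups.txt"
```

Lines in both output files keep their input order. To group the duplicates by key, as in the listing above, the duplicate file could be passed through a stable sort on the first field, for example `sort -s -t'|' -k1,1` (the `-s` option is available in GNU and BSD sort).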

Last edited by pandeesh; 08-16-2011 at 02:35 AM..
 