Sort and Remove Duplicate on file


 
# 1  
Old 05-12-2011
How do we sort a file and remove duplicates on columns 1 and 2, retaining the record with the maximum date (in field 3)? The file has the following format.

Code:
aaa|1234|2010-12-31
aaa|1234|2010-11-10
bbb|345|2011-01-01
ccc|346|2011-02-01
bbb|345|2011-03-10
aaa|1234|2010-01-01

Required Output
Code:
aaa|1234|2010-12-31
bbb|345|2011-03-10
ccc|346|2011-02-01

I tried using sort -u, but how do I retain the record with the maximum date?
Thanks
Arif
# 2  
Old 05-12-2011
Code:
awk -F'|' '{k=$1 FS $2} $3>a[k]{a[k]=$3} END{for(k in a) print k FS a[k] | "sort"}' infile

# 3  
Old 05-12-2011
-deleted-

Last edited by ctsgnb; 05-12-2011 at 07:23 PM..
# 4  
Old 05-12-2011
Thanks

If the file is already sorted on the three columns, can the duplicate-removal part be a little simpler? I want to use this code in a DataStage program and was wondering whether it can be simplified so that other DataStage developers (with less Unix background) can understand it.
# 5  
Old 05-12-2011
how about this?
Code:
echo 'aaa|1234|2010-12-31
aaa|1234|2010-11-10
bbb|345|2011-01-01
ccc|346|2011-02-01
bbb|345|2011-03-10
aaa|1234|2010-01-01' |sort -t '|' -k1,1 -k3,3r |awk -F"|" '++a[$1"|"$2]==1'
aaa|1234|2010-12-31
bbb|345|2011-03-10
ccc|346|2011-02-01
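
For readers with less Unix background, the idea behind the pipeline above can be sketched as a small commented function: sort so that the newest record of each key comes first, then print a line only the first time its key is seen. The function name latest_per_key is hypothetical, and two things are assumptions rather than part of the post: the sort here also includes field 2 so that the output is ordered on the full key, and LC_ALL=C is set so the ordering does not depend on the locale.

```shell
# latest_per_key FILE
# For each (field 1, field 2) pair, keep the line with the greatest field 3.
# Dates in YYYY-MM-DD form sort correctly as plain text.
latest_per_key() {
    # Step 1: order by both key fields, then by date in reverse,
    #         so the newest record of every key comes first.
    # Step 2: awk prints a line only the first time its key appears.
    LC_ALL=C sort -t '|' -k1,1 -k2,2 -k3,3r "$1" |
        awk -F '|' '!seen[$1 FS $2]++'
}
```

Calling latest_per_key infile on the sample data should give the three lines shown in the required output.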

# 6  
Old 05-12-2011
Code:
sort -t '|' -k3,3r infile  |awk -F \| '!a[$1 FS $2]++'
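
On the earlier question about a file that is already sorted: if the input is sorted ascending on all three columns, the record with the maximum date is simply the last line of each key group, so no array is needed. A minimal sketch under that assumption follows; the function name last_of_each_key is hypothetical, and the result is wrong if the input is not actually sorted.

```shell
# last_of_each_key FILE
# FILE must already be sorted ascending on fields 1, 2 and 3.
# Prints the final line of every run of lines sharing fields 1 and 2.
last_of_each_key() {
    awk -F '|' '
        { key = $1 FS $2 }
        # The key changed, so the previous line closed its group: print it.
        NR > 1 && key != prev { print last }
        # Remember the current line as the candidate for its group.
        { prev = key; last = $0 }
        # The final group has no following line, so print it at the end.
        END { if (NR) print last }
    ' "$1"
}
```

Because the input order is preserved, the output stays sorted on the key columns without a second sort.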
