Remove duplicate lines based on field and sort


 
# 1  
Old 03-17-2012

I have a CSV file from which I would like to remove duplicate lines based on field 1, and then sort the result. I don't care about any of the other fields, but I still want to keep their data intact. I was thinking I could do something like the command below, but I have no idea how to print the full line with it. Please show any method you can think of, but awk would be my preferred tool if possible.

Code:
cut -f 1 -d , sorting.csv | sort | uniq

Code:
55,I,like,cookies,2,8,9
44,I,like,cookies,2,8,9
88,I,like,cookies,5,7,8
88,I,like,cookies,2,8,9
99,I,like,cookies,5,7,8
99,I,like,cookies,2,8,9
77,I,like,cookies,5,7,8
77,I,like,cookies,2,8,9
66,I,like,cookies,5,7,8
66,I,like,cookies,2,8,9
55,I,like,cookies,5,7,8
44,I,like,cookies,5,7,8

# 2  
Old 03-17-2012
Post sample input and desired output.
# 3  
Old 03-17-2012
Code:
sort -t, -nuk1 sorting.csv

# 4  
Old 03-17-2012
Hi Balaji,

Please explain what sort -t and -nuk1 do.

Regards,
adirajup
# 5  
Old 03-17-2012
man sort

Quote:
-t, --field-separator=SEP
use SEP instead of non-blank to blank transition

-n, --numeric-sort
compare according to string numerical value

-u, --unique
with -c, check for strict ordering; without -c, output only the
first of an equal run

-k, --key=POS1[,POS2]
start a key at POS1 (origin 1), end it at POS2 (default end of
line). See POS syntax below
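So basically, it sorts numerically (-n) on a key that starts at the first field (-k1), with fields separated by a comma (-t,), and prints only one line for each distinct key (-u). Because -k1 gives no end position, the key runs to the end of the line; -k1,1 would restrict it to the first field alone.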
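To see the options working together, here is a minimal sketch that recreates the sample input from the first post in a temporary file and runs the stricter `-k1,1n` form of the key. The `tmp` and `result` variable names are only for illustration. One assumption to keep in mind: POSIX only promises that `-u` keeps one line per distinct key, so which of the duplicate lines survives may differ between sort implementations.

```shell
#!/bin/sh
# Recreate the sample input from the first post in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
55,I,like,cookies,2,8,9
44,I,like,cookies,2,8,9
88,I,like,cookies,5,7,8
88,I,like,cookies,2,8,9
99,I,like,cookies,5,7,8
99,I,like,cookies,2,8,9
77,I,like,cookies,5,7,8
77,I,like,cookies,2,8,9
66,I,like,cookies,5,7,8
66,I,like,cookies,2,8,9
55,I,like,cookies,5,7,8
44,I,like,cookies,5,7,8
EOF

# -t,    : fields are separated by commas
# -k1,1n : the sort key is field 1 only, compared numerically
# -u     : print one line per distinct key
result=$(sort -t, -k1,1n -u "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

The output has six lines, one for each distinct value of field 1, in ascending numeric order.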

--ahamed
# 6  
Old 03-18-2012
using Perl


Code:
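#!/usr/bin/perl

use strict;
use warnings;

my %seen;    # first fields that have already been printed

while (my $line = <DATA>) {
    chomp $line;
    my @flds = split /,/, $line;
    # Print the line only the first time its first field appears.
    print "$line\n" if !$seen{$flds[0]}++;
}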

__DATA__
55,I,like,cookies,2,8,9
44,I,like,cookies,2,8,9
88,I,like,cookies,5,7,8
88,I,like,cookies,2,8,9
99,I,like,cookies,5,7,8
99,I,like,cookies,2,8,9
77,I,like,cookies,5,7,8
77,I,like,cookies,2,8,9
66,I,like,cookies,5,7,8
66,I,like,cookies,2,8,9
55,I,like,cookies,5,7,8
44,I,like,cookies,5,7,8

# 7  
Old 03-18-2012
Quote:
Originally Posted by bartus11
Post sample input and desired output.
Sorry about that. I don't ask these types of questions very often. Here is the desired output:

Code:
44,I,like,cookies,2,8,9
55,I,like,cookies,2,8,9
66,I,like,cookies,5,7,8
77,I,like,cookies,5,7,8
88,I,like,cookies,5,7,8
99,I,like,cookies,5,7,8

Quote:
Originally Posted by balajesuri
Code:
sort -t, -nuk1 sorting.csv

Works perfectly.

Quote:
Originally Posted by pravin27
using Perl

Does Perl have a sorting function? I have never used Perl before. Is there a way to use this on a file? I have several huge files I need to do this on; I was just trying to keep my example simple when I showed my data above.

Does anyone know a way to do this with awk?
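One common awk idiom for this, shown as a minimal sketch rather than the only way to do it: an associative array (here called `seen`, a name chosen for illustration) counts how often each first field has appeared, and a line is printed only on its first appearance. The result is then piped through sort. The temporary file below stands in for sorting.csv.

```shell
#!/bin/sh
# Recreate the sample input from the first post in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
55,I,like,cookies,2,8,9
44,I,like,cookies,2,8,9
88,I,like,cookies,5,7,8
88,I,like,cookies,2,8,9
99,I,like,cookies,5,7,8
99,I,like,cookies,2,8,9
77,I,like,cookies,5,7,8
77,I,like,cookies,2,8,9
66,I,like,cookies,5,7,8
66,I,like,cookies,2,8,9
55,I,like,cookies,5,7,8
44,I,like,cookies,5,7,8
EOF

# -F, makes awk split on commas. seen[$1]++ evaluates to 0 (false) the first
# time a value of field 1 appears, so !seen[$1]++ is true only for the first
# line with that key, and awk's default action prints the whole line.
result=$(awk -F, '!seen[$1]++' "$tmp" | sort -t, -k1,1n)
printf '%s\n' "$result"

rm -f "$tmp"
```

On the sample data this prints the six lines given as the desired output. awk keeps one array entry per distinct key, not per line, so memory use on a huge file depends on how many different first-field values it contains.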