Removing duplicate records in a file based on single column

08-20-2011

Hi,

I want to remove duplicate records based on column 1, including the first occurrence of each duplicated value. For example:

Input file (filer.txt):
-------------

Code:

1,3000,5000
1,4000,6000
2,4000,600
2,5000,700
3,60000,4000
4,7000,7777
5,999,8888

Expected output:
----------------

Code:

3,60000,4000
4,7000,7777
5,999,8888

Is it possible to achieve this using an awk command?

I tried the awk command below. It works, but I don't want to give the file name (filer.txt) twice in the command. I am allowed to give the file name only once.

Code:

awk -F"," 'NR == FNR {  cnt[$1] ++} NR != FNR {  if (cnt[$1] == 1) print $0 }' filer.txt filer.txt

Please suggest how to achieve this.

Thanks in advance

Last edited by Franklin52; 08-22-2011 at 03:57 AM.. Reason: Please use code tags for code and data samples, thank you

G.K.K

08-20-2011

Use the unique option (-u) of the sort command.
Sort the file on column 1 using the unique option, then run diff between the original and the sorted output. Finally, use the diff output to remove the duplicated records from the sort output.
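
The steps described above could look roughly like the following. This is only a sketch under some assumptions: the sort key options (`-t, -k1,1`), the scratch directory, and the intermediate file names are illustrative and not from the post, and the diff is run against a sorted copy of the original rather than the original itself so that both sides are in the same order. The last step uses awk to filter by key, and the result comes out in sorted order rather than input order.

```shell
# Recreate the sample input from the first post in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 1,3000,5000 1,4000,6000 2,4000,600 2,5000,700 \
              3,60000,4000 4,7000,7777 5,999,8888 > "$workdir/filer.txt"

# Step 1: sort on column 1, once keeping every record and once with -u
# (one record per column-1 value).
sort -t, -k1,1 "$workdir/filer.txt" > "$workdir/sorted.txt"
sort -t, -k1,1 -u "$workdir/filer.txt" > "$workdir/unique.txt"

# Step 2: diff marks records missing from the unique file with "< ".
# Keep only their column-1 values. diff exits non-zero when the files
# differ, hence the "|| true".
{ diff "$workdir/sorted.txt" "$workdir/unique.txt" || true; } |
    sed -n 's/^< \([^,]*\),.*/\1/p' | sort -u > "$workdir/dupkeys.txt"

# Step 3: drop every record of the unique file whose key is in that list.
# FILENAME is compared instead of NR == FNR so an empty key list is safe.
out=$(awk -F, 'FILENAME == ARGV[1] { dup[$1]; next } !($1 in dup)' \
        "$workdir/dupkeys.txt" "$workdir/unique.txt")

printf '%s\n' "$out"
```

With the sample data, the key list holds 1 and 2, so only the records for 3, 4 and 5 remain.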

jgt

08-20-2011

Thanks for the reply, jgt. I am allowed to use only awk or sed. Can someone suggest exactly how I can code it as a single command line?

G.K.K

08-20-2011

Quote:

Originally Posted by G.K.K

I am allowed to use only awk or sed. Can someone suggest exactly how I can code it as a single command line?

Who makes up these rules, and why????

jgt

08-20-2011

I got a solution using a single-line command. Thanks. Problem resolved.
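
The command that resolved the problem was not posted in the thread. As a minimal sketch of one way to name the file only once (not necessarily the command the poster used), awk can buffer the lines while counting column 1 and print the survivors in an END block. It assumes the whole file fits in memory. The scratch directory and the array names are illustrative.

```shell
# Recreate the sample input from the first post in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 1,3000,5000 1,4000,6000 2,4000,600 2,5000,700 \
              3,60000,4000 4,7000,7777 5,999,8888 > "$workdir/filer.txt"

# Single pass: remember every line and count each column-1 value,
# then print, in input order, only the lines whose key was seen once.
out=$(awk -F, '
    { line[NR] = $0; key[NR] = $1; cnt[$1]++ }
    END {
        for (i = 1; i <= NR; i++)
            if (cnt[key[i]] == 1) print line[i]
    }
' "$workdir/filer.txt")

printf '%s\n' "$out"
```

Unlike approaches that rely on sorted input, this does not need the duplicates to be adjacent, at the cost of holding all lines in memory until the end.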

Quote:

Originally Posted by jgt

Who makes up these rules, and why????

G.K.K

08-20-2011

Hi,

One solution using 'sed':

Code:

$ cat infile
1,3000,5000
1,4000,6000
2,4000,600
2,5000,700
3,60000,4000
4,7000,7777
5,999,8888
$ sed -ne '$! { /\n/! N; } ; :a ; $! { /^\([0-9]*\),.*\n\1[^\n]\+$/ { N; ba; }; } ; s/^\([0-9]*\),.*\n\1// ; tb ; P ; D ; :b ; D' infile
3,60000,4000
4,7000,7777
5,999,8888

Regards,
Birei
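
For readers who want to follow the one-liner above, here is the same script spread over several lines with comments. It is a reading aid rather than a new solution, and the comments are my interpretation of each step. It assumes GNU sed (for `\+` and `\n` inside a bracket expression) and input where records with the same key are adjacent, as in the sample. Note also that the pattern compares the leading digits as a prefix, so a key such as 1 directly followed by a key such as 10 would be treated as a duplicate pair.

```shell
# Recreate the sample input in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 1,3000,5000 1,4000,6000 2,4000,600 2,5000,700 \
              3,60000,4000 4,7000,7777 5,999,8888 > "$workdir/infile"

out=$(sed -n '
    # Unless this is the last input line, make sure the pattern space
    # holds two lines.
    $!{
        /\n/!N
    }
    :a
    # While the newest line begins with the same digits as the first line,
    # keep appending input lines (this stops at the end of the input).
    $!{
        /^\([0-9]*\),.*\n\1[^\n]\+$/{
            N
            ba
        }
    }
    # Strip the run of lines that share the leading key, if there is one.
    s/^\([0-9]*\),.*\n\1//
    # A successful substitution means duplicates were found: skip the print.
    tb
    # No duplicate: print the first line, drop it, restart with the rest.
    P
    D
    :b
    # Duplicate: drop the leftover fragment without printing it.
    D
' "$workdir/infile")

printf '%s\n' "$out"
```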

birei
