Deleting duplicate lines using uniq while ignoring a column

06-15-2010

I have a data set with 4 columns. I want to know whether I can delete duplicate lines while ignoring one of the columns. For example:

Code:

10 chr1 ASF 30
15 chr1 ASF 20
5 chr1 ASF 30
6 chr2 EBC 15
4 chr2 EBC 30
...

I want to delete duplicate lines while ignoring column 1, so that the result looks like this (I will of course sort, etc. before I use uniq):

Code:

10 chr1 ASF 30
15 chr1 ASF 20
6 chr2 EBC 15
4 chr2 EBC 30
...

The 3rd line is deleted because its values in columns 2, 3 and 4 duplicate those of another line in the file.

I know that uniq has an option that lets you ignore the first n characters, but column 1 does not have a fixed width, so there is no set n and I cannot use that. Thanks.

japaneseguitars

06-15-2010

A long shot...

Code:

awk '!A[substr($0, index($0, " "))]++' file1 
10 chr1 ASF 30
15 chr1 ASF 20
6 chr2 EBC 15
4 chr2 EBC 30
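
For anyone unpacking the one-liner, here is a minimal sketch of the same idea with the key pulled out into a named variable. It assumes the columns are separated by exactly one space, as in the sample data; the names `data`, `key`, `seen` and `result` are only for illustration.

```shell
# Sample rows from the question (the elided "..." rows are not reproduced).
data='10 chr1 ASF 30
15 chr1 ASF 20
5 chr1 ASF 30
6 chr2 EBC 15
4 chr2 EBC 30'

# key = everything from the first space onward, i.e. the line minus column 1.
# seen[key]++ evaluates to 0 the first time a key appears, so !seen[key]++
# is true only for the first line carrying that key, and awk prints it.
result=$(printf '%s\n' "$data" |
  awk '{ key = substr($0, index($0, " ")) } !seen[key]++')

printf '%s\n' "$result"
```

The first line seen for each key is the one kept, and the input order is preserved, so no sorting is needed beforehand.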

Scott

06-15-2010

Code:

awk '!A[$2,$3,$4]++' infile
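
A short sketch of why the field-based key can be more forgiving than cutting the line at the first space: awk splits on runs of blanks by default, so the key is the same however the columns are spaced. The irregular spacing below is a made-up variation on the sample data, and the names `data`, `seen` and `result` are only for illustration.

```shell
# Sample rows from the question, but with irregular spacing (an illustrative
# variation, not the poster's actual file).
data='10 chr1 ASF 30
15   chr1 ASF 20
5  chr1   ASF 30
6 chr2 EBC 15
4 chr2 EBC 30'

# $2,$3,$4 inside the brackets are joined with SUBSEP to form one array key.
# A line is printed only the first time its (col2, col3, col4) key is seen.
result=$(printf '%s\n' "$data" | awk '!seen[$2,$3,$4]++')

printf '%s\n' "$result"
```

This assumes the lines have exactly four columns; with more columns the key would need to list the extra fields as well.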

Scrutinizer

06-15-2010

Code:

sort -t" " -k2 -k3 -k4 -u file

anbu23

06-15-2010

I think that could be reduced to:

Code:

sort -uk2 infile

This also sorts on the 2nd, 3rd and 4th fields, because -k2 defines a key that runs from field 2 to the end of the line.
Thanks Anbu, I always suspected that the -u option applies only to the sort keys rather than to the whole line.
I wonder whether all sort implementations work this way, though.
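
On the portability question, as far as I know POSIX describes -u as suppressing all but one line in each set of lines with equal keys, so a conforming sort should behave this way. Which of the duplicate lines survives is not something I would rely on. For comparison, uniq also has a -f option that skips leading fields rather than characters, which is closer to what the original question asked for. A minimal sketch of both, with `data`, `by_sort` and `by_uniq` as illustrative names:

```shell
# Sample rows from the question (the elided "..." rows are not reproduced).
data='10 chr1 ASF 30
15 chr1 ASF 20
5 chr1 ASF 30
6 chr2 EBC 15
4 chr2 EBC 30'

# Sort on the key running from field 2 to end of line; -u keeps one line
# per distinct key.
by_sort=$(printf '%s\n' "$data" | sort -u -k2)

# Same idea with uniq: sort so that equal keys are adjacent, then compare
# lines while skipping the first field (-f 1).
by_uniq=$(printf '%s\n' "$data" | sort -k2 | uniq -f 1)

printf 'sort -u -k2:\n%s\n' "$by_sort"
printf 'sort -k2 | uniq -f 1:\n%s\n' "$by_uniq"
```

Both variants leave four lines for this input, one per distinct combination of columns 2 to 4. Unlike the awk solutions, the output comes back in sorted order, not in the original input order.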

Scrutinizer

06-16-2010

Thanks, everyone!

japaneseguitars
