remove duplicates based on single column

05-25-2011

Hello,

I am new to shell scripting. I have a huge file with multiple columns, for example:

The sample below has 6 whitespace-separated columns.

HWUSI-EAS000_29:1:105 + chr5 76654650 AATTGGAA HHHHG
HWUSI-EAS000_29:1:106 + chr5 76654650 AATTGGAA B@HYL
HWUSI-EAS000_29:1:108 + chr5 76654650 AATTGGAA CADH
HWUSI-EAS000_29:1:110 - chr6 86754325 GATCGTAA YYCHY

I want to remove duplicates based on column 4 (for example, 76654650). In the above case it should list only row 1 and row 4.

Any help on this is greatly appreciated.

Thanks,

Diya

Diya123

05-25-2011

Code:

awk '{a[$4]++}!(a[$4]-1)' file
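The one-liner above counts how often each value of column 4 has been seen and prints a line only while that count is still 1. A minimal expanded sketch of the same logic, run against the sample rows from the question (the temporary file and the array name `count` are choices made here for illustration):

```shell
# Sample rows from the question, saved to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
HWUSI-EAS000_29:1:105 + chr5 76654650 AATTGGAA HHHHG
HWUSI-EAS000_29:1:106 + chr5 76654650 AATTGGAA B@HYL
HWUSI-EAS000_29:1:108 + chr5 76654650 AATTGGAA CADH
HWUSI-EAS000_29:1:110 - chr6 86754325 GATCGTAA YYCHY
EOF

# Expanded form: count each value of field 4 and print a line
# only when its value has been counted exactly once so far.
result=$(awk '{
    count[$4]++
    if (count[$4] == 1) print
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Only the first row for each column 4 value should remain, in the original input order.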

danmero

05-26-2011

Thank you. It worked exactly as I needed.

Diya123

05-27-2011

Code:

awk '!a[$4]++' infile
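This shorter form relies on the post-increment: `a[$4]++` yields the value before incrementing (0 for an unseen key), `!` turns that into true, and an awk pattern with no action prints the line. As a rough sketch, the column number can be passed in rather than hard-coded; the function name `dedupe_by_col` and the variable `col` are hypothetical, introduced here only for illustration:

```shell
# Hypothetical helper: keep the first line for each distinct value
# of the column whose number is given as the first argument.
dedupe_by_col() {
    awk -v col="$1" '!seen[$col]++'
}

# Sample rows from the question, fed on standard input.
out=$(printf '%s\n' \
    'HWUSI-EAS000_29:1:105 + chr5 76654650 AATTGGAA HHHHG' \
    'HWUSI-EAS000_29:1:106 + chr5 76654650 AATTGGAA B@HYL' \
    'HWUSI-EAS000_29:1:108 + chr5 76654650 AATTGGAA CADH' \
    'HWUSI-EAS000_29:1:110 - chr6 86754325 GATCGTAA YYCHY' \
    | dedupe_by_col 4)

printf '%s\n' "$out"
```

Calling `dedupe_by_col 1` on the same input would keep all four rows, since the first column is unique on every line.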

rdcwayx

05-27-2011

Code:

$ echo "HWUSI-EAS000_29:1:105 + chr5 76654650 AATTGGAA HHHHG
HWUSI-EAS000_29:1:106 + chr5 76654650 AATTGGAA B@HYL
HWUSI-EAS000_29:1:108 + chr5 76654650 AATTGGAA CADH
HWUSI-EAS000_29:1:110 - chr6 86754325 GATCGTAA YYCHY" | sort -k4,4 -u

HWUSI-EAS000_29:1:105 + chr5 76654650 AATTGGAA HHHHG
HWUSI-EAS000_29:1:110 - chr6 86754325 GATCGTAA YYCHY
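One difference worth keeping in mind: `sort -k4,4 -u` emits rows ordered by the key, while the awk approaches keep the original input order. The sample from the question happens to be sorted on column 4 already, so both give the same result there. A small sketch with made-up rows (the data below is hypothetical, chosen so the first key is not the smallest) shows the difference; only column 4 is printed, because which of the duplicate rows `sort -u` keeps is not something to rely on across implementations:

```shell
# Hypothetical rows: key 300 appears twice and comes before key 100.
data='r1 + chr5 300 A
r2 - chr6 100 B
r3 + chr5 300 C'

# awk keeps first occurrences in input order.
awk_keys=$(printf '%s\n' "$data" | awk '!seen[$4]++ {print $4}' | paste -sd' ' -)

# sort -u returns one row per key, ordered by the key.
sort_keys=$(printf '%s\n' "$data" | sort -k4,4 -u | awk '{print $4}' | paste -sd' ' -)

echo "awk:  $awk_keys"
echo "sort: $sort_keys"
```

If the original row order matters, the awk variants are probably the safer choice; if sorted output is wanted anyway, `sort -u` does both steps at once.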

ni2
