Filter first column duplicates


 
# 8  
Old 11-05-2015
Hi Giuliano, no, the input file does not need to be sorted.

The input file is read twice. On the first pass (while NR==FNR), the number of occurrences of each field-1 label is counted and stored in array A. On the second pass, a line is printed (the default action, equivalent to {print $0}) if its label occurred only once according to array A.
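The exact command isn't quoted in this excerpt, so here is a minimal reconstruction of the two-pass idiom described above (the sample input is made up for illustration, with 'g' as the duplicated key):

```shell
# Sample input: field 1 is the key; the key 'g' appears twice.
cat > file <<'EOF'
f g h
g h i
g i h
k g h
x y z
EOF

# Pass 1 (NR==FNR): count each field-1 key in A, then skip to the next line.
# Pass 2: the bare condition A[$1]==1 triggers the default action (print)
# only for lines whose key was seen exactly once.
awk 'NR==FNR {A[$1]++; next} A[$1]==1' file file
# prints:
# f g h
# k g h
# x y z
```

Note the file name is given twice on the command line; that is what makes awk read it twice and makes the NR==FNR test distinguish the passes.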
# 9  
Old 11-05-2015
If you don't insist on the order of lines, try
Code:
awk '!($1 in T) {T[$1]=$0; next} {T[$1]=""} END {for (t in T) if (T[t]) print T[t]}' file

producing (in no particular order):
f g h
k g h
x y z

# 10  
Old 11-05-2015
Well, thank you RudiC, but if I had trouble understanding Scrutinizer's command... with yours I am really in trouble!
# 11  
Old 11-05-2015
Code:
awk '
!($1 in T)      {T[$1]=$0               # on the first occurrence of $1, save the line in T array
                 next                   # stop processing this line
                }
                {T[$1]=""               # any further occurrence, set T element to empty (NOT delete!)
                }
END             {for (t in T) if (T[t]) # iterate through T arr; if a non-empty value found:
                        print T[t]      # print it
                }
' file

# 12  
Old 11-05-2015
A single-pass variation, again with unspecified output order, that does use delete:
Code:
awk '!C[$1]++{A[$1]=$0; next} {delete A[$1]} END{for(i in A) print A[i]}' file
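To see that single-pass variant in action, here is a small demo run; the file name and sample contents are made up for illustration:

```shell
cat > demo.txt <<'EOF'
f g h
g h i
g i h
k g h
x y z
EOF

# First occurrence of a key (!C[$1]++ is true): remember the line in A.
# Any repeat: delete the remembered line, so only unique keys survive
# to the END block, which prints whatever is left.
awk '!C[$1]++{A[$1]=$0; next} {delete A[$1]} END{for(i in A) print A[i]}' demo.txt
```

Because for-in iteration order over an awk array is unspecified, pipe the result through sort if you need a deterministic order.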


10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk to Sum columns when other column has duplicates and append one column value to another with Care

Hi Experts, Please bear with me, i need help I am learning AWk and stuck up in one issue. First point : I want to sum up column value for column 7, 9, 11,13 and column15 if rows in column 5 are duplicates. No action to be taken for rows where value in column 5 is unique. Second point : For... (1 Reply)
Discussion started by: as7951
1 Reply

2. Shell Programming and Scripting

Count and keep duplicates in Column

Hi folks, I've got a csv file called test.csv Column A Column B Apples 1900 Apples 1901 Pears 1902 Pears 1903 I want to count and keep duplicates in the first column. Desired output Column A Column B Column C Apples 2 1900 Apples ... (5 Replies)
Discussion started by: pshields1984
5 Replies

3. Shell Programming and Scripting

Remove duplicates according to their frequency in column

Hi all, I have huge a tab-delimited file with the following format and I want to remove the duplicates according to their frequency based on Column2 and Column3. Column1 Column2 Column3 Column4 Column5 Column6 Column7 1 user1 access1 word word 3 2 2 user2 access2 ... (10 Replies)
Discussion started by: corfuitl
10 Replies

4. Shell Programming and Scripting

Filter on one column and then perform conditional calculations on another column with a Linux script

Hi, I have a file (stats.txt) with columns like in the example below. Destination IP address, timestamp, TCP packet sequence number and packet length. destIP time seqNo packetLength 1.2.3.4 0.01 123 500 1.2.3.5 0.03 44 1500 1.3.2.5 0.08 44 1500 1.2.3.4 0.44... (12 Replies)
Discussion started by: Zooma
12 Replies

5. Shell Programming and Scripting

Remove duplicates within row and separate column

Hi all I have following kind of input file ESR1 PA156 leflunomide PA450192 leflunomide CHST3 PA26503 docetaxel Pa4586; thalidomide Pa34958; decetaxel docetaxel docetaxel I want to remove duplicates and I want to separate anything before and after PAxxxx entry into columns or... (1 Reply)
Discussion started by: manigrover
1 Reply

6. Shell Programming and Scripting

Request to check:remove duplicates only in first column

Hi all, I have an input file like this Now I have to remove duplicates only in first column and nothing has to be changed in second and third column. so that output would be Please let me know scripting regarding this (20 Replies)
Discussion started by: manigrover
20 Replies

7. Shell Programming and Scripting

need to remove duplicates based on key in first column and pattern in last column

Given a file such as this I need to remove the duplicates. 00060011 PAUL BOWSTEIN ad_waq3_921_20100826_010517.txt 00060011 PAUL BOWSTEIN ad_waq3_921_20100827_010528.txt 0624-01 RUT CORPORATION ad_sade3_10_20100827_010528.txt 0624-01 RUT CORPORATION ... (13 Replies)
Discussion started by: script_op2a
13 Replies

8. Shell Programming and Scripting

To Filter out duplicates..

I have a text file of this format... 55 55-45345.xml 20070615 55 55-87655.xml 20070613 34 34-56753.xml 20070614 The text file has values like a number,xml file name, and a date.The first column can have n number of duplicates.And no two dates are equal.Now I sorted out the file.So, it is... (1 Reply)
Discussion started by: gameboy87
1 Reply

9. Shell Programming and Scripting

How can i delete the duplicates based on one column of a line

I have my data something like this (08/03/2009 22:57:42.414)(:) king aaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbb (08/03/2009 22:57:42.416)(:) John cccccccccccc cccccvssssssssss baaaaa (08/03/2009 22:57:42.417)(:) Michael ddddddd tststststtststts (08/03/2009 22:57:42.425)(:) Ravi... (11 Replies)
Discussion started by: rdhanek
11 Replies

10. Shell Programming and Scripting

duplicates lines with one column different

Hi I have the following lines in a file SANDI108085FRANKLIN WRAP 7285 SANDI109514ZIPLOC STRETCH N SEAL 7285 SANDI110198CHOICE DM 0911 SANDI111144RANDOM WEIGHT BRAND 0704 SANDI111144RANDOM WEIGHT BRAND 0738... (10 Replies)
Discussion started by: dhanamurthy
10 Replies