Filter first column duplicates


 
# 8  
Old 11-05-2015
Hi Giuliano, no, the input file does not need to be sorted.

The input file is read twice. On the first pass (while NR==FNR), the number of occurrences of each field-1 label is counted and stored in array A. On the second pass, a line is printed (the default action, equivalent to {print $0}) if its label occurred only once according to array A.
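The exact command isn't quoted in this excerpt, so here is a minimal reconstruction of the two-pass idiom described above (the sample input is made up for illustration, with 'g' as the duplicated key):

```shell
# Sample input: field 1 is the key; the key 'g' appears twice.
cat > file <<'EOF'
f g h
g h i
g i h
k g h
x y z
EOF

# Pass 1 (NR==FNR): count each field-1 key in A, then skip to the next line.
# Pass 2: the bare condition A[$1]==1 triggers the default action (print)
# only for lines whose key was seen exactly once.
awk 'NR==FNR {A[$1]++; next} A[$1]==1' file file
# prints:
# f g h
# k g h
# x y z
```

Note the file name is given twice on the command line; that is what makes awk read it twice and makes the NR==FNR test distinguish the passes.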
# 9  
Old 11-05-2015
If you don't insist on the order of lines, try
Code:
awk '!($1 in T) {T[$1]=$0; next} {T[$1]=""} END {for (t in T) if (T[t]) print T[t]}' file

producing (in no particular order):
f g h
k g h
x y z

# 10  
Old 11-05-2015
Well, thank you RudiC, but if I had trouble understanding Scrutinizer's command... with yours I am really in trouble!
# 11  
Old 11-05-2015
Code:
awk '
!($1 in T)      {T[$1]=$0               # on the first occurrence of $1, save the line in T array
                 next                   # stop processing this line
                }
                {T[$1]=""               # any further occurrence, set T element to empty (NOT delete!)
                }
END             {for (t in T) if (T[t]) # iterate through T arr; if a non-empty value found:
                        print T[t]      # print it
                }
' file

# 12  
Old 11-05-2015
A single-pass variation, again with unspecified output order, that does use delete:
Code:
awk '!C[$1]++{A[$1]=$0; next} {delete A[$1]} END{for(i in A) print A[i]}' file
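To see that single-pass variant in action, here is a small demo run; the file name and sample contents are made up for illustration:

```shell
cat > demo.txt <<'EOF'
f g h
g h i
g i h
k g h
x y z
EOF

# First occurrence of a key (!C[$1]++ is true): remember the line in A.
# Any repeat: delete the remembered line, so only unique keys survive
# to the END block, which prints whatever is left.
awk '!C[$1]++{A[$1]=$0; next} {delete A[$1]} END{for(i in A) print A[i]}' demo.txt
```

Because for-in iteration order over an awk array is unspecified, pipe the result through sort if you need a deterministic order.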


10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk to Sum columns when other column has duplicates and append one column value to another with Care

Hi Experts, Please bear with me, i need help I am learning AWk and stuck up in one issue. First point : I want to sum up column value for column 7, 9, 11,13 and column15 if rows in column 5 are duplicates. No action to be taken for rows where value in column 5 is unique. Second point : For... (1 Reply)
Discussion started by: as7951
1 Reply

2. Shell Programming and Scripting

Count and keep duplicates in Column

Hi folks, I've got a csv file called test.csv Column A Column B Apples 1900 Apples 1901 Pears 1902 Pears 1903 I want to count and keep duplicates in the first column. Desired output Column A Column B Column C Apples 2 1900 Apples ... (5 Replies)
Discussion started by: pshields1984
5 Replies

3. Shell Programming and Scripting

Remove duplicates according to their frequency in column

Hi all, I have huge a tab-delimited file with the following format and I want to remove the duplicates according to their frequency based on Column2 and Column3. Column1 Column2 Column3 Column4 Column5 Column6 Column7 1 user1 access1 word word 3 2 2 user2 access2 ... (10 Replies)
Discussion started by: corfuitl
10 Replies

4. Shell Programming and Scripting

Filter on one column and then perform conditional calculations on another column with a Linux script

Hi, I have a file (stats.txt) with columns like in the example below. Destination IP address, timestamp, TCP packet sequence number and packet length. destIP time seqNo packetLength 1.2.3.4 0.01 123 500 1.2.3.5 0.03 44 1500 1.3.2.5 0.08 44 1500 1.2.3.4 0.44... (12 Replies)
Discussion started by: Zooma
12 Replies

5. Shell Programming and Scripting

Remove duplicates within row and separate column

Hi all I have following kind of input file ESR1 PA156 leflunomide PA450192 leflunomide CHST3 PA26503 docetaxel Pa4586; thalidomide Pa34958; decetaxel docetaxel docetaxel I want to remove duplicates and I want to separate anything before and after PAxxxx entry into columns or... (1 Reply)
Discussion started by: manigrover
1 Reply

6. Shell Programming and Scripting

Request to check:remove duplicates only in first column

Hi all, I have an input file like this Now I have to remove duplicates only in first column and nothing has to be changed in second and third column. so that output would be Please let me know scripting regarding this (20 Replies)
Discussion started by: manigrover
20 Replies

7. Shell Programming and Scripting

need to remove duplicates based on key in first column and pattern in last column

Given a file such as this I need to remove the duplicates. 00060011 PAUL BOWSTEIN ad_waq3_921_20100826_010517.txt 00060011 PAUL BOWSTEIN ad_waq3_921_20100827_010528.txt 0624-01 RUT CORPORATION ad_sade3_10_20100827_010528.txt 0624-01 RUT CORPORATION ... (13 Replies)
Discussion started by: script_op2a
13 Replies

8. Shell Programming and Scripting

To Filter out duplicates..

I have a text file of this format... 55 55-45345.xml 20070615 55 55-87655.xml 20070613 34 34-56753.xml 20070614 The text file has values like a number,xml file name, and a date.The first column can have n number of duplicates.And no two dates are equal.Now I sorted out the file.So, it is... (1 Reply)
Discussion started by: gameboy87
1 Reply

9. Shell Programming and Scripting

How can i delete the duplicates based on one column of a line

I have my data something like this (08/03/2009 22:57:42.414)(:) king aaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbb (08/03/2009 22:57:42.416)(:) John cccccccccccc cccccvssssssssss baaaaa (08/03/2009 22:57:42.417)(:) Michael ddddddd tststststtststts (08/03/2009 22:57:42.425)(:) Ravi... (11 Replies)
Discussion started by: rdhanek
11 Replies

10. Shell Programming and Scripting

duplicates lines with one column different

Hi I have the following lines in a file SANDI108085FRANKLIN WRAP 7285 SANDI109514ZIPLOC STRETCH N SEAL 7285 SANDI110198CHOICE DM 0911 SANDI111144RANDOM WEIGHT BRAND 0704 SANDI111144RANDOM WEIGHT BRAND 0738... (10 Replies)
Discussion started by: dhanamurthy
10 Replies