Select unique names while removing the duplicates from a column


# 1  

Hi,
I have a comma-separated file with two columns:
Code:
ENSG00000003137,ENST00000001146
ENSG00000003137,ENST00000412253
ENSG00000003402,ENST00000309955
ENSG00000003402,ENST00000443227
ENSG00000003402,ENST00000341222

I want to keep only the first entry for each value in the first column and ignore the rest. The output should look like this:
Code:
ENSG00000003137,ENST00000001146
ENSG00000003402,ENST00000309955

I tried awk '!a[$1$2]++', but it does not work.
Please help.

Moderator's comment: Please use code tags.

Last edited by Akshay Hegde; 02-18-2020 at 12:00 AM..
# 2  
You need to set the field separator with the -F flag:
Code:
awk -F, '!a[$1]++' file
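
For readers new to this idiom, here is a minimal sketch of what the one-liner does when written out in full. The array name seen and the temporary file are illustrative choices, not part of the original answer.

```shell
# Recreate the sample data from the question in a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
ENSG00000003137,ENST00000001146
ENSG00000003137,ENST00000412253
ENSG00000003402,ENST00000309955
ENSG00000003402,ENST00000443227
ENSG00000003402,ENST00000341222
EOF

# Long form of the one-liner: print a line only if its first field is new,
# then count the field so later lines with the same key are skipped.
result=$(awk -F, '{
    if (seen[$1] == 0)   # an unset element compares equal to 0
        print            # first occurrence of this key: keep the line
    seen[$1]++           # remember the key
}' "$file")

printf '%s\n' "$result"
rm -f "$file"
```

The short form works the same way: the post-increment returns the old count, the negation turns a count of 0 into true, and a true pattern with no action prints the line.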

These 2 Users Gave Thanks to balajesuri For This Post:
# 3  
I think you need to specify a comma as the field separator.

Code:
Owner@Owner-PC ~
$ awk -F, '!a[$1]++' filename
ENSG00000003137,ENST00000001146
ENSG00000003402,ENST00000309955


Owner@Owner-PC ~
$ awk  '!a[$1]++' filename
ENSG00000003137,ENST00000001146
ENSG00000003137,ENST00000412253
ENSG00000003402,ENST00000309955
ENSG00000003402,ENST00000443227
ENSG00000003402,ENST00000341222

I used the sample data from the first post.
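
A quick way to see why the comma matters, as a hedged sketch: with the default whitespace separator, each sample line is a single field, so $1 is the entire line and $2 is empty. That is why keying on $1$2 without -F, treats every line as unique. The variable names below are illustrative.

```shell
line='ENSG00000003137,ENST00000001146'

# Default separator (whitespace): the whole line is one field.
default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')
default_first=$(printf '%s\n' "$line" | awk '{ print $1 }')

# Comma separator: the line splits into two fields.
comma_nf=$(printf '%s\n' "$line" | awk -F, '{ print NF }')
comma_first=$(printf '%s\n' "$line" | awk -F, '{ print $1 }')

printf 'default: NF=%s $1=%s\n' "$default_nf" "$default_first"
printf 'comma:   NF=%s $1=%s\n' "$comma_nf" "$comma_first"
```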
These 2 Users Gave Thanks to jim mcnamara For This Post:
# 4  
Code:
$ sort -t"," -k1,1 -u file
ENSG00000003137,ENST00000001146
ENSG00000003402,ENST00000309955
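
One difference worth knowing, shown as a sketch on made-up input whose keys are not already sorted: sort -u writes its output in key order, while the awk approach keeps the original line order. Which line of a duplicate group sort -u keeps is not pinned down by POSIX; GNU sort documents that it keeps the first of an equal run, so check your implementation if that matters. The keys B and A and their values are invented for illustration.

```shell
input=$(printf '%s\n' 'B,1' 'A,2' 'B,3' 'A,4')

# awk keeps the first line for each key, in input order.
awk_out=$(printf '%s\n' "$input" | awk -F, '!a[$1]++')

# sort -u emits one line per key, ordered by the key.
# Only the keys are extracted here, so the result does not depend on
# which line of each duplicate group the sort implementation keeps.
sort_keys=$(printf '%s\n' "$input" | sort -t, -k1,1 -u | cut -d, -f1)

printf 'awk:\n%s\n' "$awk_out"
printf 'sort keys:\n%s\n' "$sort_keys"
```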

# 5  
Code:
awk -F, 'a[$1]++==0' filename

This is quick and dirty because it stores an unnecessary integer value for each key.
The full and efficient code is:
Code:
awk -F, '!($1 in a) { a[$1]; print }' filename

You can condense that again to use the implicit print:
Code:
awk -F, '!(($1 in a) || a[$1])' filename

or
Code:
awk -F, '!($1 in a) && !a[$1]' filename
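
As a sanity check, the following sketch runs the four variants from this post on the sample data and confirms that they print the same lines. The temporary file and the variable names are illustrative, not from the thread.

```shell
# Recreate the sample data from the first post.
file=$(mktemp)
cat > "$file" <<'EOF'
ENSG00000003137,ENST00000001146
ENSG00000003137,ENST00000412253
ENSG00000003402,ENST00000309955
ENSG00000003402,ENST00000443227
ENSG00000003402,ENST00000341222
EOF

out1=$(awk -F, 'a[$1]++==0' "$file")
out2=$(awk -F, '!($1 in a) { a[$1]; print }' "$file")
out3=$(awk -F, '!(($1 in a) || a[$1])' "$file")
out4=$(awk -F, '!($1 in a) && !a[$1]' "$file")

# Report whether every variant agrees with the first one.
if [ "$out1" = "$out2" ] && [ "$out1" = "$out3" ] && [ "$out1" = "$out4" ]; then
    verdict=identical
else
    verdict=different
fi
printf '%s\n%s\n' "$verdict" "$out1"
rm -f "$file"
```

The two condensed forms rely on short-circuit evaluation: a[$1] is only referenced when the key is not yet in the array, and that reference creates the element with an empty value as a side effect.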

This User Gave Thanks to MadeInGermany For This Post: