[SOLVED] remove lines that have duplicate values in column two


 
# 1  
09-26-2012

Hi, I've got a file that I'd like to uniquely sort based on column 2 (values in column 2 begin with "comp").

I tried
Code:
sort -t -nuk2,3 file.txt

But got:
sort: multi-character tab `-nuk2,3'

"man sort" did not help me out

Any pointers?

Input:
Quote:
gi|328725975|ref|XP_003248692.1| comp47911_c0_seq1 82.02 367 66 0 1 367 78 1178 0 603
gi|328718720|ref|XP_001946259.2| comp110820_c0_seq1 46.85 111 59 0 422 532 2 334 1.00E-31 120
gi|193617875|ref|XP_001945312.1| comp110820_c0_seq1 45.13 113 62 0 535 647 2 340 7.00E-31 119
gi|328698003|ref|XP_001947254.2| comp1227639_c0_seq1 89.36 141 15 0 3 143 3 425 5.00E-82 247
gi|328725151|ref|XP_001951585.2| comp142443_c0_seq1 53.33 75 32 2 49 122 240 22 1.00E-16 73.2
gi|328725427|ref|XP_001948141.2| comp143768_c0_seq1 89.49 257 25 1 1 257 147 911 3.00E-171 483
gi|328717989|ref|XP_003246356.1| comp143768_c0_seq1 91.42 303 26 0 132 434 3 911 0 587
gi|328712467|ref|XP_001948906.2| comp143768_c0_seq1 69.81 308 87 3 69 375 3 911 1.00E-153 443
gi|328698003|ref|XP_001947254.2| comp143768_c0_seq1 94.12 102 6 0 147 248 3 308 1.00E-62 203

Desired output:
Quote:
gi|328725975|ref|XP_003248692.1| comp47911_c0_seq1 82.02 367 66 0 1 367 78 1178 0 603
gi|328718720|ref|XP_001946259.2| comp110820_c0_seq1 46.85 111 59 0 422 532 2 334 1.00E-31 120
gi|328698003|ref|XP_001947254.2| comp1227639_c0_seq1 89.36 141 15 0 3 143 3 425 5.00E-82 247
gi|328725151|ref|XP_001951585.2| comp142443_c0_seq1 53.33 75 32 2 49 122 240 22 1.00E-16 73.2
gi|328725427|ref|XP_001948141.2| comp143768_c0_seq1 89.49 257 25 1 1 257 147 911 3.00E-171 483
# 2  
09-26-2012
Try this:

Code:
awk '!x[$2]++' file


Last edited by pamu; 09-26-2012 at 01:44 PM. Reason: missed +
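A minimal sketch of what the one-liner does, run on a small hypothetical sample (the file names `file.txt` and `file.dedup` and the array name `seen` are made up for illustration). `seen[$2]++` yields the current count for the column-2 value and then increments it, so the test is true only the first time a value appears, and awk prints that line. awk writes to standard output and does not change the input file, so the result has to be redirected somewhere to be kept.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory for the hypothetical sample

# Hypothetical sample input: field 2 holds the "comp..." IDs.
printf '%s\n' \
  'a comp1_c0_seq1 10' \
  'b comp2_c0_seq1 20' \
  'c comp1_c0_seq1 30' > file.txt

# Spelled-out equivalent of: awk '!x[$2]++' file.txt
# seen[$2]++ returns the old count (0 the first time a value is seen),
# so the line is printed only on the first occurrence of each column-2 value.
awk '{ if (seen[$2]++ == 0) print }' file.txt > file.dedup

# awk never edits file.txt in place; the de-duplicated lines are in file.dedup.
cat file.dedup
```

Here the third line is dropped because `comp1_c0_seq1` was already seen on the first line; the kept lines stay in their original order.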
# 3  
09-26-2012
I'm not sure sort can reject non-unique lines like that. But you forgot to tell sort which character to use as the field separator after -t, so it took -nuk2,3 itself as the separator, hence the "multi-character tab" error.

Since the separator appears to be a single space, you don't need -t anyway.
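For comparison, a sketch of the sort-only approach the original command seems to have been aiming at, assuming a POSIX-compatible `sort` and a hypothetical sample with the same layout: no `-t` (fields are blank-separated by default), `-k2,2` to make column 2 the whole key, and `-u` to output one line per distinct key. `-n` is dropped because the `comp...` values are not numbers. Unlike the awk answers, this orders the output by column 2, and POSIX does not specify which of the duplicate lines is the one kept.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory for the hypothetical sample

printf '%s\n' \
  'a comp2_c0_seq1 10' \
  'b comp1_c0_seq1 20' \
  'c comp2_c0_seq1 30' > file.txt

# -k2,2 : the sort key is field 2 only (fields are split on blanks by default)
# -u    : output only one line for each distinct key
sort -u -k2,2 file.txt > file.sorted
cat file.sorted
```

If the original line order matters, as in the desired output of the question, the awk approach is the better fit.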

This will use awk to reject duplicates before sorting.
Code:
awk '!($2 in X) { X[$2]++; print }' inputfile | sort -k 2,3
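A commented sketch of the same pipeline on hypothetical sample data (`inputfile`, `first_only` and `first_sorted` are illustrative names). `$2 in X` checks whether the column-2 value is already a key of the array `X`, so only the first line for each value is printed. The trailing `sort` then reorders the kept lines by column 2, which is why the result can come out in a different order from the output shown in the question; leaving the `sort` off keeps the original order. Note also that `-k 2,3` makes the key run from field 2 through field 3, whereas `-k2,2` would compare on column 2 alone.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory for the hypothetical sample

printf '%s\n' \
  'a comp2_c0_seq1 10' \
  'b comp1_c0_seq1 20' \
  'c comp2_c0_seq1 30' > inputfile

# Keep only the first line for each column-2 value, preserving input order.
awk '!($2 in X) { X[$2]++; print }' inputfile > first_only

# Same filter, then order the kept lines by fields 2 through 3 (as in the answer above).
awk '!($2 in X) { X[$2]++; print }' inputfile | sort -k 2,3 > first_sorted
```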

# 4  
09-26-2012
Pamu, thanks for taking a look at this, but your suggestion did not work; it did not seem to affect the file.

Corona688, your solution worked. Thanks!
# 5  
09-26-2012
pamu's solution is essentially a shorter version of mine. I think he fixed a typo after you saw it.
# 6  
09-26-2012
Quote:
Originally Posted by Corona688
pamu's solution looks like my solution in brief. I think he fixed a typo after you saw it.
Just missed ++...

Now corrected.
 

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Find lines with duplicate values in a particular column

I have a file with 5 columns. I want to pull out all records where the value in column 4 is not unique. For example in the sample below, I would want it to print out all lines except for the last two. 40991764 2419 724 47182 Cand A 40992936 3591 724 47182 Cand B 40993016 3671 724 47182 Cand C... (5 Replies)
Discussion started by: kaktus

2. Shell Programming and Scripting

Find duplicate values in specific column and delete all the duplicate values

Dear folks I have a map file of around 54K lines and some of the values in the second column have the same value and I want to find them and delete all of the same values. I looked over duplicate commands but my case is not to keep one of the duplicate values. I want to remove all of the same... (4 Replies)
Discussion started by: sajmar

3. Shell Programming and Scripting

Remove duplicate values in a column(not in the file)

Hi Gurus, I have a file(weblog) as below abc|xyz|123|agentcode=sample code abcdeeess,agentcode=sample code abcdeeess,agentcode=sample code abcdeeess|agentadd=abcd stereet 23343,agentadd=abcd stereet 23343 sss|wwq|999|agentcode=sample1 code wqwdeeess,gentcode=sample1 code... (4 Replies)
Discussion started by: ratheeshjulk

4. Shell Programming and Scripting

Filter file to remove duplicate values in first column

Hello, I have a script that is generating a tab delimited output file. num Name PCA_A1 PCA_A2 PCA_A3 0 compound_00 -3.5054 -1.1207 -2.4372 1 compound_01 -2.2641 0.4287 -1.6120 3 compound_03 -1.3053 1.8495 ... (3 Replies)
Discussion started by: LMHmedchem

5. UNIX for Dummies Questions & Answers

Remove duplicate words from column 1

Tried using sed and uniq but it's removing the entire line. Can't seem to figure out a way to just remove the word. Any help is appreciated. I have a file: dog, text1, text2, text3 dog, text1, text2, text3 dog, text1, text2, text3 cat, text1, text2, text3 Trying to remove all duplicate instances... (6 Replies)
Discussion started by: jimmyf

6. Shell Programming and Scripting

Remove duplicate values with condition

Hi Gents, Please can you help me to get the desired output . In the first column I have some duplicate records, The condition is that all need to reject the duplicate record keeping the last occurrence. But the condition is. If the last occurrence is equal to value 14 or 98 in column 3 and... (2 Replies)
Discussion started by: jiam912

7. Shell Programming and Scripting

Get the average from column, and eliminate the duplicate values.

Dear Experts, Kindly help me please. I have a big file where there are duplicate values in col 11 till col 23; every 2 rows new numbers appear, but in each row there are different coordinates x and y in col 57 till col 74. Please, I would like to get a single value and average of the x and y... (8 Replies)
Discussion started by: jiam912

8. Shell Programming and Scripting

Remove the values from a certain column without deleting the Column name in a .CSV file

(14 Replies)
Discussion started by: dhruuv369

9. UNIX for Dummies Questions & Answers

[Solved] How to extract single and duplicate lines from file?

Hi, I need help! I have two files, one containing a list of codes and the other a list of codes and their meaning. I need to extract from file 2 all the codes from file 1 into a new file. These are my files: File1: Metbo Metbo Memar Mth Metbo File2: Metbo Methanoculleus... (3 Replies)
Discussion started by: Lokaps

10. Shell Programming and Scripting

Perl: filtering lines based on duplicate values in a column

Hi I have a file like this. I need to eliminate lines with first column having the same value 10 times. 13 18 1 + chromosome 1, 122638287 AGAGTATGGTCGCGGTTG 13 18 1 + chromosome 1, 128904080 AGAGTATGGTCGCGGTTG 13 18 1 - chromosome 14, 13627938 CAACCGCGACCATACTCT 13 18 1 + chromosome 1,... (5 Replies)
Discussion started by: polsum