Removing rows that contain non-unique column entry

11-06-2014

Background:
I have a file of thousands of potential SSR primers from Batch Primer 3.
I can't use primers that will contain the same sequence ID or sequence as another primer.
I have some basic shell scripting skills, but not enough to handle this.

What you need to know:
I need to remove the entire line (row) if its entry in column 3 or in column 13 is not unique within that column. Put another way, I need to write every line whose entries in columns 3 and 13 are both unique to a new file.

Note: I can't just remove the duplicate value; I have to remove the whole row after checking a value in that row against the rest of its column.

Example data is attached. Red values are duplicates.

Thank you very very very much!

msatseqs

11-07-2014

Try something like:

Code:

awk -F, 'NR==FNR{A[$3]++; B[$13]++; next} A[$3]==1 && B[$13]==1' infile infile

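Note: the input file is specified twice on purpose. On the first pass (NR==FNR) awk only counts how often each value occurs in columns 3 and 13; on the second pass it prints the lines whose column 3 and column 13 values were each counted exactly once. I have used a comma as the field separator (-F,), so for this to work you need to export the spreadsheet as comma-separated text.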

It is untested, since no text sample was posted. Please post one.
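As a rough check of the two-pass approach, the sketch below runs the same awk program on a small made-up file. The sample rows, the `sample` and `result` variable names, and the use of `mktemp` are assumptions for illustration only; field 3 stands in for the sequence ID and field 13 for the sequence.

```shell
# Hypothetical sample data: 13 comma-separated fields per row.
# Field 3 stands in for the sequence ID, field 13 for the sequence.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1,a,seq1,d,e,f,g,h,i,j,k,l,ACGT
2,a,seq2,d,e,f,g,h,i,j,k,l,TTGA
3,a,seq2,d,e,f,g,h,i,j,k,l,GGCC
4,a,seq3,d,e,f,g,h,i,j,k,l,ACGT
5,a,seq4,d,e,f,g,h,i,j,k,l,CATG
EOF

# Pass 1 (NR==FNR): count each value seen in fields 3 and 13.
# Pass 2: print only rows whose field 3 and field 13 each occurred once.
result=$(awk -F, 'NR==FNR { A[$3]++; B[$13]++; next }
                  A[$3]==1 && B[$13]==1' "$sample" "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

With this sample, rows 2 and 3 share `seq2` in field 3 and rows 1 and 4 share `ACGT` in field 13, so only row 5 is printed. If the real file has a header line, it will be kept as long as its column 3 and column 13 labels do not appear anywhere else in those columns.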

Scrutinizer
