Delete rows with unique value for specific column


 
# 1  
Old 06-10-2012

Hi all,
I have a file which looks like this:
Code:
1234|1|Jon|some text|some text
1234|2|Jon|some text|some text
3453|5|Jon|some text|some text
6533|2|Kate|some text|some text
4567|3|Chris|some text|some text
4567|4|Maggie|some text|some text
8764|6|Maggie|some text|some text

My third column is my KEY, and I want to print only the lines whose KEY appears more than once in the file. So basically, any line with a unique entry in column three can be deleted.
So the output would look like this:
Code:
1234|1|Jon|some text|some text
1234|2|Jon|some text|some text
3453|5|Jon|some text|some text
4567|4|Maggie|some text|some text
8764|6|Maggie|some text|some text

Can you please help me?

Moderator's Comments:
Mod Comment Please use [code]...[/code] tags instead of [quote]...[/quote] tags for code and samples

Last edited by Scrutinizer; 06-11-2012 at 12:59 AM. Reason: code tags instead of quote tags
# 2  
Old 06-11-2012
Try...
Code:
$ cat file1
1234|1|Jon|some text|some text
1234|2|Jon|some text|some text
3453|5|Jon|some text|some text
6533|2|Kate|some text|some text
4567|3|Chris|some text|some text
4567|4|Maggie|some text|some text
8764|6|Maggie|some text|some text

$ awk 'NR==FNR{a[$3]++;next}a[$3]>1' FS='|' file1 file1
1234|1|Jon|some text|some text
1234|2|Jon|some text|some text
3453|5|Jon|some text|some text
4567|4|Maggie|some text|some text
8764|6|Maggie|some text|some text

$
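
The file is named twice because awk reads it twice: once to count the keys and once to print. Below is a minimal sketch of the same two-pass idea, spelled out with comments. The file name sample.txt and the shell function name keep_repeated are made up for illustration; they are not part of the command above.

```shell
# Hypothetical sample file with the same layout as the data above.
cat > sample.txt <<'EOF'
1234|1|Jon|some text|some text
1234|2|Jon|some text|some text
3453|5|Jon|some text|some text
6533|2|Kate|some text|some text
4567|3|Chris|some text|some text
4567|4|Maggie|some text|some text
8764|6|Maggie|some text|some text
EOF

# keep_repeated FILE: print only the lines whose 3rd field occurs more than once.
keep_repeated() {
    awk -F'|' '
        # Pass 1: NR==FNR holds only while the first file operand is being read,
        # so this rule just counts each key and skips to the next line.
        NR == FNR { count[$3]++; next }
        # Pass 2: the same file is read again; print lines whose key was counted
        # more than once.
        count[$3] > 1
    ' "$1" "$1"
}

keep_repeated sample.txt
```

If the file is given only once, every line is consumed by the counting rule and nothing is printed, which matches an empty result.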

# 3  
Old 06-12-2012
Unfortunately, it doesn't do anything on my file.
Even when I redirect the output to a file, the file is empty.

Oops ... my mistake.
I only put file1 once.
Why do we need to put it twice? Is it for comparison?

Thanks for your help

---------- Post updated at 08:31 AM ---------- Previous update was at 08:16 AM ----------

I tested it on my documents.

Somehow, it does not delete all the single lines, so I still have unique data.
On the other hand, it also deletes one row from the non-unique ones: if I have two james entries in the file, the output has only one.

Any suggestions?

Last edited by A-V; 06-12-2012 at 10:23 AM..
# 4  
Old 06-12-2012
awk

Hi,

Try this one,

Code:
awk 'BEGIN{FS="|";}{a[$3]++;if(a[$3]==2)print v[$3] ORS $0;if(a[$3]>2)print;v[$3]=$0;}' file

Cheers,
Ranga
# 5  
Old 06-12-2012
Dear Ranga

It worked like a charm.

Thank you so much
Cheers
A-V
# 6  
Old 06-13-2012
Delete rows with unique value for specific column

Hi Ranga,

Good one, but can you please explain completely how this logic works?

Code:
awk 'BEGIN{FS="|";}{a[$3]++;if(a[$3]==2)print v[$3] ORS $0;if(a[$3]>2)print;v[$3]=$0;}' file

Thanks
Krsna
# 7  
Old 06-13-2012
awk

Quote:
Originally Posted by krsnadasa
Hi Ranga,

Good one, but can you please explain completely how this logic works?

Code:
awk 'BEGIN{FS="|";}{a[$3]++;if(a[$3]==2)print v[$3] ORS $0;if(a[$3]>2)print;v[$3]=$0;}' file

Thanks
Krsna
Code:
a[$3]++;     - Count how many times each name (column 3) has been seen, using the name as the key.
v[$3]=$0;    - Store the current line as the last line seen for this name.
if(a[$3]==2) - On the second occurrence of a name, print the stored line (the first occurrence) followed by the current line (the second occurrence).
if(a[$3]>2)  - On the third and later occurrences, just print the current line.

Hope I explained it clearly.

Cheers,
Ranga
 