Delete rows with unique value for specific column


 
# 8
06-14-2012

Hi Ranga,

Thanks, but it is still not very clear to me. I have reformatted the code for better understanding; please explain the points raised in my comments.
Sorry, I forgot the code tags.

Code:
awk '
BEGIN {
    FS = "|"
}
{
    a[$3]++
    if (a[$3] == 2) {
        # Q: how can v[$3] be the previous line and not the current line
        #    when records are processed one at a time?
        # Q: is ORS the newline, and $0 the current line?
        print v[$3] ORS $0
    }
    if (a[$3] > 2) {
        print
    }
    v[$3] = $0   # Q: does this assign the current line to v[$3] on every record?
}' file

# 9
06-14-2012
Quote:
Originally Posted by krsnadasa
Hi Ranga,

Thanks, but it is still not very clear to me. I have reformatted the code for better understanding; please explain the points raised in my comments.
Sorry, I forgot the code tags.

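Hi,

v[$3] is the key here. It is assigned only at the end of each record, so while a record is being processed v[$3] still holds the previous line that had the same value in column 3. The current line is stored in it only after the checks have run.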

Example 1:
1234|1|Jon|some text|some text
1234|2|Jon|some text|some text
6533|2|Kate|some text|some text
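
The way v[$3] lags one record behind can be seen with a small trace. This is only a sketch: it feeds the three lines of Example 1 through a pipe instead of reading them from a file, and the variable name trace is an assumption made for the illustration. It prints what v[$3] holds before it is overwritten.

```shell
# Trace what v[$3] contains at the moment each record is read.
trace=$(printf '%s\n' \
  '1234|1|Jon|some text|some text' \
  '1234|2|Jon|some text|some text' \
  '6533|2|Kate|some text|some text' |
awk -F'|' '
{
    # v[$3] has not been updated for this record yet, so it still holds
    # the previous line with the same third field (empty if none was seen).
    printf "record %d: key=%s previous=[%s]\n", NR, $3, v[$3]
    v[$3] = $0   # only now does the current line replace it
}')

printf '%s\n' "$trace"
# record 1: key=Jon previous=[]
# record 2: key=Jon previous=[1234|1|Jon|some text|some text]
# record 3: key=Kate previous=[]
```

On record 2 the stored value is line 1, which is why the script can print both lines together.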

The program flows as follows.
Line 1:
Code:
awk '
BEGIN {
    FS = "|"
}
{
    a["Jon"]++               # a["Jon"] is now 1
    if (a["Jon"] == 2) {     # false, because the count is 1
        print v["Jon"] ORS $0
    }
    if (a["Jon"] > 2) {      # false, because the count is 1
        print
    }
    v["Jon"] = $0            # store line 1 under the index "Jon"
}' file

Line 2:
Code:
awk '
BEGIN {
    FS = "|"
}
{
    a["Jon"]++               # a["Jon"] is now 2
    if (a["Jon"] == 2) {     # true, because the count is 2
        # v["Jon"] still holds line 1 because it has not been
        # overwritten yet, so line 1 and line 2 are printed together.
        print v["Jon"] ORS $0
    }
    if (a["Jon"] > 2) {      # false, the count is 2 and not greater than 2
        print
    }
    v["Jon"] = $0            # overwrite the entry at index "Jon" with line 2
}' file

Line 3:

Code:
awk '
BEGIN {
    FS = "|"
}
{
    a["Kate"]++              # a["Kate"] is now 1; note that the index differs
    if (a["Kate"] == 2) {    # false, because the count is 1
        print v["Kate"] ORS $0
    }
    if (a["Kate"] > 2) {     # false, because the count is 1
        print
    }
    # Store the current line. If another "Kate" line follows, this line
    # is printed together with it as the previous line.
    v["Kate"] = $0
}' file
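
Putting the three steps together, the script can be run on the sample data as a quick check. This is a sketch that pipes the Example 1 lines in rather than reading them from a file; the variable name result is an assumption for the illustration.

```shell
# Run the script from this thread on the three lines of Example 1.
result=$(printf '%s\n' \
  '1234|1|Jon|some text|some text' \
  '1234|2|Jon|some text|some text' \
  '6533|2|Kate|some text|some text' |
awk '
BEGIN {
    FS = "|"
}
{
    a[$3]++                   # count occurrences of the third field
    if (a[$3] == 2) {
        print v[$3] ORS $0    # second occurrence: previous line, then current line
    }
    if (a[$3] > 2) {
        print                 # later occurrences: current line only
    }
    v[$3] = $0                # remember this line for the next record
}')

printf '%s\n' "$result"
# 1234|1|Jon|some text|some text
# 1234|2|Jon|some text|some text
```

The Kate line is dropped because its value in column 3 occurs only once.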

If a value occurs more than twice, only the current line is printed from the third occurrence onwards.
The earlier lines with that value were already printed at the second occurrence.
I hope that is clear now.
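
The case of more than two occurrences can be checked the same way. In this sketch the fourth input row (1234|3|Jon|...) is a hypothetical addition to Example 1, made up only to trigger the a[$3] > 2 branch, and the variable name result3 is likewise an assumption.

```shell
# Example 1 plus one hypothetical extra "Jon" row to reach a third occurrence.
result3=$(printf '%s\n' \
  '1234|1|Jon|some text|some text' \
  '1234|2|Jon|some text|some text' \
  '6533|2|Kate|some text|some text' \
  '1234|3|Jon|some text|some text' |
awk '
BEGIN {
    FS = "|"
}
{
    a[$3]++
    if (a[$3] == 2) {
        print v[$3] ORS $0    # prints rows 1 and 2 when row 2 is read
    }
    if (a[$3] > 2) {
        print                 # prints row 4 on its own; rows 1 and 2 are not repeated
    }
    v[$3] = $0
}')

printf '%s\n' "$result3"
# 1234|1|Jon|some text|some text
# 1234|2|Jon|some text|some text
# 1234|3|Jon|some text|some text
```

Each Jon row appears exactly once, and the single Kate row is still left out.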

Cheers,
Ranga
This User Gave Thanks to rangarasan For This Post:
# 10
06-14-2012
Thanks a lot, mate.

This is a good one.
 