Remove duplicate values with condition


 
# 1  
Old 06-20-2014

Hi Gents,

Could you please help me get the desired output?

Some values in the first column are duplicated. In general, I need to reject the duplicate records and keep only the last occurrence. There is one exception: if the last occurrence has the value 14 or 98 in column 3 and >25 or < 200 in column 4, I should keep the first occurrence and reject the last one.

Sometimes a record has a single entry with the value 14 or 98 in column 3, or a value >25 or < 200 in column 4. If the entry occurs only once, I need to keep it and not reject it.

Here is my input file.
Code:
2265520807        1        1       13     1186
2265520807        2        1       14     1186
2265520809        1        1        9     1186
2265520809        2        1       10     1186
2265520811        1        1        9     1186
2265520833        1        1        2     1186
2265520833        2       14        2     1186
2265520835        1        1        2     1186
2265520837        1       14        4     1186
2265520837        2        1        4     1186
2265520841        1        1        2     1186
2265520849        1        1        1     1186
2265520849        2       14    85423     1186
2266320807        2        1        8     1186
2266320809        1        1        1     1186
2266320809        2        1       57     1186
2266320825        0        0        0        0
2266320825        2        1        2     1186
2266320833        1        1        1     1186
2266320841        1        1        3     1186
2266320849        1       14    85223     1186
2266520729        1        1       10     1187
2266520805        1        1        1     1187
2266520805        2        1        3     1187
2267120963        1       98        7     1187
2267120967        1        1       15     1187
2267120969        1       98    85147     1187
2267120969        2        1        1     1187
2267120969        3       98    85147     1187

Using this code, I get the first duplicate entry:
Code:
awk 'X[$1] {print X[$1]}{ X[$1]=$0}' Input.txt

Code:
2265520807        1        1       13     1186
2265520809        1        1        9     1186
2265520833        1        1        2     1186
2265520837        1       14        4     1186
2265520849        1        1        1     1186
2266320809        1        1        1     1186
2266320825        0        0        0        0
2266520805        1        1        1     1187
2267120969        1       98    85147     1187
2267120969        2        1        1     1187

But, as I explained at the beginning, I would like to get something like this:

Code:
2265520807        1        1       13     1186
2265520809        1        1        9     1186
2265520833        2       14        2     1186
2265520837        1       14        4     1186
2265520849        2       14    85423     1186
2266320809        2        1       57     1186
2266320825        0        0        0        0
2266520805        1        1        1     1187
2267120969        1       98    85147     1187
2267120969        3       98    85147     1187

Thanks for your support.
# 2  
Old 06-23-2014
If I read
Quote:
and >25 or < 200 in column 4
as "or column 4 is >25 and <200", I can achieve your desired output with
Code:
awk '($1 in X) {if ($3==14 || $3==98 || ($4>25 && $4<200)) {print} else {print X[$1]}} {X[$1]=$0}' Input.txt
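
For readability, the same logic can be spread over several lines with comments. This is only a sketch: the function name rejected_records and the array name last are made up for illustration (the one-liner calls the array X), and the sample lines are a subset of the input from post #1 with the spacing collapsed. Judging by the desired output in post #1, what gets printed appears to be the record to reject for each duplicated key. A key that occurs only once is never printed, so single entries are left alone.

```shell
# Hypothetical wrapper; reads the files named as arguments, or stdin.
rejected_records() {
  awk '
    # Column 1 was seen before, so this line is a later occurrence.
    ($1 in last) {
      if ($3 == 14 || $3 == 98 || ($4 > 25 && $4 < 200)) {
        print             # flagged line: print the current occurrence
      } else {
        print last[$1]    # otherwise print the previous occurrence
      }
    }
    # Always remember the most recent line for this key.
    { last[$1] = $0 }
  ' "$@"
}

# Subset of the input from post #1.
printf '%s\n' \
  '2265520807 1 1 13 1186' \
  '2265520807 2 1 14 1186' \
  '2265520811 1 1 9 1186' \
  '2265520833 1 1 2 1186' \
  '2265520833 2 14 2 1186' | rejected_records
# prints:
# 2265520807 1 1 13 1186
# 2265520833 2 14 2 1186
```

With three or more occurrences of the same key, each later occurrence is compared only with the one directly before it, which is how 2267120969 ends up with two lines in the desired output.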

NB: a lookup with ($1 in X) is a little more efficient than X[$1].
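
One likely reason for the difference, shown as a small sketch: in awk, merely referencing X[$1] creates that element if it does not exist, while ($1 in X) only tests for membership. The two function names below are made up for the demonstration; each one counts how many elements X holds after reading the input, without ever assigning to X.

```shell
# Truth test on X[$1]: every reference creates an (empty) element.
count_after_reference() {
  awk '{ if (X[$1]) n++ } END { c = 0; for (k in X) c++; print c }'
}

# Membership test with "in": no element is created.
count_after_in() {
  awk '{ if ($1 in X) n++ } END { c = 0; for (k in X) c++; print c }'
}

printf '%s\n' a b c | count_after_reference   # prints 3
printf '%s\n' a b c | count_after_in          # prints 0
```

In the one-liner above every key is assigned anyway, so the final array is the same either way. The membership test just avoids fetching and evaluating the stored line, and it does not depend on whether that line happens to evaluate as false (for example a line consisting only of 0).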
# 3  
Old 06-23-2014
Dear MadeInGermany,
Thanks for your support.