Remove rows with first 4 fields duplicated in awk


 
# 1  
Old 10-28-2011

Hi,

I am trying to use awk to remove all rows where the first 4 fields are duplicates. For example, in the following data, lines 6-9 would be removed, leaving one copy of the duplicated row (row 5).

Code:
Borgarhraun    FH9822    ol24    FH9822_ol24_m20    ol    Deformed    c
Borgarhraun    FH9822    ol24    FH9822_ol24_r21            ol    Deformed    r
Borgarhraun    FH9822    ol25    FH9822_ol25_m22    ol    Res. B    c
Borgarhraun    FH9822    ol25    FH9822_ol25_r23            ol    Res. B    r
Borgarhraun    FH9822    ol24    FH9822_ol24_profCD    ol    Deformed    c
Borgarhraun    FH9822    ol24    FH9822_ol24_profCD    ol    Deformed    c
Borgarhraun    FH9822    ol24    FH9822_ol24_profCD    ol    Deformed    c
Borgarhraun    FH9822    ol24    FH9822_ol24_profCD    ol    Deformed    c
Borgarhraun    FH9822    ol24    FH9822_ol24_profCD    ol    Deformed    c
Borgarhraun    FH9822    ol35    FH9822_ol35_m24    ol    Res. B    c


So the output would hopefully look like:

Code:
Borgarhraun    FH9822    ol24    FH9822_ol24_m20    ol    Deformed    c
Borgarhraun    FH9822    ol24    FH9822_ol24_r21            ol    Deformed    r
Borgarhraun    FH9822    ol25    FH9822_ol25_m22    ol    Res. B    c
Borgarhraun    FH9822    ol25    FH9822_ol25_r23            ol    Res. B    r
Borgarhraun    FH9822    ol24    FH9822_ol24_profCD    ol    Deformed    c
Borgarhraun    FH9822    ol35    FH9822_ol35_m24    ol    Res. B    c

Can anyone help? Thanks

Last edited by radoulov; 10-28-2011 at 06:45 AM.. Reason: Code tags!
# 2  
Old 10-28-2011
Code:
$ nawk '!x[$4]++' infile
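
For readers new to the idiom: the pattern `!x[$4]++` is true only the first time a given value of `$4` is seen, because the post-increment yields the old count (0, which `!` turns into true) before raising it. With no action given, awk prints the row whenever the pattern is true. A longhand sketch of the same logic follows; the names `dedup_by_field4` and `seen` are illustrative only and do not come from the thread, and plain `awk` is assumed to behave like `nawk` here.

```shell
# Longhand equivalent of: awk '!x[$4]++' infile
# (dedup_by_field4 and seen are illustrative names, not from the thread)
dedup_by_field4() {
    awk '{
        if (seen[$4] == 0) {   # first time this 4th-field value appears
            print              # keep the row
        }
        seen[$4]++             # count it so later copies are skipped
    }' "$@"
}
```

Usage would be `dedup_by_field4 infile`. Note that this keys on the 4th field alone.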

This User Gave Thanks to jayan_jay For This Post:
# 3  
Old 10-28-2011
Wow, that was easy! Makes sense as well - Thank you!
# 4  
Old 10-28-2011
Just extending Jayan's code... The requirement was the first 4 fields, not the 4th field :)
Code:
awk '!x[$1,$2,$3,$4]++' input_file

--ahamed
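
How the four fields are combined into one array key matters. Joining fields with no separator lets distinct rows collide (for example `ab` followed by `c`, and `a` followed by `bc`, both become `abc`), whereas a comma-separated subscript joins them with awk's `SUBSEP` character. A minimal sketch, assuming whitespace-separated input as in the sample data; `dedup_by_first4` and `seen` are illustrative names, not from the thread.

```shell
# Keep the first row for each distinct combination of fields 1-4.
# The comma subscript joins the fields with SUBSEP, so "ab" "c" and
# "a" "bc" remain different keys.
# (dedup_by_first4 and seen are illustrative names, not from the thread)
dedup_by_first4() {
    awk '!seen[$1, $2, $3, $4]++' "$@"
}
```

Usage would be `dedup_by_first4 input_file`. Row order is preserved and the input does not need to be sorted.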