Remove rows with first 4 fields duplicated in awk Post: 302568727

Post 302568727 by tomahawk on Friday 28th of October 2011 05:22:20 AM

Hi,

I am trying to use awk to remove every row whose first 4 fields duplicate those of an earlier row. For example, in the following data, rows 6-9 would be removed, leaving one copy of the duplicated row (row 5):

Code:

Borgarhraun    FH9822    ol24    FH9822_ol24_m20    ol    Deformed    c
Borgarhraun    FH9822    ol24    FH9822_ol24_r21            ol    Deformed    r
Borgarhraun    FH9822    ol25    FH9822_ol25_m22    ol    Res. B    c
Borgarhraun    FH9822    ol25    FH9822_ol25_r23            ol    Res. B    r
Borgarhraun    FH9822    ol24    FH9822_ol24_profCD    ol    Deformed    c
Borgarhraun    FH9822    ol24    FH9822_ol24_profCD    ol    Deformed    c
Borgarhraun    FH9822    ol24    FH9822_ol24_profCD    ol    Deformed    c
Borgarhraun    FH9822    ol24    FH9822_ol24_profCD    ol    Deformed    c
Borgarhraun    FH9822    ol24    FH9822_ol24_profCD    ol    Deformed    c
Borgarhraun    FH9822    ol35    FH9822_ol35_m24    ol    Res. B    c

So the output would hopefully look like this:

Code:

Borgarhraun    FH9822    ol24    FH9822_ol24_m20    ol    Deformed    c
Borgarhraun    FH9822    ol24    FH9822_ol24_r21            ol    Deformed    r
Borgarhraun    FH9822    ol25    FH9822_ol25_m22    ol    Res. B    c
Borgarhraun    FH9822    ol25    FH9822_ol25_r23            ol    Res. B    r
Borgarhraun    FH9822    ol24    FH9822_ol24_profCD    ol    Deformed    c
Borgarhraun    FH9822    ol35    FH9822_ol35_m24    ol    Res. B    c

Can anyone help? Thanks
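
One common awk idiom for this kind of request is to key an array on the first four fields and print a row only the first time its key appears. The following is a minimal sketch, not a confirmed answer from the thread. It assumes the fields are separated by runs of spaces or tabs and that none of the first four fields contains a space. The temporary file and the array name `seen` are illustrative choices.

```shell
# Hypothetical sample file holding the data from the post (the path is an assumption).
input=$(mktemp)
cat > "$input" <<'EOF'
Borgarhraun    FH9822    ol24    FH9822_ol24_m20    ol    Deformed    c
Borgarhraun    FH9822    ol24    FH9822_ol24_r21            ol    Deformed    r
Borgarhraun    FH9822    ol25    FH9822_ol25_m22    ol    Res. B    c
Borgarhraun    FH9822    ol25    FH9822_ol25_r23            ol    Res. B    r
Borgarhraun    FH9822    ol24    FH9822_ol24_profCD    ol    Deformed    c
Borgarhraun    FH9822    ol24    FH9822_ol24_profCD    ol    Deformed    c
Borgarhraun    FH9822    ol24    FH9822_ol24_profCD    ol    Deformed    c
Borgarhraun    FH9822    ol24    FH9822_ol24_profCD    ol    Deformed    c
Borgarhraun    FH9822    ol24    FH9822_ol24_profCD    ol    Deformed    c
Borgarhraun    FH9822    ol35    FH9822_ol35_m24    ol    Res. B    c
EOF

# seen[$1,$2,$3,$4] counts how often this 4-field key has occurred so far.
# The post-increment returns 0 on the first occurrence, so !0 is true and
# the row is printed; later rows with the same key are skipped.
result=$(awk '!seen[$1,$2,$3,$4]++' "$input")
printf '%s\n' "$result"

rm -f "$input"
```

Rows keep their original order, and the first copy of each duplicated key is the one that survives. If the file is strictly tab-delimited and a key field could contain spaces, passing `-F'\t'` to awk would split on tabs only.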
