Find duplicates in column 1 and merge their lines (awk?)

01-28-2013

Hi,

I have a file (already sorted with sort) with 8 tab-delimited columns. Some values in the first column occur on more than one line, and I need to merge all lines that share the same first field into a single line.

My input file:

Code:

comp100002	aaa	bbb	ccc	ddd	eee	fff	ggg
comp100003	aba	aba	aba	aba	aba	aba	aba
comp100003	fff	fff	fff	fff	fff	fff	fff
comp100004	xxx	xyz	xyz	xxx	xyz	xxx	xyz

My desired output file:

Code:

comp100002	aaa	bbb	ccc	ddd	eee	fff	ggg
comp100003	aba	aba	aba	aba	aba	aba	aba	fff	fff	fff	fff	fff	fff	fff
comp100004	xxx	xyz	xyz	xxx	xyz	xxx	xyz

Thanks for any advice.

falcox

01-28-2013

try:

Code:

awk '
!(a[$1]) {a[$1]=$0}
a[$1] {w=$1; $1=""; a[w]=a[w] $0}
END {for (i in a) print a[i]}
' FS="\t" OFS="\t" infile

rdrtx1

01-28-2013

Thanks a lot, it prints the desired results. However, if there is a single-copy identifier in field 1, it appends the whole line twice. It's easy to get rid of these 8 additional columns, but since I am learning, could you please explain which part of the code is responsible for this?

falcox

01-28-2013

try:

Code:

awk 'p!=$1{if(p)print s; p=s=$1} {sub(p,x); s=s $0} END{if(p)print s}' FS='\t' file
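A note for readers: the one-liner above only compares each key with the previous one, so it relies on the input being sorted on column 1. Below is a commented sketch of the same streaming idea, not Scrutinizer's exact code. It swaps `sub(p,x)` for `substr()` so the key is never interpreted as a regular expression, and it tests `NR` instead of `p` so a key such as `0` cannot be mistaken for "no previous key". The names `merge_sorted`, `prev` and `merged` are made up for illustration.

```shell
# Merge consecutive lines that share the same first tab-separated field.
# Assumes the input is already sorted (or at least grouped) on column 1.
merge_sorted() {
  awk -F '\t' '
    # Key changed: flush the previous merged record and start a new one.
    $1 != prev {
      if (NR > 1) print merged
      prev = $1
      merged = $1
    }
    # Append everything after the key; the leading tab acts as separator.
    { merged = merged substr($0, length($1) + 1) }
    # Flush the last record, unless the input was empty.
    END { if (NR > 0) print merged }
  ' "$@"
}

# Usage: reads the named files, or standard input when none are given.
printf 'comp100003\taba\ncomp100003\tfff\ncomp100004\txxx\n' | merge_sorted
```

Only one merged record is held in memory at a time, which may matter for very large files.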

Scrutinizer

01-28-2013

Fixed, try:

Code:

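awk '
# First line for this key: store the whole line and skip the rule below
!(a[$1]) {a[$1]=$0; next}
# Later lines for the same key: blank the key, append the remaining fields
a[$1] {w=$1; $1=""; a[w]=a[w] $0}
# Print the merged lines (for-in order is unspecified)
END {for (i in a) print a[i]}
' FS="\t" OFS="\t" infile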
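To answer the question about the doubled columns: in the first version both rules ran on the first line of each key. The first rule stored `$0` in `a[$1]`, which made the pattern `a[$1]` true, so the second rule appended the same fields again. `next` stops processing of the current line after the first rule. One further caveat: awk does not define the order in which `for (i in a)` visits keys, so the output may need to be sorted again. A minimal sketch that keeps first-seen order instead, assuming that is what is wanted; the names `merge_by_key`, `merged`, `order` and `n` are hypothetical.

```shell
# Merge all lines sharing the same first tab-separated field.
# Works on unsorted input and prints keys in first-seen order.
merge_by_key() {
  awk -F '\t' '
    # First time this key is seen: record its position, store the line.
    !($1 in merged) {
      order[++n] = $1
      merged[$1] = $0
      next   # without this, the rule below would append the line again
    }
    # Key already seen: append the fields that follow the key.
    { merged[$1] = merged[$1] substr($0, length($1) + 1) }
    # Print in the order the keys first appeared.
    END { for (i = 1; i <= n; i++) print merged[order[i]] }
  ' "$@"
}

# Usage: reads the named files, or standard input when none are given.
printf 'comp100003\taba\ncomp100002\taaa\ncomp100003\tfff\n' | merge_by_key
```

Using `$1 in merged` rather than the truth value of the stored string also avoids misclassifying a stored line that happens to evaluate as false.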

rdrtx1

01-28-2013

Thanks, guys. I checked with diff, and the results of both scripts are now identical.
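For readers who want to repeat that check, here is one way it might be done. This is a sketch, not necessarily the exact commands used in the thread. Both outputs are piped through `sort` first, because the array-based script prints in unspecified order. The function name `compare_merges` and the use of `mktemp -d` are assumptions.

```shell
# Run both scripts from the thread on the same file and diff the sorted results.
# Returns 0 (and prints nothing) when the outputs are identical.
compare_merges() {
  infile=$1
  tmp=$(mktemp -d) || return 1

  # Array-based version (with the "next" fix)
  awk '!(a[$1]) {a[$1]=$0; next}
       a[$1] {w=$1; $1=""; a[w]=a[w] $0}
       END {for (i in a) print a[i]}' FS="\t" OFS="\t" "$infile" | sort > "$tmp/a.out"

  # Streaming one-liner
  awk 'p!=$1{if(p)print s; p=s=$1} {sub(p,x); s=s $0} END{if(p)print s}' \
      FS='\t' "$infile" | sort > "$tmp/b.out"

  if diff "$tmp/a.out" "$tmp/b.out"; then rc=0; else rc=1; fi
  rm -rf "$tmp"
  return "$rc"
}

# Usage: compare_merges infile && echo "outputs match"
```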

falcox
