Count duplicate lines ignoring certain columns

02-06-2014

I have this structure:

Code:

col1 col2 col3 col4 col5
27	xxx	38	aaa	ttt	
2	xxx	38	aaa	yyy
1	xxx	38	aaa	yyy

I need to collapse lines that are identical apart from column 1, and sum the col1 values of each group of duplicates, so the result looks like this:

Code:

col1 col2 col3 col4 col5
27	xxx	38	aaa	ttt	
3	xxx	38	aaa	yyy

I'm using uniq -c -f1, but it only counts the duplicate lines; it doesn't add up the numbers in the first column:

Code:

1 	 col1 col2 col3 col4 col5
1 	 27	xxx	38	aaa	ttt	
2 	 2	xxx	38	aaa	yyy

Could you please help me with this?

Thank you!

coppuca

02-06-2014

Using awk, reading the input file twice (once to sum column 1 per group, once to print each group in its original order):

Code:

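awk '
        # Line 1 of the first pass is the header: print it unchanged.
        NR == 1 {
                print
                next
        }
        # First pass (NR == FNR): sum column 1 per key built from columns 2-5.
        NR == FNR && NR > 1 {
                i = $2 FS $3 FS $4 FS $5
                A[i] += $1
                next
        }
        # Second pass: print each key once, in input order, with its total.
        {
                i = $2 FS $3 FS $4 FS $5
                if ( ( i in A ) && !( i in R ) )
                        print A[i], i
                R[i]    # mark this key as already printed
        }
' file file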
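
To see what the first pass collects, here is a small sketch that runs only the summing step and dumps the array. The file name sample.txt and the variable names k and sums are assumptions made for this illustration; the data is the sample from the question.

```shell
# Build the sample input from the question (tab-separated data rows).
# "sample.txt" is a hypothetical file name used only for this sketch.
printf 'col1 col2 col3 col4 col5\n27\txxx\t38\taaa\tttt\n2\txxx\t38\taaa\tyyy\n1\txxx\t38\taaa\tyyy\n' > sample.txt

# Summing step only: add up column 1 per key built from columns 2-5,
# then show every key with its total. The output is piped through sort
# because the order of "for (k in A)" is unspecified in awk.
sums=$(awk '
        NR > 1 {
                k = $2 FS $3 FS $4 FS $5
                A[k] += $1
        }
        END {
                for (k in A)
                        print k " => " A[k]
        }
' sample.txt | sort)

printf '%s\n' "$sums"
```

Each distinct combination of columns 2-5 ends up as one array index, and the value stored under it is the running total of column 1. The second read of the file is then only needed to print those totals in the original line order.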

Yoda

02-06-2014

It works! Thank you so much, Yoda! Could you please briefly explain what it does? I'm trying to learn awk.

coppuca

02-06-2014

Or:

Code:

awk 'NR==1{print; next} {v=$1; sub(v,x); C[$0]+=v} END{for(i in C) print C[i] i}' file
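
Note that the order in which for (i in C) visits the keys is unspecified, so the output lines may not follow the input order. If that matters, a possible single-pass variant is sketched below. It is an illustration, not a replacement for the one-liner above: the file name sample.txt and the names key, order and n are assumptions, and it strips column 1 with an anchored pattern instead of using the column value itself as the pattern.

```shell
# Build the sample input from the question (tab-separated data rows).
# "sample.txt" is a hypothetical file name used only for this sketch.
printf 'col1 col2 col3 col4 col5\n27\txxx\t38\taaa\tttt\n2\txxx\t38\taaa\tyyy\n1\txxx\t38\taaa\tyyy\n' > sample.txt

result=$(awk '
        NR == 1 { print; next }         # pass the header through unchanged
        {
                v = $1                  # value to add up
                key = $0
                # Remove leading blanks and column 1 only; the rest of the
                # line, including its tabs, becomes the grouping key.
                sub(/^[ \t]*[^ \t]+/, "", key)
                if (!(key in C))
                        order[++n] = key        # remember first-seen order
                C[key] += v
        }
        END {
                # Walk the keys in the order they first appeared.
                for (j = 1; j <= n; j++)
                        print C[order[j]] order[j]
        }
' sample.txt)

printf '%s\n' "$result"
```

Because the key keeps its leading tab, concatenating the total and the key reproduces the original tab-separated layout.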

Scrutinizer
