Count duplicate lines ignoring certain columns


 
# 1  
02-06-2014

I have this structure:

Code:
col1 col2 col3 col4 col5
27	xxx	38	aaa	ttt	
2	xxx	38	aaa	yyy
1	xxx	38	aaa	yyy

I need to collapse lines that are duplicates when column 1 is ignored, and sum the col1 values of each group of duplicates, so the result looks like this:

Code:
col1 col2 col3 col4 col5
27	xxx	38	aaa	ttt	
3	xxx	38	aaa	yyy

I'm using uniq -c -f1, but it doesn't add up the numbers in the first column:

Code:
1 	 col1 col2 col3 col4 col5
1 	 27	xxx	38	aaa	ttt	
2 	 3	xxx	38	aaa	yyy

Could you please help me with this?

Thank you!
# 2  
02-06-2014
Using awk, reading the input file twice (the first pass sums, the second pass prints):
Code:
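awk '
        # Header: print it once, on the first line of the first pass.
        NR == 1 {
                print
        }
        # First pass (NR == FNR): sum column 1 per key built from columns 2-5.
        NR == FNR && NR > 1 {
                i = $2 FS $3 FS $4 FS $5
                A[i] += $1
                next
        }
        # Second pass: print each key once, in input order, with its total.
        # (The header also reaches this block, but its key is never in A.)
        {
                i = $2 FS $3 FS $4 FS $5
                if ( ( i in A ) && !( i in R ) )
                        print A[i], i
                R[i]    # mark the key as already printed
        }
' file file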
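
For anyone unfamiliar with the `file file` trick: awk keeps NR counting across both reads, while FNR restarts at 1 for each file, so NR == FNR is true only during the first read. A minimal sketch to see this for yourself (the temporary demo file and the show_passes function name are made up for illustration):

```shell
# Hypothetical two-line demo file.
demo=$(mktemp)
printf 'a\nb\n' > "$demo"

# Read the same file twice and label each line with the pass it belongs to.
show_passes() {
    awk '{
        pass = (NR == FNR) ? "pass1" : "pass2"   # NR == FNR only on the first read
        print pass, NR, FNR, $0
    }' "$1" "$1"
}

show_passes "$demo"
```

The first two lines come out labelled pass1 (NR and FNR agree), the last two pass2 (NR keeps growing, FNR starts over).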

# 3  
02-06-2014
It works! Thank you so much, Yoda! Could you please briefly explain what it does? I'm trying to learn awk.
# 4  
02-06-2014
Or, in a single pass (note that the for-in loop does not guarantee output order):
Code:
awk 'NR==1{print; next} {v=$1; sub(v,x); C[$0]+=v} END{for(i in C) print C[i] i}' file
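
Since awk does not define the order in which `for (i in C)` visits keys, the one-liner may print rows in a different order than the input. If order matters, here is a possible single-pass sketch that remembers the order in which keys first appear. It assumes the data lines are tab-separated, as in the first post; the sample file, the sum_dupes function name, and the array names are made up for illustration:

```shell
# Sample input recreated from the first post (data lines are tab-separated).
sample=$(mktemp)
printf 'col1 col2 col3 col4 col5\n27\txxx\t38\taaa\tttt\n2\txxx\t38\taaa\tyyy\n1\txxx\t38\taaa\tyyy\n' > "$sample"

sum_dupes() {
    awk '
        BEGIN { FS = OFS = "\t" }                     # assumption: tab-separated data
        NR == 1 { print; next }                       # pass the header through unchanged
        {
            key = $2 FS $3 FS $4 FS $5                # columns 2-5 identify a duplicate
            if (!(key in total)) order[++n] = key     # remember first-seen order
            total[key] += $1                          # add up column 1 per key
        }
        END {
            for (j = 1; j <= n; j++)                  # replay keys in input order
                print total[order[j]], order[j]
        }
    ' "$1"
}

sum_dupes "$sample"
```

On the sample from the first post this prints the header, then the 27 line, then a single yyy line with 3 in the first column.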
