Count unique column

05-16-2017


Hello,
I am trying to count the unique rows in my file based on four columns (2-5) and output each row's frequency in a sixth column. My file is tab-delimited.
My input file looks like this:

Code:

Colum1 Colum2 Colum3 Colum4 Column5
1.1 100 100 a b
1.1 100 100 a c
1.2 200 205 a d
1.3 300 301 a y
1.3 300 301 a y
1.4 400 410 a b
1.5 500 510 a c
1.5 500 500 a d
1.5 500 500 a y
1.5 500 500 a y

and the desired output is

Code:

Colum1 Colum2 Colum3 Colum4 Column5 Column6
1.1 100 100 a b 1
1.1 100 100 a c 1
1.2 200 205 a d 1
1.3 300 301 a y 2
1.4 400 410 a b 1
1.5 500 510 a c 1
1.5 500 500 a d 1
1.5 500 500 a y 2

So far I have tried this

Code:

sort inputfile.csv | uniq -ci | awk '{print $0}' > freq.txt

This gives a frequency of 1 for every row and also sorts the output. I want the output to keep its original order. Any suggestions? Thank you.

Last edited by Don Cragun; 05-17-2017 at 06:11 AM.. Reason: Change ICODE tags to CODE tags.

nans


05-16-2017


Hello nans,

Could you please try the following and let me know if it helps?

Code:

awk 'FNR>1 && FNR==NR{A[$2,$3,$4,$5]++;next} (($2,$3,$4,$5) in A){print $0,A[$2,$3,$4,$5];delete A[$2,$3,$4,$5];next}'   Input_file  Input_file

Also, if you need the headers, you could print them in the BEGIN section.

Thanks,
R. Singh

RavinderSingh13


05-16-2017


Thanks, RavinderSingh.
I tried the suggested command:

Code:

awk 'FNR>1 && FNR==NR{A[$2,$3,$4,$5]++;next} (($2,$3,$4,$5) in A){print $0,A[$2,$3,$4,$5];delete A[$2,$3,$4,$5];next}' Input_file > Output_file

This ends up generating an empty output file.

nans


05-16-2017


Yes, of course. You didn't copy RavinderSingh13's entire proposal: the second Input_file argument is missing.

RudiC


05-16-2017



Hello nans,

You need to name Input_file twice on the command line, as RudiC pointed out: awk reads the file once to build the counts and a second time to print each row with its count. It should work then.
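For comparison, here is a minimal single-pass sketch that does not need the file named twice. It is not the command proposed above: it keeps the first occurrence of every distinct row in memory and prints everything in the END block, so it trades memory for the second read. The function name `count_rows_single_pass` and the array names `count`, `first`, and `order` are made up for illustration, and the appended header text "Column6" is an assumption.

```shell
# Hypothetical wrapper: count rows that share columns 2-5, reading the file once.
count_rows_single_pass() {
    awk '
    NR == 1 {                       # header line: append a name for the new column
        print $0, "Column6"
        next
    }
    {
        key = $2 SUBSEP $3 SUBSEP $4 SUBSEP $5
        if (!(key in count)) {      # first time this combination is seen
            order[++n] = key        # remember the original order
            first[key] = $0         # remember the first matching row
        }
        count[key]++
    }
    END {
        for (i = 1; i <= n; i++)
            print first[order[i]], count[order[i]]
    }' "$1"
}

# Usage (file names are placeholders):
#   count_rows_single_pass Input_file > Output_file
```

The two-pass form is usually the better fit for large files, because it only stores the counts rather than whole rows.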

Thanks,
R. Singh

RavinderSingh13


05-17-2017


Ah yes, thank you. The output now looks like this, though:

Code:

Colum1 Colum2 Colum3 Colum4 Column5 Column6
1.1 100 100 a b^M 1
1.1 100 100 a c^M 1
1.2 200 205 a d^M 1
1.3 300 301 a y^M 2
1.4 400 410 a b^M 1
1.5 500 510 a c^M  1
1.5 500 500 a d^M  1
1.5 500 500 a y^M  2

But that should be okay, I can always use sed to remove the ^M characters. Thank you.
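A minimal sketch of that clean-up, assuming the ^M characters are DOS-style carriage returns left over from a Windows-edited file. It uses `tr` rather than sed, because `\r` inside a sed pattern is not guaranteed by POSIX. The function name `strip_cr` is made up for illustration.

```shell
# Hypothetical helper: delete every carriage return (displayed as ^M) from stdin.
strip_cr() {
    tr -d '\r'
}

# Usage (file names are placeholders):
#   strip_cr < Output_file > Output_file.clean
```

Running the same filter on the input before counting would also keep the carriage return out of column 5, so the count would no longer be printed after a ^M.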

Last edited by Don Cragun; 05-17-2017 at 06:10 AM.. Reason: Change QUOTE tags to CODE tags.

nans


05-17-2017



I don't see how this code prints out the heading line, but you can get rid of the carriage return characters in the awk script without needing to also invoke sed:

Code:

awk '{gsub(/\r/,"")}FNR>1 && FNR==NR{A[$2,$3,$4,$5]++;next} (($2,$3,$4,$5) in A){print $0,A[$2,$3,$4,$5];delete A[$2,$3,$4,$5];next}'   Input_file  Input_file

If you want the augmented header line, you might try the following (in a format I find a little bit easier to read):

Code:

awk '
{	gsub(/\r/, "")
}
NR==1 {	print $0, "Column6"
	next
}
FNR>1 && FNR==NR {
	A[$2, $3, $4, $5]++
	next
}
(($2, $3, $4, $5) in A) {
	print $0, A[$2, $3, $4, $5]
	delete A[$2, $3, $4, $5]
}'   OFS='\t' Input_file  Input_file

Note that the sample input and output you provided used <space> as a field delimiter but you said your files were <tab> delimited. I specified <tab> as the output field separator here assuming that your real data is <tab> delimited.
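If the real data might contain spaces inside a field, or empty fields, a variant that also sets the input field separator to <tab> may be safer. This is only a sketch under that assumption: the function name `count_rows_tsv` is made up, and `-F '\t'` relies on awk interpreting the escape sequence, which gawk, mawk, and the BWK awk all do.

```shell
# Hypothetical wrapper: same two-pass count, but splitting and joining on tabs only.
count_rows_tsv() {
    awk -F '\t' -v OFS='\t' '
    { sub(/\r$/, "") }                  # drop a trailing carriage return, if any
    NR == 1 {                           # header, first pass only
        print $0, "Column6"
        next
    }
    FNR > 1 && FNR == NR {              # first pass: count columns 2-5
        A[$2, $3, $4, $5]++
        next
    }
    (($2, $3, $4, $5) in A) {           # second pass: print first occurrence
        print $0, A[$2, $3, $4, $5]
        delete A[$2, $3, $4, $5]
    }' "$1" "$1"
}

# Usage (file names are placeholders):
#   count_rows_tsv Input_file > Output_file
```

With the default field separator, a value such as "a x" in column 4 would be split into two fields and shift the key; with `-F '\t'` it stays a single field.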

Don Cragun
