Count occurrence of column one unique value having unique second column value

08-02-2016

Registered User

76, 0

Join Date: Jul 2007

Last Activity: 2 April 2018, 4:19 PM EDT

Location: India

Posts: 76

Thanks Given: 24

Thanked 0 Times in 0 Posts

Hello Team,

I need your help on the following:

My input file a.txt is as below:

Code:

3330690|373846|108471
3330690|373846|108471
0640829|459725|100001
0640829|459725|100001
3330690|373847|108471

Here the column 1 value 3330690 appears in rows 1, 2 and 5, but its corresponding column 2 values differ (373846 and 373847). I am trying to get the following output:

Code:

2 3330690
1 0640829

Following is what I tried:

Code:

awk -F'|' '{print $1}' a.txt | sort -n | uniq | grep -F -f - a.txt | awk -F'|' '{print $2}' | sort | uniq

This gives me the following result:

373846
459725
373847

But it does not tell me how many distinct column 2 values occur for each distinct column 1 value.

Your help is highly appreciated.

Thanks
Angsuman

Moderator's Comments:

Please use code (not HTML) tags as required by forum rules!

Last edited by RudiC; 08-02-2016 at 09:36 AM.. Reason: Changed HTML to CODE tags.

angshuman

08-02-2016

Moderator

3,105, 1,603

Join Date: May 2013

Last Activity: 31 August 2020, 1:46 AM EDT

Location: Chennai

Posts: 3,105

Thanks Given: 1,269

Thanked 1,603 Times in 1,369 Posts

Hello angshuman,

I am not at all sure I understand your Input_file and expected output. If you are counting rows in which both column 1 and column 2 are the same, the counts would not be the ones you posted.

Code:

awk -F"|" '{A[$1 FS $2]++} END{for(i in A){print A[i] FS i}}'  Input_file

Output will be as follows.

Code:

2|0640829|459725
2|3330690|373846
1|3330690|373847

The above uses the 1st and 2nd fields together as the index into the array. If your requirements are different, please post the complete conditions with expected results.
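To make the pair-counting idea reproducible, here is a minimal sketch that rebuilds the sample input from the first post in a temporary file and sorts the result. The variable names `input` and `pair_counts` are only illustrative, and the trailing `sort` is an addition: `for (i in A)` visits array elements in an unspecified order, so the unsorted order can differ between awk implementations.

```shell
# Recreate the sample input from the first post in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
3330690|373846|108471
3330690|373846|108471
0640829|459725|100001
0640829|459725|100001
3330690|373847|108471
EOF

# Count rows per (field 1, field 2) pair, using both fields as the array index.
# The sort on fields 2 and 3 of the output only makes the order repeatable.
pair_counts=$(awk -F'|' '{A[$1 FS $2]++} END{for(i in A){print A[i] FS i}}' "$input" |
    sort -t'|' -k2,2 -k3,3)
printf '%s\n' "$pair_counts"

rm -f "$input"
```

This counts how often each pair occurs, which is a different question from how many distinct field 2 values each field 1 value has.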

Thanks,
R. Singh

RavinderSingh13

08-02-2016

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

Your specification is far from clear. Would this do what you request:

Code:

awk -F"|" '!T[$1,$2]++ {C[$1]++} END {for (c in C) print C[c], c}' file
1 0640829
2 3330690
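For comparison, the same count can probably also be obtained with a plain `cut`/`sort`/`uniq` pipeline, which stays closer to the original attempt. This is only a sketch on the sample data from the first post; the variable names `input` and `distinct_counts` are illustrative, and the final `awk` merely strips the leading padding that `uniq -c` adds.

```shell
# Recreate the sample input from the first post in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
3330690|373846|108471
3330690|373846|108471
0640829|459725|100001
0640829|459725|100001
3330690|373847|108471
EOF

# 1. keep only fields 1 and 2
# 2. drop duplicate (field 1, field 2) pairs
# 3. keep only field 1, which is still in sorted order
# 4. count how many distinct pairs each field 1 value had
distinct_counts=$(cut -d'|' -f1,2 "$input" | sort -u | cut -d'|' -f1 | uniq -c |
    awk '{print $1, $2}')
printf '%s\n' "$distinct_counts"

rm -f "$input"
```

Unlike the single awk command, this needs to sort the data, but each stage can be inspected on its own.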

RudiC

08-02-2016

Thank you Ravinder for your response. Sorry if my question was not clear.

Condition 1 - The unique values of column 1 are 3330690 and 0640829.
Condition 2 - The column 1 value 3330690 is associated with 2 distinct values of column 2, which are 373846 and 373847. The column 1 value 0640829 is associated with a single value of column 2, which is 459725.

Hence the expected output is as below:

Code:

2 3330690
1 0640829

Hope this clarifies.

---------- Post updated at 08:48 PM ---------- Previous update was at 06:11 PM ----------

Thank you RudiC. This worked perfectly. Now I am trying to understand this piece of code.
Could you please help explain the code?

Thanks
Angsuman

angshuman

08-02-2016

If the index constructed from $1 and $2 does not yet exist in the temp array T, it's a new combination, and the counter for $1 is incremented. When the input file ends, all these counters and the corresponding $1 values are printed.

In more detail:
For the first occurrence of a $1,$2 combination, T[$1,$2] doesn't exist, so !T[$1,$2] is true, and the counter C[$1] is incremented. Because T[$1,$2] is incremented at the same time, nothing happens the next time that combination is encountered. C[$1] thus counts the different $2 values for every single $1. In the end, the count for every single $1 is printed.
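The same logic can be laid out as a commented multi-line script. This is just a sketch run against the sample data from the first post; the variable names `input` and `result` are illustrative, and the final `sort` is an addition that makes the order repeatable, since `for (c in C)` does not guarantee any order.

```shell
# Recreate the sample input from the first post in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
3330690|373846|108471
3330690|373846|108471
0640829|459725|100001
0640829|459725|100001
3330690|373847|108471
EOF

result=$(awk -F'|' '
    # T remembers every (field 1, field 2) pair seen so far.
    # T[$1,$2]++ yields the old value (0 the first time) and then increments,
    # so the pattern !T[$1,$2]++ is true only for the first occurrence of a pair.
    !T[$1,$2]++ { C[$1]++ }    # one more distinct field 2 value for this field 1

    # After the last input line, print each counter with its field 1 value.
    END { for (c in C) print C[c], c }
' "$input" | sort -k2,2)
printf '%s\n' "$result"

rm -f "$input"
```

With the five sample rows, the pair 3330690/373846 is counted once even though it occurs twice, which is why 3330690 ends up with a count of 2.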

Last edited by RudiC; 08-02-2016 at 03:41 PM..

RudiC
