Count occurrence of column one unique value having unique second column value


 
# 1  
Old 08-02-2016

Hello Team,

I need your help on the following:

My input file a.txt is as below:

Code:
3330690|373846|108471
3330690|373846|108471
0640829|459725|100001
0640829|459725|100001
3330690|373847|108471

Here rows 1 and 2 are identical, while row 5 has the same column 1 value but a different column 2 value. I am trying to get the following output:

Code:
2 3330690
1 0640829

Following is what I tried:

Code:
awk -F'|' '{print $1}' a.txt | sort -n | uniq | grep -F -f - a.txt | awk -F'|' '{print $2}' | sort | uniq

This gives me the following result:

373846
459725
373847


But it does not tell me how many distinct column 2 values occur for each distinct column 1 value.

Your help is highly appreciated.

Thanks
Angsuman





Moderator's comment: Please use CODE (not HTML) tags as required by forum rules!

Last edited by RudiC; 08-02-2016 at 09:36 AM.. Reason: Changed HTML to CODE tags.
# 2  
Old 08-02-2016
Hello angshuman,

I am not at all sure about your Input_file and expected output. If you mean that column 1 and column 2 should be counted together, then the counts would not be the ones you posted.
Code:
awk -F"|" '{A[$1 FS $2]++} END{for(i in A){print A[i] FS i}}'  Input_file

Output will be as follows.
Code:
2|0640829|459725
2|3330690|373846
1|3330690|373847

The code above uses fields 1 and 2 together as the array index and counts how often each pair occurs. If your requirements are different, please post the complete conditions with expected results.

Thanks,
R. Singh
# 3  
Old 08-02-2016
Your specification is far from clear. Would this do what you request:
Code:
awk -F"|" '!T[$1,$2]++ {C[$1]++} END {for (c in C) print C[c], c}' file
1 0640829
2 3330690
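
For comparison, a pipeline in the spirit of the original attempt can arrive at the same counts without awk arrays. This is only a sketch under a few assumptions: the sample data from post #1 is written to a temporary file, and the trailing `awk` and `sort` are added purely to tidy the padding that `uniq -c` produces and to order the lines by count.

```shell
# Sample input from post #1, written to a temporary file (an assumption of this sketch).
input=$(mktemp)
cat > "$input" <<'EOF'
3330690|373846|108471
3330690|373846|108471
0640829|459725|100001
0640829|459725|100001
3330690|373847|108471
EOF

# 1. keep columns 1 and 2
# 2. sort on both columns and drop duplicate (column 1, column 2) pairs
# 3. keep column 1 only (equal values are now adjacent)
# 4. count how many lines remain per column 1 value
# 5. strip the padding added by uniq -c, then order by count, highest first
result=$(cut -d'|' -f1,2 "$input" \
    | sort -t'|' -u -k1,1 -k2,2 \
    | cut -d'|' -f1 \
    | uniq -c \
    | awk '{print $1, $2}' \
    | sort -k1,1nr)

printf '%s\n' "$result"
rm -f "$input"
```

The awk one-liner reads the file once and needs no sorting, so it is likely the better fit for large inputs; the pipeline is mainly easier to debug one stage at a time.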

# 4  
Old 08-02-2016
Thank you Ravinder for your response. Sorry if my question is not clear.

Condition 1 - The unique values of column 1 are 3330690 and 0640829.
Condition 2 - The column 1 value 3330690 is associated with 2 distinct values of column 2, which are 373846 and 373847. The column 1 value 0640829 is associated with a single column 2 value, which is 459725.

Hence the output is expected as below:

Code:
2 3330690
1 0640829

Hope this clarifies.

---------- Post updated at 08:48 PM ---------- Previous update was at 06:11 PM ----------

Thank you RudiC. This worked perfectly. Now I am trying to understand this piece of code.
Can you please help explain the code?

Thanks
Angsuman
# 5  
Old 08-02-2016
If the index constructed from $1 and $2 does not exist in the temp array T, it's a new combination, and the counter for $1 is incremented. When the input file ends, all these counters and the corresponding $1 values are printed.

More detailed:
For the first occurrence of the $1,$2 combination, T[$1,$2] doesn't exist, so !T[$1,$2] is true, and the counter C[$1] is incremented. Because T[$1,$2] is incremented in the same test, the next time the combination is encountered the condition is false and nothing happens. C[$1] thus counts the different $2 values for every single $1. In the end, the count for every single $1 is printed.
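
The explanation above can be spelled out as a longer script. This is a sketch rather than the code as posted: the array names `seen` and `count` stand in for T and C, the sample data from post #1 is written to a temporary file, and a final `sort` is added because awk does not define the order in which `for (k in count)` visits its keys.

```shell
# Recreate the sample input from post #1 (the temporary file is an assumption of this sketch).
input=$(mktemp)
cat > "$input" <<'EOF'
3330690|373846|108471
3330690|373846|108471
0640829|459725|100001
0640829|459725|100001
3330690|373847|108471
EOF

# Same logic as the one-liner, with the implicit steps made explicit.
result=$(awk -F'|' '
{
    key = $1 SUBSEP $2          # the index awk builds for T[$1,$2]
    if (!(key in seen)) {       # first time this ($1,$2) pair appears
        seen[key] = 1           # remember the pair so repeats are skipped
        count[$1]++             # one more distinct $2 for this $1
    }
}
END {
    for (k in count)            # traversal order is unspecified in awk
        print count[k], k
}' "$input" | sort -k1,1nr)     # order by count, highest first

printf '%s\n' "$result"
rm -f "$input"
```

With the sample data this prints `2 3330690` followed by `1 0640829`, matching the output requested in post #1.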

Last edited by RudiC; 08-02-2016 at 03:41 PM..