Awk: count unique element of array


 
# 1  
Old 01-19-2017

Hi,

tab-separated input:
Code:
blabla_1 A,B,C,C
blabla_2 A,E,G
blabla_3 R,Q,A,B,C,R,Q

output:
Code:
blabla_1 3
blabla_2 3
blabla_3 5

After splitting $2 into an array, I am trying to store the number of unique elements in a variable, but I am having difficulty resetting the count to 0 before processing a new record.

I tried several variants of the following code, but it only works for the first record (all other records still take into account the elements seen on the previous line(s)).
Code:
gawk -F '\t' '
{
   VAR=0

   a=split($2,b,",")
   
   for(i=1; i<=a; i++){
      if(!c[b[i]]++){
         VAR+=1
      }
   }
   
   print $1 "\t" VAR
}' input.tab

Returns:
Code:
blabla_1 3
blabla_2 2
blabla_3 2

# 2  
Old 01-19-2017
Maybe something more like:
Code:
gawk '
{	VAR=0
	split($2,b,",")
	for(i in b)
		if(!(b[i] in s)) {
			VAR++
			s[b[i]]
		}
	print $1 "\t" VAR
	for(i in s)
		delete s[i]
}' input.tab

I don't have gawk on my system, but it works with a standard awk (/usr/xpg4/bin/awk or nawk on a Solaris system; awk on most other systems). With gawk and some other versions of awk you should be able to replace:
Code:
	for(i in s)
		delete s[i]

with:
Code:
	delete s

but the standards don't yet require this to work in all conforming versions of awk.
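
For reference, the same reset can be applied to the loop from post #1, where the array c was never cleared between records. This is only a sketch: count_unique is a made-up wrapper name, and it assumes the input really is tab-separated as described in the question.

```shell
# Sketch: the original loop, with the "seen" array c cleared for every record.
# count_unique is a hypothetical wrapper name; it reads tab-separated lines on stdin.
count_unique() {
    awk -F '\t' '
    {
        # Clear c element by element (portable); gawk and some others also accept "delete c".
        for (k in c)
            delete c[k]

        VAR = 0
        a = split($2, b, ",")
        for (i = 1; i <= a; i++)
            if (!c[b[i]]++)     # true only the first time b[i] is seen in this record
                VAR++
        print $1 "\t" VAR
    }'
}

printf 'blabla_1\tA,B,C,C\nblabla_2\tA,E,G\nblabla_3\tR,Q,A,B,C,R,Q\n' | count_unique
```

With the sample input this should print 3, 3 and 5 next to the three labels, because each record now starts with an empty c.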
This User Gave Thanks to Don Cragun For This Post:
# 3  
Old 01-20-2017
Slightly different approach:
Code:
awk -F"[\t ,]" '
        {for (i=2; i<=NF; i++) {if (!T[$i]++) C++}
         print $1, C
         C = 0
         split (_,T)
        }
' file
blabla_1 3
blabla_2 3
blabla_3 5
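
The split (_,T) line is what resets the array here: _ is just an unset variable, so it is an empty string, and split deletes every element of the target array before splitting. A minimal sketch of that idiom on its own, assuming a tab between the label and the list (count_fields_unique is a hypothetical name):

```shell
# Sketch: clear an awk array by splitting an empty string into it.
# count_fields_unique is a hypothetical name; input is assumed to use a tab
# between the label and the comma-separated list.
count_fields_unique() {
    awk -F '[\t,]' '
    {
        split("", T)                 # empties T: split deletes all elements first
        C = 0
        for (i = 2; i <= NF; i++)    # fields 2..NF are the list elements
            if (!T[$i]++)
                C++
        print $1, C
    }'
}

printf 'blabla_1\tA,B,C,C\nblabla_2\tA,E,G\nblabla_3\tR,Q,A,B,C,R,Q\n' | count_fields_unique
```

Doing the reset at the top of the block instead of the bottom gives the same result; it only changes where the cleanup sits.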

# 4  
Old 01-20-2017
Slightly different approach still:
Code:
awk '{split(x,C); n=split($2,F,/,/); for(i in F) if(C[F[i]]++) n--; print $1, n}' file


--
more concise:
Code:
awk '{split(x,C); n=split($2,F,/,/); for(i in F) n-=(C[F[i]]++>0); $2=n}1' file


Last edited by Scrutinizer; 01-20-2017 at 06:37 AM..
# 5  
Old 01-20-2017
@Scrutinizer: VEEERY interesting approach! Brilliant! At least the first one. The second will count wrongly if an element occurs more than twice: C[F[i]] will deduct 1 for the first duplicate, 2 for the third occurrence, etc. Might not be what was required?
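
To make the difference concrete, here is a small sketch that runs both variants side by side: subtracting the raw counter versus subtracting the result of a comparison (0 or 1). The function name compare_subtraction and the labels raw/fixed are only for illustration.

```shell
# Sketch: compare subtracting the raw counter with subtracting a 0/1 comparison.
# compare_subtraction is a hypothetical name; it reads one comma-separated list per line.
compare_subtraction() {
    awk '
    {
        split("", C1); split("", C2)       # reset both seen-counters per record
        raw = fixed = split($1, F, /,/)    # start from the total element count
        for (i in F) {
            raw   -= C1[F[i]]++            # subtracts 0, 1, 2, ... for repeats
            fixed -= (C2[F[i]]++ > 0)      # subtracts at most 1 per repeat
        }
        print $1, "raw=" raw, "fixed=" fixed
    }'
}

printf 'A,A,A\nA,B,C,C\n' | compare_subtraction
```

For A,A,A the raw variant ends at 0 (3 - 0 - 1 - 2) while the comparison variant ends at 1; for A,B,C,C both give 3, which is why the sample data from post #1 does not expose the problem.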
This User Gave Thanks to RudiC For This Post:
# 6  
Old 01-20-2017
Hi RudiC, you were a bit too fast :) I forgot the parentheses and the comparison, which I have corrected in the meantime.
--edit--
actually the parentheses are not needed..
--edit--
putting them back in to avoid ambiguity...
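
A quick way to check the point about the parentheses: in awk, assignment operators bind more loosely than comparisons, so both spellings should give the same count. A sketch, with paren_check, p and q as made-up names:

```shell
# Sketch: the same count with and without parentheses around the comparison.
# paren_check is a hypothetical name; it reads one comma-separated list per line.
paren_check() {
    awk '
    {
        split("", A); split("", B)       # reset both seen-counters per record
        p = q = split($1, F, /,/)
        for (i in F) {
            p -= (A[F[i]]++ > 0)         # explicit grouping
            q -= B[F[i]]++ > 0           # same parse: the comparison is evaluated first
        }
        print p, q
    }'
}

printf 'R,Q,A,B,C,R,Q\nC,C,C,C\n' | paren_check
```

Both columns should agree on every line, so keeping the parentheses is purely for the reader.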

Last edited by Scrutinizer; 01-20-2017 at 06:38 AM..
# 7  
Old 01-23-2017
Thanks guys !
Everything works great !