How to average the third column for each value of the first column?


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting How to average the third column for each value of the first column?
# 1  
Old 11-12-2015
How to average the third column for each value of the first column?

So I have a large amount of comma delimited data that looks like this:

Code:
30.498001,-87.881412,0.024958
30.498001,-87.881412,0.035684
30.498001,-87.881412,,0.026
34.758781,-87.650562,0.034292
34.758781,-87.650562,0.029458
32.498567,-86.136587,0.045458
32.498567,-86.136587,0.036292
32.498567,-86.136587,0.024125
32.498567,-86.136587,0.01775

I need to change this data such that the third column is averaged for each different value of the first column. So I would have something like this:

Code:
30.498001,-87.881412, average
34.758781,-87.650562, average
32.498567,-86.136587, average


So for each unique value in the first column the third column is averaged. (only the first column matters because there are no two rows in my data with the same first column but different second columns0

I have tried numerous awk and Perl solutions to similar problems online. I am stumped. Smilie

If someone could point me in the right direction that would be incredible.

Last edited by Don Cragun; 11-12-2015 at 05:25 PM.. Reason: Add CODE tags.
# 2  
Old 11-12-2015
Please show us what you have tried. It helps us figure out what problems you are having and the level of your expertise.

Please also tell us what operating system and shell you are using.
# 3  
Old 11-12-2015
Is the second column value always equal to the first column value?
# 4  
Old 11-12-2015
I am missing something here.

30.498001,-87.881412,,0.026

There is no third column!

Do we assume that this NULL is the value zero?
Otherwise how do you want us to average a none existant value?
# 5  
Old 11-12-2015
The first two column values are lat/long values. They are for specific locations, so considering the first column as an id is fine, I think.

I am using UNIX on a mac (not sure if that's what you needed to know). I am VERY inexperienced, only been working with UNIX for a few months.

So far I have tried switching this code around in the terminal. It is for averaging rows with the same first column.
$ awk '
NR>1{
arr[$1] += $2
count[$1] += 1
}
END{
for (a in arr) {
print "id avg " a " = " arr[a] / count[a]
}
}
' FILE

I have tried to switch the $2 to $3 and switched the order so that it would be averaging columns not rows, but I don't think this is even the right approach. No matter what I try I get "0" as the output for my average...Smilie

---------- Post updated at 05:55 PM ---------- Previous update was at 05:49 PM ----------

Quote:
Originally Posted by wisecracker
I am missing something here.

30.498001,-87.881412,0.026

There is no third column!

Do we assume that this NULL is the value zero?
Otherwise how do you want us to average a none existant value?
30.498001,-87.881412,0.026

1,2,3
# 6  
Old 11-12-2015
Assuming that the data is already in sorted order (as in your example), you could try:
Code:
awk '
BEGIN {	FS = OFS = ","
}
function print_av() {
	if(NR <= 1)
		return
	print sum / cnt
	sum = cnt = 0
}
$1 != last {
	print_av()
	printf("%s%s%s%s", $1, OFS, $2, OFS)
	last = $1
}
{	sum += $3
	cnt++
}
END {	print_av()
}' file

But, note as wisecracker said, you need to clean up your input data so that each line has three fields. It would probably be better to use the latitude and the longitude as in:
Code:
awk '
BEGIN {	FS = OFS = ","
}
function print_av() {
	if(NR <= 1)
		return
	print sum / cnt
	sum = cnt = 0
}
$1 FS $2 != last {
	print_av()
	printf("%s%s%s%s", $1, OFS, $2, OFS)
	last = $1 FS $2
}
{	sum += $3
	cnt++
}
END {	print_av()
}' file

# 7  
Old 11-12-2015
You weren't too far off ... compare this to what you posted, and you'll find your mistakes:
Code:
awk '
        {arr[$1] += $3
         count[$1] ++
        }
END     {for (a in arr) {print "id avg " a " = " arr[a] / count[a]
                        }
        }
' FS="," file
id avg 32.498567 = 0.0309062
id avg 30.498001 = 0.020214
id avg 34.758781 = 0.031875

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Average each numeric column

Hi all, Does anyone know of an efficient unix script to average each numeric column of a multi-column tab delimited file (with header) with some character columns. Here is an example input file: CHR RS_ID ALLELE POP1 POP2 POP3 POP4 POP5 POP6 POP7 POP8... (7 Replies)
Discussion started by: Geneanalyst
7 Replies

2. Shell Programming and Scripting

Check first column - average second column based on a condition

Hi, My input file Gene1 1 Gene1 2 Gene1 3 Gene1 0 Gene2 0 Gene2 0 Gene2 4 Gene2 8 Gene3 9 Gene3 9 Gene4 0 Condition: If the first column matches, then look in the second column. If there is a value of zero in the second column, then don't consider that record while averaging. ... (5 Replies)
Discussion started by: jacobs.smith
5 Replies

3. Shell Programming and Scripting

Get column average using ID

I have a file that looks like this: id window BV 1 1 0.5 1 2 0.2 1 3 0.1 2 1 0.5 2 2 0.1 2 3 0.2 3 1 0.4 3 2 0.6 3 3 0.8 Using awk, how would I get the average BV for window 1? Output like this: window avgBV 1 0.47 2 0.23 (10 Replies)
Discussion started by: jwbucha
10 Replies

4. UNIX for Dummies Questions & Answers

Average by specific column value, awk

Hi, I am searching for an awk-script that computes the mean values for the $2 column, but addicted to the values in the $1 column. It also should delete the unnecessary lines after computing... An example (for some reason I cant use the code tag button): cat list.txt 1 10 1 30 1 20... (2 Replies)
Discussion started by: bjoern456
2 Replies

5. Shell Programming and Scripting

Calculate the average of a column based on the value of another column

Hi, I would like to calculate the average of column 'y' based on the value of column 'pos'. For example, here is file1 id pos y c 11 1 220 aa 11 4333 207 f 11 5333 112 ee 11 11116 305 e 11 11117 310 r 11 22228 781 gg 11 ... (2 Replies)
Discussion started by: jackken007
2 Replies

6. Shell Programming and Scripting

average of rows with same value in the first column

Dear All, I have this file tab delimited A 1 12 22 B 3 34 33 C 55 9 32 A 12 81 71 D 11 1 66 E 455 4 2 B 89 4 3 I would like to make the average every column where the first column is the same, for example, A 6,5 46,5 46,5 B 46,0 19,0 18,0 C 55,0 9,0 32,0 D 11,0 1,0 66,0... (8 Replies)
Discussion started by: paolo.kunder
8 Replies

7. Shell Programming and Scripting

average each column in a file

Hi, I tried to do this in excel but there is a limit to how many rows it can handle. All I need to do is average each column in a file and get the final value. My file looks something like this (obviously a lot larger): Joe HHR + 1 2 3 4 5 6 7 8 Jor HHR - 1 2 3 4 5 6 7 8 the output... (1 Reply)
Discussion started by: kylle345
1 Replies

8. UNIX for Dummies Questions & Answers

average of a column in a table

Hello, Is there a quick way to compute the average of a column data in a numerical tab delimeted file? Thanks, Gussi (2 Replies)
Discussion started by: Gussifinknottle
2 Replies

9. UNIX for Dummies Questions & Answers

Use awk to calculate average of column 3

Suppose I have 500 files in a directory and I need to Use awk to calculate average of column 3 for each of the file, how would I do that? (6 Replies)
Discussion started by: grossgermany
6 Replies

10. UNIX for Dummies Questions & Answers

calculate average of column 2

Hi I have fakebook.csv as following: F1(current date) F2(popularity) F3(name of book) F4(release date of book) 2006-06-21,6860,"Harry Potter",2006-12-31 2006-06-22,,"Harry Potter",2006-12-31 2006-06-23,7120,"Harry Potter",2006-12-31 2006-06-24,,"Harry Potter",2006-12-31... (0 Replies)
Discussion started by: onthetopo
0 Replies
Login or Register to Ask a Question