How to average the third column for each value of the first column?

Login or Register to Ask a Question and Join Our Community

How to average the third column for each value of the first column?

Tags

average, awk, columns, comma delimited, perl, rows, shell scripts, solved

Login to Discuss or Reply to this Discussion in Our Community

Top Forums Shell Programming and Scripting How to average the third column for each value of the first column?

11-12-2015

Registered User

3, 0

Join Date: Nov 2015

Last Activity: 18 November 2015, 4:06 PM EST

Posts: 3

Thanks Given: 0

Thanked 0 Times in 0 Posts

How to average the third column for each value of the first column?

So I have a large amount of comma delimited data that looks like this:

Code:

30.498001,-87.881412,0.024958
30.498001,-87.881412,0.035684
30.498001,-87.881412,,0.026
34.758781,-87.650562,0.034292
34.758781,-87.650562,0.029458
32.498567,-86.136587,0.045458
32.498567,-86.136587,0.036292
32.498567,-86.136587,0.024125
32.498567,-86.136587,0.01775

I need to change this data such that the third column is averaged for each different value of the first column. So I would have something like this:

Code:

30.498001,-87.881412, average
34.758781,-87.650562, average
32.498567,-86.136587, average

So for each unique value in the first column the third column is averaged. (only the first column matters because there are no two rows in my data with the same first column but different second columns0

I have tried numerous awk and Perl solutions to similar problems online. I am stumped.

If someone could point me in the right direction that would be incredible.

Last edited by Don Cragun; 11-12-2015 at 05:25 PM.. Reason: Add CODE tags.

PMFCG

View Public Profile for PMFCG

Find all posts by PMFCG

11-12-2015

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

Please show us what you have tried. It helps us figure out what problems you are having and the level of your expertise.

Please also tell us what operating system and shell you are using.

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

11-12-2015

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

Is the second column value always equal to the first column value?

RudiC

View Public Profile for RudiC

Find all posts by RudiC

11-12-2015

Registered User

1,709, 666

Join Date: Jan 2013

Last Activity: 20 May 2020, 1:43 PM EDT

Location: Loughborough

Posts: 1,709

Thanks Given: 838

Thanked 666 Times in 467 Posts

I am missing something here.

30.498001,-87.881412,,0.026

There is no third column!

Do we assume that this NULL is the value zero?
Otherwise how do you want us to average a none existant value?

wisecracker

View Public Profile for wisecracker

Find all posts by wisecracker

11-12-2015

Registered User

3, 0

Join Date: Nov 2015

Last Activity: 18 November 2015, 4:06 PM EST

Posts: 3

Thanks Given: 0

Thanked 0 Times in 0 Posts

The first two column values are lat/long values. They are for specific locations, so considering the first column as an id is fine, I think.

I am using UNIX on a mac (not sure if that's what you needed to know). I am VERY inexperienced, only been working with UNIX for a few months.

So far I have tried switching this code around in the terminal. It is for averaging rows with the same first column.
$ awk '
NR>1{
arr[$1] += $2
count[$1] += 1
}
END{
for (a in arr) {
print "id avg " a " = " arr[a] / count[a]
}
}
' FILE

I have tried to switch the $2 to $3 and switched the order so that it would be averaging columns not rows, but I don't think this is even the right approach. No matter what I try I get "0" as the output for my average...

---------- Post updated at 05:55 PM ---------- Previous update was at 05:49 PM ----------

Quote:

Originally Posted by wisecracker

I am missing something here.

30.498001,-87.881412,0.026

There is no third column!

Do we assume that this NULL is the value zero?
Otherwise how do you want us to average a none existant value?

30.498001,-87.881412,0.026

1,2,3

PMFCG

View Public Profile for PMFCG

Find all posts by PMFCG

11-12-2015

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

Assuming that the data is already in sorted order (as in your example), you could try:

Code:

awk '
BEGIN {	FS = OFS = ","
}
function print_av() {
	if(NR <= 1)
		return
	print sum / cnt
	sum = cnt = 0
}
$1 != last {
	print_av()
	printf("%s%s%s%s", $1, OFS, $2, OFS)
	last = $1
}
{	sum += $3
	cnt++
}
END {	print_av()
}' file

But, note as wisecracker said, you need to clean up your input data so that each line has three fields. It would probably be better to use the latitude and the longitude as in:

Code:

awk '
BEGIN {	FS = OFS = ","
}
function print_av() {
	if(NR <= 1)
		return
	print sum / cnt
	sum = cnt = 0
}
$1 FS $2 != last {
	print_av()
	printf("%s%s%s%s", $1, OFS, $2, OFS)
	last = $1 FS $2
}
{	sum += $3
	cnt++
}
END {	print_av()
}' file

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

11-12-2015

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

You weren't too far off ... compare this to what you posted, and you'll find your mistakes:

Code:

awk '
        {arr[$1] += $3
         count[$1] ++
        }
END     {for (a in arr) {print "id avg " a " = " arr[a] / count[a]
                        }
        }
' FS="," file
id avg 32.498567 = 0.0309062
id avg 30.498001 = 0.020214
id avg 34.758781 = 0.031875

RudiC

View Public Profile for RudiC

Find all posts by RudiC

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Average each numeric column

Hi all, Does anyone know of an efficient unix script to average each numeric column of a multi-column tab delimited file (with header) with some character columns. Here is an example input file: CHR RS_ID ALLELE POP1 POP2 POP3 POP4 POP5 POP6 POP7 POP8...

2. Shell Programming and Scripting

Check first column - average second column based on a condition

Hi, My input file Gene1 1 Gene1 2 Gene1 3 Gene1 0 Gene2 0 Gene2 0 Gene2 4 Gene2 8 Gene3 9 Gene3 9 Gene4 0 Condition: If the first column matches, then look in the second column. If there is a value of zero in the second column, then don't consider that record while averaging. ...

3. Shell Programming and Scripting

Get column average using ID

I have a file that looks like this: id window BV 1 1 0.5 1 2 0.2 1 3 0.1 2 1 0.5 2 2 0.1 2 3 0.2 3 1 0.4 3 2 0.6 3 3 0.8 Using awk, how would I get the average BV for window 1? Output like this: window avgBV 1 0.47 2 0.23

4. UNIX for Dummies Questions & Answers

Average by specific column value, awk

Hi, I am searching for an awk-script that computes the mean values for the $2 column, but addicted to the values in the $1 column. It also should delete the unnecessary lines after computing... An example (for some reason I cant use the code tag button): cat list.txt 1 10 1 30 1 20...

5. Shell Programming and Scripting

Calculate the average of a column based on the value of another column

Hi, I would like to calculate the average of column 'y' based on the value of column 'pos'. For example, here is file1 id pos y c 11 1 220 aa 11 4333 207 f 11 5333 112 ee 11 11116 305 e 11 11117 310 r 11 22228 781 gg 11 ...

6. Shell Programming and Scripting

average of rows with same value in the first column

Dear All, I have this file tab delimited A 1 12 22 B 3 34 33 C 55 9 32 A 12 81 71 D 11 1 66 E 455 4 2 B 89 4 3 I would like to make the average every column where the first column is the same, for example, A 6,5 46,5 46,5 B 46,0 19,0 18,0 C 55,0 9,0 32,0 D 11,0 1,0 66,0...

7. Shell Programming and Scripting

average each column in a file

Hi, I tried to do this in excel but there is a limit to how many rows it can handle. All I need to do is average each column in a file and get the final value. My file looks something like this (obviously a lot larger): Joe HHR + 1 2 3 4 5 6 7 8 Jor HHR - 1 2 3 4 5 6 7 8 the output...

8. UNIX for Dummies Questions & Answers

average of a column in a table

Hello, Is there a quick way to compute the average of a column data in a numerical tab delimeted file? Thanks, Gussi

9. UNIX for Dummies Questions & Answers

Use awk to calculate average of column 3

Suppose I have 500 files in a directory and I need to Use awk to calculate average of column 3 for each of the file, how would I do that?

10. UNIX for Dummies Questions & Answers

calculate average of column 2

Hi I have fakebook.csv as following: F1(current date) F2(popularity) F3(name of book) F4(release date of book) 2006-06-21,6860,"Harry Potter",2006-12-31 2006-06-22,,"Harry Potter",2006-12-31 2006-06-23,7120,"Harry Potter",2006-12-31 2006-06-24,,"Harry Potter",2006-12-31...

Login or Register to Ask a Question