Find the average based on similar names in the first column

12-04-2012

I have a table, say this:

Code:

name1  num1 num2 num3 num4
name2  num5 num6 num7 num8
name3  num1 num3 num4 num9
name2  num8 num9 num1 num2
name2  num4 num5 num6 num4
name4  num4 num5 num7 num8
name5  num1 num3 num9 num7
name5  num6 num8 num3 num4

I want code that sorts my data by the first column and, for all rows with the same name, calculates the average of each of the corresponding columns. In this case the result would be:

Code:

name1  num1 num2 num3 num4
name2  avg(5,8,4) avg(6,9,5) avg(7,1,6) avg(8,2,4)
name2  avg(5,8,4) avg(6,9,5) avg(7,1,6) avg(8,2,4)
name2  avg(5,8,4) avg(6,9,5) avg(7,1,6) avg(8,2,4)
name3  num1 num3 num4 num9
name4  num4 num5 num7 num8
name5  avg(1,6) avg(3,8) avg(9,3) avg(7,4)
name5  avg(1,6) avg(3,8) avg(9,3) avg(7,4)

FelipeAd

12-04-2012

What have you tried? Where are you stuck?

Scott

12-04-2012

I started by using the sort command, in particular:

Code:

sort -d -k1,1 file

to find the lines that are repeated, but then I am not sure how to distinguish between the lines with the same name after that.

FelipeAd

12-04-2012

Once they are sorted you can read them line by line. Because they are already sorted, you can rely on all identical key values coming one after the other. The underlying algorithm is basic and widely used. It is called a "single group change" and it works like this:

You have to remember your last key value. If the key value you read now is identical, you are within the same group, so add the other values to your sums, or do whatever else you do within your groups.

If the key you read is not identical to the previous one, you first have to end your last group (calculate any averages from the sums, etc.) and then start a new group.

Two things to take into account. First, when you read the first line your group changes (from "" to some value), but you should suppress group end-processing at this point, because otherwise you get a "ghost group" with an empty key and all values zero/nil. Second, reaching the end of the input after your last line has to trigger a group change too, because otherwise the last group would not be processed.
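The steps above could be sketched in awk roughly as follows. This is only a minimal sketch, not a definitive solution: the file name `sample.txt`, the concrete numbers (chosen to mirror the table in the first post) and the names `group_avg` and `end_group` are all assumptions made for illustration. It prints one line per group rather than one line per input line.

```shell
# Hypothetical sample data standing in for the table in the first post.
cat > sample.txt <<'EOF'
name1 1 2 3 4
name2 5 6 7 8
name3 1 3 4 9
name2 8 9 1 2
name2 4 5 6 4
name4 4 5 7 8
name5 1 3 9 7
name5 6 8 3 4
EOF

# group_avg FILE: sort by the first column, then average each group.
group_avg() {
    sort -k1,1 "$1" | awk '
    # Group end-processing: print the key and the per-column averages.
    function end_group(   i) {
        printf "%s", key
        for (i = 2; i <= nf; i++) printf " %.3f", sum[i] / count
        printf "\n"
    }
    {
        # A key change closes the previous group. NR > 1 suppresses the
        # "ghost group" that the very first line would otherwise create.
        if (NR > 1 && $1 != key) {
            end_group()
            split("", sum)   # portable way to clear the sums
            count = 0
        }
        key = $1; nf = NF; count++
        for (i = 2; i <= NF; i++) sum[i] += $i
    }
    # End of input acts as the final group change.
    END { if (NR > 0) end_group() }
    '
}

group_avg sample.txt
```

With the sample data, the three name2 lines collapse into a single line holding their column averages, and name1, name3 and name4 pass through unchanged apart from the number format.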

I hope this helps.

bakunin

12-04-2012

Sorry bakunin, I am not sure I understood what you said.
I found in a previous thread

HTML Code:

https://www.unix.com/shell-programming-scripting/121566-averaging-multiple-columns.html

that in order to find the average (after grep for name1) for a number of columns, the correct code would be:

Code:

grep name1 file.txt | awk '{ for (i = 2; i <= NF; i++) s[i] += $i } END { for (i = 2; i in s; i++) printf("%.3f%c", s[i] / NR, ((i + 1) in s) ? OFS : ORS) }'

but it only prints the (average) numbers and not 'name1' at the beginning of the line.
Can someone tell me how to do this?

I mean, the output is now:

Code:

avg1 avg2 avg3 avg4

but I want it to be:

Code:

name1 avg1 avg2 avg3 avg4
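One possible way to get the name in front of the averages, without a separate grep per name, is to let awk key the sums on the first column and read the file twice. This is a hedged sketch rather than a confirmed answer from the thread: the file name `file_sample.txt`, its numbers and the function name `avg_per_row` are made up for illustration, and it assumes whitespace-separated columns with numeric values after the name.

```shell
# Hypothetical sample data; the real file.txt is not shown in the thread.
cat > file_sample.txt <<'EOF'
name1 1 2 3 4
name2 5 6 7 8
name3 1 3 4 9
name2 8 9 1 2
name2 4 5 6 4
name4 4 5 7 8
name5 1 3 9 7
name5 6 8 3 4
EOF

# avg_per_row FILE: print every line with its numbers replaced by the
# averages of all lines that share the same name in column 1.
avg_per_row() {
    sort -k1,1 "$1" > "$1.sorted"
    # The sorted file is passed twice, so awk reads it twice:
    # pass 1 (NR == FNR) collects sums and counts per name,
    # pass 2 prints the name followed by the averages for its group.
    awk '
    NR == FNR {
        count[$1]++
        for (i = 2; i <= NF; i++) sum[$1, i] += $i
        next
    }
    {
        printf "%s", $1
        for (i = 2; i <= NF; i++) printf " %.3f", sum[$1, i] / count[$1]
        printf "\n"
    }
    ' "$1.sorted" "$1.sorted"
    rm -f "$1.sorted"
}

avg_per_row file_sample.txt
```

Because the output keeps one line per input line, repeated names such as name2 appear several times with identical averages, which matches the layout asked for in the first post.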

FelipeAd
