|
|||||||
| Forums | Search Forums | Register | Forum Rules | Man Pages | Albums | FAQ | Members | Calendar | Search | Today's Posts | Mark Forums Read |
| UNIX for Dummies Questions & Answers If you're not sure where to post a UNIX or Linux question, post it here. All UNIX and Linux newbies welcome !! |
|
|
|
Thread Tools | Search this Thread | Display Modes |
|
#1
|
||||
|
||||
|
Find the average based on similar names in the first column
I have a table, say this: Code:
name1 num1 num2 num3 num4 name2 num5 num6 num7 num8 name3 num1 num3 num4 num9 name2 num8 num9 num1 num2 name2 num4 num5 num6 num4 name4 num4 num5 num7 num8 name5 num1 num3 num9 num7 name5 num6 num8 num3 num4 I want a code that will sort my data according to the first column and for ALL columns with the same name, calculate the average of each of the corresponding columns. In this case it would be: Code:
name1 num1 num2 num3 num4 name2 avg(5,8,4) avg(6,9,5) avg(7,1,6) avg(8,2,4) name2 avg(5,8,4) avg(6,9,5) avg(7,1,6) avg(8,2,4) name2 avg(5,8,4) avg(6,9,5) avg(7,1,6) avg(8,2,4) name3 num1 num3 num4 num9 name4 num4 num5 num7 num8 name5 avg(1,6) avg(3,8) avg(9,3) avg(7,4) name5 avg(1,6) avg(3,8) avg(9,3) avg(7,4) |
| Sponsored Links | ||
|
|
#2
|
||||
|
||||
|
What have you tried? Where are you stuck?
|
| Sponsored Links | ||
|
|
#3
|
||||
|
||||
|
I started by using sort command, in particular Code:
sort -d -k1,1 file to find those lines that are repeated but then i am not sure how to distinguish between the lines with the same name after that |
|
#4
|
|||
|
|||
|
Once they are sorted you can read them line by line: as they are sorted already you can rely on all the identical key values coming one after the other. The underlying algorithm is a widely used and basic one and called: single group change and it works like this:
You have to remember your last key value. If the key value you read now is identical you are within the same group, so add the other values to sums or whatever you do within your groups. If the key you read is not identical with the previous one you have to first end your last group - calculate any averages from the sums, etc. - then start with a new group. Two things to take into account: when you read the first line your group changes (from "" to some value) but you should suppress group end-processing at this point, because otherwise you get a "ghost-group" with an empty key and all values zero/nil. Second, your last line will have to trigger a group change too, because otherwise the last group would not be processed. I hope this helps. bakunin |
| Sponsored Links | |
|
|
#5
|
||||
|
||||
|
Sorry bakunin, I am not sure i understood what you said. I found from a previous thread HTML Code:
http://www.unix.com/shell-programming-scripting/121566-averaging-multiple-columns.html Code:
grep name1 file.txt|awk '{for (i=2;i<=NF;i++) s[i]+=$i}END{for(i=2;i in s;i++) printf("%.3f%c"),s[i]/NR,((i+1) in s) ?OFS:ORS}'but it only prints the (average) number and not 'name 1' at the beginning of the line. Can someone tell me how to do this? I mean now the output is Code:
avg1 avg2 avg3 avg4 but i want it to be Code:
name1 avg1 avg2 avg3 avg4 |
| Sponsored Links | ||
|
![]() |
| Thread Tools | Search this Thread |
| Display Modes | |
More UNIX and Linux Forum Topics You Might Find Helpful
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Average values in a column based on range | bhargavpbk88 | Shell Programming and Scripting | 5 | 03-23-2012 06:03 PM |
| Help with merge two file based on similar column content | perl_beginner | Shell Programming and Scripting | 5 | 05-20-2011 09:18 AM |
| Joining multiple files based on one column with different and similar values (shell or perl) | seqbiologist | Shell Programming and Scripting | 4 | 04-28-2011 05:00 AM |
| AWK: how to get average based on certain column | shell123 | Shell Programming and Scripting | 3 | 03-24-2011 06:16 AM |
| Change names in a column based on the symbols in another column | repinementer | Shell Programming and Scripting | 8 | 08-14-2009 03:30 AM |
|
|