Averaging Data From Multiple Columns, Using Header if Possible


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Averaging Data From Multiple Columns, Using Header if Possible
# 1  
Old 01-04-2011
Averaging Data From Multiple Columns, Using Header if Possible

Hi,

I have a file with multiple tab delimited columns and I would like to have the average of each column:
Code:
Iteration	Tree No	Lh	HMean	
1000	1	-78.834717	-78.834717	
1100	1	-77.991031	-78.624046	
1200	1	-79.416055	-78.761861	
1300	1	-79.280494	-78.968099	
1400	1	-82.846275	-80.808696	
1500	1	-81.167562	-81.306918	
1600	1	-81.217691	-81.290207	
1700	1	-80.269179	-81.241736

I have been trimming off the header row (using emacs and a cut, please let me know if there is a one liner for that!!!) and then just manually using awk to get the average of each column individually, e.g.:
Code:
awk '{sum+=$3} END { print "Average = ",sum/NR}' FILENAME

BUT...I wonder if there is a way to
a) get the average of each of the columns in one go and
b) use the headers to create a report that was something like:
Code:
Iteration	Tree No	Lh	HMean	
AVGC1	AVGC2	AVGC3	AVGC4

or in a single column format is fine, too:

Iteration
AVGC1
Tree No
AVGC2
...etc

In my final file, I will have many many more columns, and the number of columns will be different in the different files. Also, there will be no missing data cells. The file will be complete. Thanks for any suggestions!!!

All the best,

Mikey

Last edited by Scott; 01-04-2011 at 11:43 AM.. Reason: Code tags
# 2  
Old 01-04-2011
Is the "tree no" field significant? Assuming it is not:
Code:
awk '
   BEGIN { FS = "\t"; }
   NR == 1 { print; next; }
  { for (i = 1; i <= NF; i++) { sum[i] += $i; } }
  END {
    printf "\t";
    for (i = 1; i <= NF; i++) { printf "\t%.4g", sum[i] / (NR - 1); }
    print "";
  }
' inputfile...


Last edited by m.d.ludwig; 01-04-2011 at 11:35 AM.. Reason: improvements based on bartus11's posting
This User Gave Thanks to m.d.ludwig For This Post:
# 3  
Old 01-04-2011
Try:
Code:
awk 'NR==1;NR>1{for (i=1;i<=NF;i++){a[i]+=$i}}END{for (i=1;i<=NF;i++){printf a[i]/(NR-1)" "};printf "\n"}' file

This User Gave Thanks to bartus11 For This Post:
# 4  
Old 01-04-2011
Hi, Thanks!
Well, the first two columns, Iteration and Tree No are actually both not significant to me. But, I figured that whatever code would be easiest if it could be applied to all columns. Thanks for the code, but on the little fake data file I posted above, the output is


No
-80.1279
Lh
-79.9795

That does not take the average from the first two columns (which is fine), but it seems to take the 3rd and 4th word from the headers, not the third and fourth tab delim column, which would be Lh and HMean. Any suggestions for fine tuning?

Mikey

---------- Post updated at 10:40 AM ---------- Previous update was at 10:36 AM ----------

Quote:
Originally Posted by bartus11
Try:
Code:
awk 'NR==1;NR>1{for (i=1;i<=NF;i++){a[i]+=$i}}END{for (i=1;i<=NF;i++){printf a[i]/(NR-1)" "};printf "\n"}' file


Oh, that is amazing. I only need one sort of modification, if it is possible: can the numbers be tab delimited in the output? I suppose that I could split the report into two files, replace the spaces in the average line with tabs and then cat. But is there a more elegant way?

and THANKS!

---------- Post updated at 11:06 AM ---------- Previous update was at 10:40 AM ----------

OMG
I did it myself and added the TABS!
I am a self taught unix genius!
But seriously, THANK you for your help!

Here is my addition to your fab code:
Code:
awk 'NR==1;NR>1{for (i=1;i<=NF;i++){a[i]+=$i}}END{for (i=1;i<=NF;i++){printf a[i]/(NR-1)"\t"};printf "\n"}'


Last edited by radoulov; 01-04-2011 at 12:29 PM.. Reason: Code tags, please!
# 5  
Old 01-04-2011
Quote:
Originally Posted by mikey11415
Well, the first two columns, Iteration and Tree No are actually both not significant to me.
You can probably work it out yourself (now your a self taught unix genius, lol), but below in red are are the changes required to skip the first two columns:
Code:
awk 'NR==1;NR>1{for (i=3;i<=NF;i++){a[i]+=$i}}END{ printf "\t\t"; for (i=3;i<=NF;i++){printf a[i]/(NR-1)"\t"};printf "\n"}'

[/QUOTE]
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

How to print multiple required columns dynamically in a file using the header name?

Hi All, i am trying to print required multiple columns dynamically from a fie. But i am able to print only one column at a time. i am new to shell script, please help me on this issue. i am using below script awk -v COLT=$1 ' NR==1 { for (i=1; i<=NF; i++) { ... (2 Replies)
Discussion started by: balu1234
2 Replies

2. Shell Programming and Scripting

Gathering data in columns from multiple files

Hello world! I need to gather all the data from different folders and copy it in an one unique text file in columns format. Let me explain, letīs say "a, b, c" are 3 data files of thousands and thousands lines (in the reality, I have nearly one hundred). "a, b, c" are located in the folders... (5 Replies)
Discussion started by: user10600
5 Replies

3. Shell Programming and Scripting

awk : Filter a set of data to parse header line and last field of multiple same match.

Hi Experts, I have a data with multiple entry , I want to filter PKG= & the last column "00060110" or "00088150" in the output file: ############################################################################################### PKG= P8SDB :: VGS = vgP8SOra vgP8SDB1 vgP8S001... (5 Replies)
Discussion started by: rveri
5 Replies

4. Shell Programming and Scripting

averaging specific column of multiple files

Hi all, I'm needing help again on scripting. I have weekly files with 3 columns, and I need to do monthly averaging on the values on column 3, the file naming convention is as follows: 20000105.u- 2000:year 01:month 05:day 20000112.u 20000119.u 20000126.u 20000202.u 20020209.u I need to... (15 Replies)
Discussion started by: ida1215
15 Replies

5. Shell Programming and Scripting

Averaging 3 files with multiple rows

Hi, I am trying to average the values from 3 files with the same format. They are very large files so I will describe the file and show some it of. Basically the file has 83 columns (with nearly 7000 rows). The first three columns are the same for each file while the remaining 80 are values that... (1 Reply)
Discussion started by: kylle345
1 Replies

6. Shell Programming and Scripting

averaging columns

Hi, I have a file that has 201 columns (1 for the name of each row and the other 200 are values) I want to print the average for each column the file looks like this (as an example this only has 7 columns with values) 1 2 3 4 5 6 7 abr 5 6 7 1 2 4 5 hhr 2 1 3 4 ... (8 Replies)
Discussion started by: kylle345
8 Replies

7. Shell Programming and Scripting

Averaging data every 30 mins using AWK

A happy Monday to you all, I have a .csv file which contains data taken every 5 seconds. I want to average these 5 second data points into 30 minute averages! date co2 25/06/2011 08:04 8.31 25/06/2011 08:04 8.32 25/06/2011 08:04 8.33... (18 Replies)
Discussion started by: gd9629
18 Replies

8. Shell Programming and Scripting

Splitting data from one row as multiple columns

Hi I have a file containing some data as follows: 11-17-2010:13:26 64 4 516414 1392258 11-17-2010:13:26 128 4 586868 695603 11-17-2010:13:26 256 4 474937 1642294 11-17-2010:13:32 64 4 378715 1357066 11-17-2010:13:32 128 4 597981 1684006 ... (17 Replies)
Discussion started by: annazpereira
17 Replies

9. Shell Programming and Scripting

How to convert 2 column data into multiple columns based on a keyword in a row??

Hi Friends I have the following input data in 2 columns. SNo 1 I1 Value I2 Value I3 Value SNo 2 I4 Value I5 Value I6 Value I7 Value SNo 3 I8 Value I9 Value ............... ................ SNo N (1 Reply)
Discussion started by: ks_reddy
1 Replies

10. Shell Programming and Scripting

Averaging multiple columns

Hello, I am trying to average multiple columns simultaneously while skipping the first column. I am using this awk line to average one column awk '{sum+=$3} END { print "Average = ",sum/NR}' But I want to be able to do it for multiple columns while skipping the first column. There... (4 Replies)
Discussion started by: gisele_l
4 Replies
Login or Register to Ask a Question