Average across multiple columns group by Post: 302932980

Top Forums Shell Programming and Scripting Average across multiple columns group by Post 302932980 by Don Cragun on Monday 26th of January 2015 07:18:56 PM

01-26-2015

Quote:

Originally Posted by ritakadm

Sorry about not providing a sample input. I'm using Cygwin. The data range is 0 to 100,000, and values should be output to 2 decimal places.
The data is 83000 lines, not very big.

Yes, the code calculates the average correctly for only the 7th column, although I should
populate arr as arr[$1" "$2" "$3" "$4" "$5] to get all the variables delimited.

The sample input has 4 data columns; the original data has many more, starting at column 7 and running to NF ($22).

Code:

  a1 b1 c1 d1 e1 12 13 14 15
  a1 b1 c1 d1 e1 14 15 16 17
  a1 b1 c1 d1 e1 13 14 15 16
  a2 b1 c1 d1 e1 112 113 114 115
  a2 b1 c1 d1 e1 114 115 116 117
  a2 b1 c1 d1 e1 113 114 115 116

Output should be

Code:

  a1 b1 c1 d1 e1 13 14 15 16
  a2 b1 c1 d1 e1 113 114 115 116

The code you showed us in your 1st post in this thread skips data in the 1st line of your file (which I assumed was intended to skip over a header line). But I don't see any headers in this sample. Is there a header, or not? If there is a header, should it be copied to the output?

Is the number of fields constant in an input file, or can it vary from line to line?
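One quick way to answer the question about field counts is to let awk tally how many lines have each value of NF. This is only a sketch: the function name `count_field_widths` is made up for illustration, and the here-document stands in for the real input file.

```shell
# Hypothetical helper: print each distinct field count (NF) and the
# number of lines that have it, sorted by field count.
count_field_widths() {
    awk '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$@" | sort -n
}

# Stand-in data taken from the sample in this thread; pass a file name
# instead of the here-document to check real data.
count_field_widths <<'EOF'
  a1 b1 c1 d1 e1 12 13 14 15
  a1 b1 c1 d1 e1 14 15 16 17
  a2 b1 c1 d1 e1 112 113 114 115
EOF
```

A single output line means every input line has the same number of fields; more than one line means the count varies.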

It looks like there is a leading space in your sample input and output. Is a leading space required in your output?

Do you want 2 decimal places in all computed output fields, or do you want values to be printed without decimal places (as in your sample output) in cases where the computed result is an integral value?

You say you want to calculate averages for fields 7 through NF, but your sample output also shows an average for field 6. Is field 6 supposed to be ignored in calculations and removed from the output, or is field 6 to be averaged as well as fields 7 through NF?
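Until those questions are answered, here is a minimal sketch of one way to do it, not a final answer. It guesses as follows: there is no header line, the group key is fields 1 through 5, fields 6 through NF are all averaged (as in the sample output), every line in a group has the same number of fields, the leading space is not needed in the output, and integral averages are printed without decimals while everything else gets 2 decimal places. The function name `group_average` and the variable `first` are made up for illustration.

```shell
# Hypothetical helper: average fields first..NF for each group of lines
# that share the same fields 1..first-1, keeping groups in input order.
group_average() {
    awk -v first=6 '
    {
        # Build the group key from the leading fields.
        key = $1
        for (i = 2; i < first; i++) key = key " " $i
        # Remember the order in which groups were first seen.
        if (!(key in cnt)) order[++n] = key
        cnt[key]++
        if (NF > maxnf) maxnf = NF
        for (i = first; i <= NF; i++) sum[key, i] += $i
    }
    END {
        for (k = 1; k <= n; k++) {
            key = order[k]
            line = key
            for (i = first; i <= maxnf; i++) {
                avg = sum[key, i] / cnt[key]
                # Assumption: whole numbers print as integers, others with 2 decimals.
                line = line " " ((avg == int(avg)) ? sprintf("%d", avg) : sprintf("%.2f", avg))
            }
            print line
        }
    }' "$@"
}

# Sample input from this thread; pass a file name instead for real data.
group_average <<'EOF'
  a1 b1 c1 d1 e1 12 13 14 15
  a1 b1 c1 d1 e1 14 15 16 17
  a1 b1 c1 d1 e1 13 14 15 16
  a2 b1 c1 d1 e1 112 113 114 115
  a2 b1 c1 d1 e1 114 115 116 117
  a2 b1 c1 d1 e1 113 114 115 116
EOF
```

If field 6 should be left out, change `first=6` to `first=7`; note that the key would then include field 6, so that case probably needs a different key. If every value must show 2 decimal places, replace the conditional with a plain `sprintf("%.2f", avg)`.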

Don Cragun

LEARN ABOUT MOJAVE

column

COLUMN(1)						    BSD General Commands Manual 						 COLUMN(1)

NAME

     column -- columnate lists

SYNOPSIS

     column [-tx] [-c columns] [-s sep] [file ...]

DESCRIPTION

     The column utility formats its input into multiple columns.  Rows are filled before columns.  Input is taken from file operands, or, by
     default, from the standard input.	Empty lines are ignored.

     The options are as follows:

     -c      Output is formatted for a display columns wide.

     -s      Specify a set of characters to be used to delimit columns for the -t option.

     -t      Determine the number of columns the input contains and create a table.  Columns are delimited with whitespace, by default, or with
	     the characters supplied using the -s option.  Useful for pretty-printing displays.

     -x      Fill columns before filling rows.

ENVIRONMENT

     The COLUMNS, LANG, LC_ALL and LC_CTYPE environment variables affect the execution of column as described in environ(7).

EXIT STATUS

     The column utility exits 0 on success, and >0 if an error occurs.

EXAMPLES

	   (printf "PERM LINKS OWNER GROUP SIZE MONTH DAY " ; 
	   printf "HH:MM/YEAR NAME\n" ; 
	   ls -l | sed 1d) | column -t

SEE ALSO

     colrm(1), ls(1), paste(1), sort(1)

HISTORY

     The column command appeared in 4.3BSD-Reno.

BUGS

     Input lines are limited to LINE_MAX (2048) bytes in length.

BSD
								   July 29, 2004							       BSD