Average columns based on header name

03-17-2016


Hi Friends,

My files have columns like the ones below. The sample shown here is partial.

The full file is linked further down. Each file has only two rows: a header row and a row of values.

Code:

Gene	0.4%	0.7%	1.1%	1.4%	1.8%	2.2%	2.5%	2.9%	3.3%	3.6%	4.0%	4.3%	4.7%	5.1%	5.4%	5.8%	6.2%	6.5%	6.9%	7.2%	7.6%	8.0%	8.3%	8.7%	9.1%	9.4%	9.8%	10.1%	10.5%	10.9%	11.2%	11.6%	12.0%	12.3%	12.7%	13.0%	13.4%	13.8%	14.1%	14.5%	14.9%	15.2%	15.6%	15.9%	16.3%	16.7%	17.0%	17.4%	17.8%	18.1%	18.5%	18.8%	19.2%	19.6%	19.9%	20.3%	20.7%	21.0%	21.4%	21.7%	22.1%	22.5%	22.8%	23.2%	23.6%	23.9%	24.3%	24.6%	25.0%	25.4%	25.7%	26.1%	26.4%	26.8%	27.2%	27.5%	27.9%	28.3%	28.6%	29.0%	29.3%	29.7%	30.1%	30.4%	30.8%	31.2%	31.5%	31.9%	32.2%	32.6%	33.0%	33.3%	33.7%	34.1%	34.4%	34.8%	35.1%	35.5%	35.9%	36.2%	36.6%	37.0%	37.3%	37.7%	38.0%	38.4%	38.8%	39.1%	39.5%	39.9%	40.2%	40.6%	40.9%	41.3%	41.7%	42.0%	42.4%	42.8%	43.1%	43.5%	43.8%	44.2%	44.6%	44.9%	45.3%	45.7%	46.0%	46.4%	46.7%	47.1%	47.5%	47.8%	48.2%	48.6%	48.9%	49.3%	49.6%	50.0%	50.4%	50.7%	51.1%	51.4%	51.8%	52.2%	52.5%	52.9%	53.3%	53.6%	54.0%	54.3%	54.7%	55.1%	55.4%	55.8%	56.2%	56.5%	56.9%	57.2%	57.6%	58.0%	58.3%	58.7%	59.1%	59.4%	59.8%	60.1%	60.5%	60.9%	61.2%	61.6%	62.0%	62.3%	62.7%	63.0%	63.4%	63.8%	64.1%	64.5%	64.9%	65.2%	65.6%	65.9%	66.3%	66.7%	67.0%	67.4%	67.8%	68.1%	68.5%	68.8%	69.2%	69.6%	69.9%	70.3%	70.7%	71.0%	71.4%	71.7%	72.1%	72.5%	72.8%	73.2%	73.6%	73.9%	74.3%	74.6%	75.0%	75.4%	75.7%	76.1%	76.4%	76.8%	77.2%	77.5%	77.9%	78.3%	78.6%	79.0%	79.3%	79.7%	80.1%	80.4%	80.8%	81.2%	81.5%	81.9%	82.2%	82.6%	83.0%	83.3%	83.7%	84.1%	84.4%	84.8%	85.1%	85.5%	85.9%	86.2%	86.6%	87.0%	87.3%	87.7%	88.0%	88.4%	88.8%	89.1%	89.5%	89.9%	90.2%	90.6%	90.9%	91.3%	91.7%	92.0%	92.4%	92.8%	93.1%	93.5%	93.8%	94.2%	94.6%	94.9%	95.3%	95.7%	96.0%	96.4%	96.7%	97.1%	97.5%	97.8%	98.2%	98.6%	98.9%	99.3%	99.6%	100.0%	0.4%	0.7%	1.1%	1.4%	1.8%	2.2%	2.5%	2.9%	3.3%	3.6%	4.0%	4.3%	4.7%	5.1%	5.4%	5.8%	6.2%	6.5%	6.9%	7.2%	7.6%	8.0%	8.3%	8.7%	9.1%	9.4%	9.8%	10.1%	10.5%	10.9%	11.2%	11.6%	12.0%	12.3%	12.7%	13.0%	13.4%	13.8%	14.1%	14.5%	14.9%	15.2%	15.6%	15.9%	16.3%	16.7%	17.0%	17.4%	17.8%	18.1%	18.5%	18.8%	19.2%	19.6%	19.9%	20.3%	20.7%	21.0%	21.4%	21.7%	22.1%	22.5%	22.8%	23.2%	23.6%	
23.9%	24.3%	24.6%	25.0%	25.4%	25.7%	26.1%	26.4%	26.8%	27.2%	27.5%	27.9%	28.3%	28.6%	29.0%	29.3%	29.7%	30.1%	30.4%	30.8%	31.2%	31.5%	31.9%	32.2%	32.6%	33.0%	33.3%	33.7%	34.1%	34.4%	34.8%	35.1%	35.5%	35.9%	36.2%	36.6%	37.0%	37.3%	37.7%	38.0%	38.4%	38.8%	39.1%	39.5%	39.9%	40.2%	40.6%	40.9%	41.3%	41.7%	42.0%	42.4%	42.8%	43.1%	43.5%	43.8%	44.2%	44.6%	44.9%	45.3%	45.7%	46.0%	46.4%	46.7%	47.1%	47.5%	47.8%	48.2%	48.6%	48.9%	49.3%	49.6%	50.0%	50.4%	50.7%	51.1%	51.4%	51.8%	52.2%	52.5%	52.9%	53.3%	53.6%	54.0%	54.3%	54.7%	55.1%	55.4%	55.8%	56.2%	56.5%	56.9%	57.2%	57.6%	58.0%	58.3%	58.7%	59.1%	59.4%	59.8%	60.1%	60.5%	60.9%	61.2%	61.6%	62.0%	62.3%	62.7%	63.0%	63.4%	63.8%	64.1%	64.5%	64.9%	65.2%	65.6%	65.9%	66.3%	66.7%	67.0%	67.4%	67.8%	68.1%	68.5%	68.8%	69.2%	69.6%	69.9%	70.3%	70.7%	71.0%	71.4%	71.7%	72.1%	72.5%	72.8%	73.2%	73.6%	73.9%	74.3%	74.6%	75.0%	75.4%	75.7%	76.1%	76.4%	76.8%	77.2%	77.5%	77.9%	78.3%	78.6%	79.0%	79.3%	79.7%	80.1%	80.4%	80.8%	81.2%	81.5%	81.9%	82.2%	82.6%	83.0%	83.3%	83.7%	84.1%	84.4%	84.8%	85.1%	85.5%	85.9%	86.2%	86.6%	87.0%	87.3%	87.7%	88.0%	88.4%	88.8%	89.1%	89.5%	89.9%	90.2%	90.6%	90.9%	91.3%	91.7%	92.0%	92.4%	92.8%	93.1%	93.5%	93.8%	94.2%	94.6%	94.9%	95.3%	95.7%	96.0%	96.4%	96.7%	97.1%	97.5%	97.8%	98.2%	98.6%	98.9%	99.3%	99.6%	100.0%

Here is what I need:

a. Start from the second column, which is 0.4% here.
b. Move right until the header reaches "10". If a header is exactly 10.0%, include that column too; otherwise stop at the column before the first header above 10.0%. In this example the 29th column is 10.1%, so the range runs from 0.4% (2nd column) through 9.8% (28th column). Had the 29th column been 10.0%, it would have been included as well.
c. Average the second-row values of those columns (the data is not shown here; the full dataset is at https://goo.gl/W8jND7). In this example that means the 2nd column (0.4%) through the 28th column (9.8%).
d. In the output, print the first column, "Gene", followed by this average, under the header

Code:

Gene Average_10%

e. Then start from 10.1% (29th column) and continue until the header reaches "20". Repeat steps b through d, so that the output becomes

Code:

Gene Average_10% Average_20%

Repeat this until you have

Code:

Gene Average_10% Average_20% Average_30% Average_40% Average_50% Average_60% Average_70% Average_80% Average_90% Average_100%

f. Reaching 100% means one dataset (sample) is complete.

g. If you look closely at my header, another run of 0.4%-100% columns follows the first 100%. The input file at the link above contains 13 of these 0.4%-100% runs.

i. I have multiple files, the headers can be

Code:

1% 2% 3%....100%

1.5% 2.5% 3.5%....100%

The step varies from file to file, but the averaging logic (cut at "10", "20", and so on) is always the same. The number of samples is also always 13, so every file contains 13 runs ending in 100%.
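The rule in steps a through g can also be read as "a column with header p belongs to the bin ceil(p/10)*10", with a new sample starting whenever the header value drops. The sketch below only classifies header columns under that reading; it is not the code posted later in this thread. The function name bin_of_headers and the output layout are made up for illustration, and tab-separated input with a leading "Gene" column is assumed.

```shell
# bin_of_headers: for each header column print
#   column number, header, sample number, and the Average_N% bin it falls in.
# Hypothetical helper; assumes a tab-separated header row whose first field is a label.
bin_of_headers() {
    awk -F'\t' -v B=10 '
    # smallest integer >= x, for positive x
    function ceil(x) { return (x == int(x)) ? x : int(x) + 1 }
    NR == 1 {
        sample = 1
        for (i = 2; i <= NF; i++) {
            p = $i; sub(/%$/, "", p); p += 0
            if (i > 2 && p < prev) sample++   # header value dropped: next sample starts
            printf "%d\t%s\tsample%d\tAverage_%d%%\n", i, $i, sample, ceil(p / B) * B
            prev = p
        }
        exit                                  # only the header row is needed
    }' "$@"
}
```

Calling bin_of_headers on one of the input files lists where each column would land, which makes it easy to check by eye that 10.0% goes into Average_10% while 10.1% goes into Average_20%, whatever step the file uses.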

P.S: A Bonus of 1000 bits will be awarded to the effectively working solution.

jacobs.smith


03-17-2016


I do not understand what output you are trying to produce.

For each input gene line do you want:

one output line with the gene number and 10 averages from all 13 samples,
thirteen output lines with the gene number and the 10 averages from one sample on each line, or
one output line with the gene number and 130 averages where each set of 10 averages comes from one sample?

Can you show us the exact output you're hoping to produce from the data provided in your sample input for genes 1 and 2?

Was the data for gene 3 in your sample truncated, or will some inputs have missing fields that should be treated as zero values when computing the averages?


Don Cragun


03-17-2016



Hi Don,

Thank you for your response. These are good questions, and I am glad you replied.

Here are the answers to your questions.

I want "one output line with the gene number and 130 averages where each set of 10 averages comes from one sample".

Since the data is very big, I will use the small example below, assuming I have only one sample.

Code:

cat input_example1
Gene 1% 2% 3% 4% 5% 6% 7% 8% 9% 10% 11% 12% 13% 14% 15% 16% 17% 18% 19% 20%....30%....40%....50%....60%....70%....80%....90%..100%
Gene 1 2 3 4 5 4 3 2 1 1 22.5 2 3.5 3 3 3 4 5 6 7...1..2..3..5..9..11..134.33...0.1

Code:

cat output_example1
Gene Average_10% Average_20% ..........Average_100%
Gene 2.6 5.9..............x

Here, 2.6 is the average of the values from the 2nd column (1) through the 11th column (1), and 5.9 is the average from the 12th column (22.5) through the 21st column (7). Remember that if the 11th column's header were anything greater than 10.0%, such as 10.1% or 10.8%, the first average would only run through the 10th column.
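The two numbers in this small example can be checked with a short awk sketch. It handles a single sample only, as the example does, and reads whitespace-separated input; the function name bin_averages is hypothetical, and assigning each column to bin ceil(p/10)*10 is one reading of the rule rather than the script posted later in the thread.

```shell
# bin_averages: average a two-row, single-sample file per 10% bin.
# Hypothetical helper; row 1 is the header, row 2 holds the values.
bin_averages() {
    awk -v B=10 '
    # smallest integer >= x, for positive x
    function ceil(x) { return (x == int(x)) ? x : int(x) + 1 }
    NR == 1 {                                # header row: remember the bin of each column
        for (i = 2; i <= NF; i++) {
            p = $i; sub(/%$/, "", p)
            bin[i] = ceil(p / B) * B
        }
        next
    }
    {                                        # data row: sum and count per bin
        for (i = 2; i <= NF; i++) { sum[bin[i]] += $i; n[bin[i]]++ }
        head = "Gene"; vals = $1
        for (b = B; b <= 100; b += B)
            if (n[b]) {                      # skip bins that have no columns
                head = head " Average_" b "%"
                vals = vals " " sum[b] / n[b]
            }
        print head
        print vals
    }' "$@"
}
```

Fed the twenty fully specified columns of input_example1 (1% through 20%), this prints "Gene Average_10% Average_20%" followed by "Gene 2.6 5.9". Extending it to 13 samples would need a sample counter in addition to the bin.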

The main input file at the link in my earlier post has the 1% to 100% values of 13 samples and only two rows. The output file will have the Gene column plus 130 average values. Does that make sense?

There will be no gene3 in an input file. It is always 2 rows (the first line is the header and the second line holds the values to average), with the 13 samples split into varying percentage steps. There are no missing fields: every column header has a value associated with it in the input file.

I also have a batch of about 4000 files. I would give a folder and a file extension, and the script should read each file and apply the same averaging rule: a header of exactly 10.0% is included in the average, while for any larger value such as 10.1%, 10.2% or 10.3% that column is excluded and the average stops at the column before it.
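For the batch part, a plain shell loop over the folder is probably enough. The sketch below is only an outline: run_batch, its argument order and the .avg output suffix are all made up for illustration, and the per-file command is passed in by the caller rather than fixed here.

```shell
# run_batch DIR EXT CMD [ARGS...]
# Run CMD on every DIR/*.EXT and write each result next to the input as NAME.avg.
# Hypothetical helper; the .avg suffix is an arbitrary choice.
run_batch() {
    dir=$1; ext=$2; shift 2
    for f in "$dir"/*."$ext"; do
        [ -e "$f" ] || continue          # nothing matched: skip the literal pattern
        "$@" "$f" > "${f%.*}.avg"        # strip the extension, append .avg
    done
}
```

For example, run_batch /path/to/folder txt some_command would run some_command on each .txt file in that folder, where some_command stands for whichever averaging script ends up being used.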

Please ask as many questions as you need and I will be glad to answer.

As for what I have tried so far: I have been reading the headers, writing each set to a separate file, doing the computation, putting the result back, and then moving on to the next sample. This is very time-consuming.

All your time and understanding is highly appreciated.

Thanks in advance

jacobs.smith


03-17-2016


This is a hell of a problem, by sheer data volume. I'm brain dead now and can't verify that the results are correct, although I think/hope they are. Try

Code:

awk '
NR == 1         {print "Gene    Average_10%     Average_20%     Average_30%     Average_40%     Average_50%     Average_60%"\
                        "       Average_70%     Average_80%     Average_90%     Average_100%"
                 ST = B
                 i = 2
                 gsub ("%", _)
                 while (i <= NF)        {while ($i < ST) i++;
                                         if ($i == ST)  i++
                                         L[++C] = i

                                         ST += B
                                         if (ST > 100)  {H[++D] = C - 1
                                                         ST = B
                                                        }
#                                                               print i, $i, ST, L[C], $L[C], D, H[D]
                                        }
                 next
                }

                {printf "%s", $1
                 i = 0
                 D = 1
                 while (i < C)  {SUM = 0
#                                                               print "--->", i, L[i], L[i+1]
                                 for (j=L[i]; j<L[i+1]; j++) SUM+=$j
                                 printf "%s%8.5f", OFS, SUM
                                 if (i == H[D]) {printf RS
                                                 D++
                                                }
                                 i++
                                }
                }
' FS="\t" OFS="\t" B=10 /tmp/gene.txt
Gene    Average_10%     Average_20%     Average_30%     Average_40%     Average_50%     Average_60%     Average_70%     Average_80%     Average_90%     Average_100%
Gene1    0.23096         0.08151         0.33964         0.32606         0.51626         0.13586         0.29889         0.05434         0.29889         0.82873
         0.36887         0.50061         0.17126         0.40840         0.31618         0.14491         0.55331         0.39522         0.30300         0.65870
         0.44471         0.38337         0.21469         0.33737         0.03067         0.42938         0.23002         0.21469         0.19935         0.49072
         0.54426         0.54426         0.21239         0.19912         0.51771         0.45134         0.59736         0.46461         0.18584         1.08852
         0.28180         0.75640         0.10382         0.28180         0.44494         0.29663         0.29663         0.53393         0.40045         0.65258
         0.18539         0.63033         0.18539         0.42022         0.28427         0.39550         0.45730         0.40786         0.45730         1.17415
         0.42479         0.70355         0.11947         0.41151         0.42479         0.47789         0.41151         0.43806         0.54426         0.90267
         0.25776         0.50380         0.14060         0.17574         0.65611         0.26947         0.29291         0.42179         0.37492         0.31634
         0.27312         0.27312         0.09754         0.07803         0.29263         0.48772         1.19003         0.66330         0.54624         0.62428
         0.31595         0.50025         0.14481         0.21063         0.11848         0.63189         0.40810         0.22379         0.43442         0.96100
         0.13974         0.52790         0.20185         0.52790         0.24842         0.63659         0.34158         0.65212         0.55896         1.08686
         0.69134         0.41107         0.22422         0.33633         0.33633         0.22422         0.61660         0.39238         0.20553         0.67266
         0.44164         0.41010         0.38907         0.39958         0.29443         0.73607         0.70453         0.86226         0.71504         0.70453

Please do some validation and come back with its result.


RudiC


03-17-2016



RudiC,

You are one piece of a beast! The solution is rocking!

Two quick requests.

1. You are printing the sum and not the average. I just checked, and otherwise your script works fine. I am scared to touch it. Can I print SUM/NR to get the average?

2. Can you make this a one-liner? Please...please...please.

Again, I could do it myself, but I don't want to mess this up. A BIG THANKS MY FRIEND.

When Don asked what the output should look like, I asked for 130 columns, but your layout suits my downstream analysis perfectly.

Thank you so much.

jacobs.smith


03-17-2016


As I said, I'm braindead. I missed the average, sorry. Try printing SUM/(L[i+1]-L[i]). That's not OK for the very first entry, though: the first field is summed as well (it contributes zero), so the element count is too high. But I'm scared to touch it, too.

No one-liner. Wherever a one-liner can be accommodated, a multi-liner will fit, too. Use a function. Source a file.
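To illustrate both points, here is a small sketch of a multi-line awk program wrapped in a shell function. The divisor is the number of columns in the range, not NR, which in a two-row file is simply the record number. The function name colmean and its two arguments are invented for this example and are not part of the script above.

```shell
# colmean FROM TO: print the mean of fields FROM..TO for every input line.
# Hypothetical helper; could live in a file that is sourced with ". file".
colmean() {
    awk -v from="$1" -v to="$2" '{
        s = 0
        for (j = from; j <= to; j++) s += $j
        print s / (to - from + 1)        # divide by the column count, not by NR
    }'
}
```

After sourcing the file that holds it, a call such as colmean 2 11 < input_example1 is as short to type as any one-liner, while the awk code stays readable.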

---------- Post updated at 22:40 ---------- Previous update was at 22:28 ----------

Try this one - the field 1 error should be removed. Please perform a careful validation of the results and come back:

Code:

awk '
NR == 1         {print "Gene    Average_10%     Average_20%     Average_30%     Average_40%     Average_50%     Average_60%"\
                        "       Average_70%     Average_80%     Average_90%     Average_100%"
                 ST   = B
                 i    = 2
                 C    = 1
                 L[C] = i
                 gsub ("%", _)
                 while (i <= NF)        {while ($i < ST) i++ 
                                         if ($i == ST)   i++
                                         L[++C] = i

                                         ST += B
                                         if (ST > 100)  {H[++D] = C - 1
                                                         ST = B
                                                        }
#                                                               print i, $i, ST, L[C], $L[C], D, H[D]
                                        }
                 next
                }

                {printf "%s", $1
                 i = D = 1
                 while (i < C)  {SUM = 0
#                                                               print "--->", i, L[i], L[i+1]
                                 for (j=L[i]; j<L[i+1]; j++) SUM+=$j
                                 printf "%s%8.5f", OFS, SUM / (L[i+1]-L[i])   
                                 if (i == H[D]) {printf RS
                                                 D++
                                                }   
                                 i++
                                }   
                }
' FS="\t" OFS="\t" B=10 /tmp/gene.txt
Gene    Average_10%     Average_20%     Average_30%     Average_40%     Average_50%     Average_60%     Average_70%     Average_80%     Average_90%     Average_100%
Gene1    0.00855         0.00291         0.01258         0.01164         0.01844         0.00503         0.01067         0.00201         0.01067         0.02960
         0.01366         0.01788         0.00634         0.01459         0.01129         0.00537         0.01976         0.01464         0.01082         0.02353
         0.01647         0.01369         0.00795         0.01205         0.00110         0.01590         0.00822         0.00795         0.00712         0.01753
         0.02016         0.01944         0.00787         0.00711         0.01849         0.01672         0.02133         0.01721         0.00664         0.03888
         0.01044         0.02701         0.00385         0.01006         0.01589         0.01099         0.01059         0.01978         0.01430         0.02331
         0.00687         0.02251         0.00687         0.01501         0.01015         0.01465         0.01633         0.01511         0.01633         0.04193
         0.01573         0.02513         0.00442         0.01470         0.01517         0.01770         0.01470         0.01622         0.01944         0.03224
         0.00955         0.01799         0.00521         0.00628         0.02343         0.00998         0.01046         0.01562         0.01339         0.01130
         0.01012         0.00975         0.00361         0.00279         0.01045         0.01806         0.04250         0.02457         0.01951         0.02230
         0.01170         0.01787         0.00536         0.00752         0.00423         0.02340         0.01457         0.00829         0.01552         0.03432
         0.00518         0.01885         0.00748         0.01885         0.00887         0.02358         0.01220         0.02415         0.01996         0.03882
         0.02561         0.01468         0.00830         0.01201         0.01201         0.00830         0.02202         0.01453         0.00734         0.02402
         0.01636         0.01465         0.01441         0.01427         0.01052         0.02726         0.02516         0.03194         0.02554         0.02516

---------- Post updated at 23:39 ---------- Previous update was at 22:40 ----------

Actually, transposing the file like

Code:

awk '{gsub ("\t", RS); print > ("FILE" NR)}' gene.txt
paste FILE1 FILE2

simplifies the logic considerably. Piping the paste result into

Code:

|  awk '
#NR > 1000      {exit}
NR == 1         {print; ST = B; F = 2; next}
$1+0 < ST       {SUM+=$2; next}
$1+0 == ST      {SUM+=$2; getline}
                {print ST, SUM, SUM/(NR-F)
                 SUM = $2
                 ST = ST%100 + B
                 F = NR}
' B=10
Gene    Gene1
10 0.230957 0.00855398
20 0.0815144 0.00291123
30 0.339643 0.0125794
40 0.326058 0.0116449
50 0.516258 0.0184378
60 0.135857 0.00503175
70 0.298886 0.0106745
80 0.0543429 0.0020127
90 0.298886 0.0106745
100 0.82873 0.0295975
10 0.368873 0.013662
20 0.500614 0.0178791
30 0.171263 0.00634306
40 0.408395 0.0145855
50 0.316177 0.011292
60 0.144914 0.0053672
70 0.55331 0.0197611
80 0.395221 0.0146378
90 0.303003 0.0108215
100 0.658702 0.0235251
10 0.444711 0.0164708
20 0.383372 0.0136918
30 0.214688 0.00795141
40 0.337367 0.0120488
50 0.0306697 0.00109535
60 0.429376 0.0159028
70 0.230023 0.00821511
80 0.214688 0.00795141
90 0.199353 0.00711976
100 0.490716 0.0175256
10 0.544259 0.0201577
20 0.544259 0.0194378
30 0.212394 0.00786644
40 0.199119 0.0071114
50 0.51771 0.0184896
60 0.451337 0.0167162
70 0.597357 0.0213342
80 0.464611 0.0172078
90 0.185845 0.0066373
100 1.08852 0.0388756
10 0.281795 0.0104369
20 0.756398 0.0270142
30 0.103819 0.00384516
40 0.281795 0.0100641
50 0.44494 0.0158907
60 0.296627 0.0109862
70 0.296627 0.0105938
80 0.533928 0.0197751
90 0.400446 0.0143016
100 0.652579 0.0233064
.
.
.

gives you an idea; the results may still need a bit of polishing.
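One direction the polishing could take, as a sketch only: keep an explicit count per bin and flush the last bin in an END block, so the final 100% bin does not depend on a getline that fails at end of input. The function name bin_long and the helper emit are hypothetical; the input is assumed to be the tab-separated paste output shown above, with a title line first.

```shell
# bin_long: read "header<TAB>value" lines and print "limit sum average" per bin.
# Hypothetical helper; B is the bin width in percent.
bin_long() {
    awk -F'\t' -v B=10 '
    # print the current bin if it holds any values, then reset it
    function emit() {
        if (n) print ST, sum, sum / n
        sum = 0; n = 0
    }
    NR == 1 { print; ST = B; next }              # title line, e.g. Gene<TAB>Gene1
    {
        p = $1 + 0                               # "10.1%" converts to 10.1
        if (n && p < prev) { emit(); ST = B }    # header dropped: next sample starts
        while (p > ST)     { emit(); ST += B }   # passed the limit: close bins
        sum += $2; n++; prev = p
    }
    END { emit() }                               # last bin of the last sample
    ' "$@"
}
```

It would be used as paste FILE1 FILE2 | bin_long. Bins that contain no columns are skipped rather than printed, so they never reach the division.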


RudiC


03-17-2016


Hi RudiC,

For some of the files, I am seeing this error with the first, long script:

Code:

awk: division by zero
 input record number 2, file 
 source line number 27

Any thoughts?

But the paste tweak is perfect!
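The thread does not establish what triggers the error. One possibility, stated here purely as an assumption, is that in some files a range of columns turns out empty, so the divisor is zero. A generic guard for that situation could look like the sketch below; the function name mean_or_na and the "NA" placeholder are invented for illustration.

```shell
# mean_or_na: print the label in field 1 and the mean of the remaining fields,
# or NA when there are no value fields. Hypothetical helper.
mean_or_na() {
    awk '{
        n = NF - 1                       # number of value columns after the label
        s = 0
        for (j = 2; j <= NF; j++) s += $j
        if (n > 0) printf "%s\t%.5f\n", $1, s / n
        else       printf "%s\tNA\n", $1 # nothing to average: avoid dividing by zero
    }' "$@"
}
```

The same test on the divisor before the printf in the long script would turn the fatal error into a visible NA, which at least shows which bin of which file is affected.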

Last edited by jacobs.smith; 03-17-2016 at 10:59 PM..

jacobs.smith
