Average across rows with a condition


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Average across rows with a condition
# 1  
Old 09-17-2014
Average across rows with a condition

Hi Friends,

My input file

Code:
Gene1 10 20 0
Gene2 5 0 15
Gene3 10 10 10
Gene4 5 0 0

If there is a zero for any gene in any column, I don't want that column to be considered which reduces the denominator value during average.

Here is my output

Code:
Gene1 10 20 0 10
Gene2 5 0 15 10
Gene3 10 10 10 10
Gene4 5 0 0 5

The last column is my average. I didn't try anything for this solution because I didn't know where to start. Sorry.
# 2  
Old 09-17-2014
This approach utilizes bash arrays:
Code:
while read gname val1 val2 val3
 do
  arr=()
  [ $val1 -ne 0 ] && arr=("${arr[@]}" "$val1")
  [ $val2 -ne 0 ] && arr=("${arr[@]}" "$val2")
  [ $val3 -ne 0 ] && arr=("${arr[@]}" "$val3")
  nbrofvals=${#arr[@]}
  tot=0
    for arrval in "${arr[@]}"
     do
      tot=$((tot + arrval))
    done
  avg=$((tot/nbrofvals))
echo "$gname $val1 $val2 $val3 $avg"
done <input

This User Gave Thanks to junior-helper For This Post:
# 3  
Old 09-17-2014
Quote:
Originally Posted by junior-helper
This approach utilizes bash arrays:
Code:
while read gname val1 val2 val3
 do
  arr=()
  [ $val1 -ne 0 ] && arr=("${arr[@]}" "$val1")
  [ $val2 -ne 0 ] && arr=("${arr[@]}" "$val2")
  [ $val3 -ne 0 ] && arr=("${arr[@]}" "$val3")
  nbrofvals=${#arr[@]}
  tot=0
    for arrval in "${arr[@]}"
     do
      tot=$((tot + arrval))
    done
  avg=$((tot/nbrofvals))
echo "$gname $val1 $val2 $val3 $avg"
done <input

Thanks for your time. But I see the following errors

Code:
line 4: [: 1415.37: integer expression expected
line 5: [: 0.05248: integer expression expected
line 6: [: 2.20281: integer expression expected
line 13: tot/nbrofvals: division by 0 (error token is "s")

FYI, the zeros in the input files can even occur in second column which is your val1. Thanks for your time.

---------- Post updated at 04:37 PM ---------- Previous update was at 04:06 PM ----------

I tried all possible zeros occurrence like this

Code:
gene1	1	2	3
gene2	0	0	3
gene3	0	9	0
gene4	8	0	0
gene5	5	5	0
gene6	0	6	6
gene7	9	9	0

And then wrote if else statements for all combinations

Code:
awk '{if($2!=0 && $3!=0 && $4!=0) {v=($2+$3+$4)/3; print $0"\t"v} else if($2==0 && $3==0 && $4!=0) {v=($2+$3+$4)/1; print $0"\t"v} else if($2==0 && $3!=0 && $4==0) {v=($2+$3+$4)/1; print $0"\t"v} else if($2!=0 && $3==0 && $4==0) {v=($2+$3+$4)/1; print $0"\t"v} else if($2!=0 && $3!=0 && $4==0) {v=($2+$3+$4)/2; print $0"\t"v} else if($2==0 && $3!=0 && $4!=0) {v=($2+$3+$4)/2; print $0"\t"v} else if($2!=0 && $3!=0 && $4==0) {v=($2+$3+$4)/2; print $0"\t"v}}' input

output
Code:
gene1	1	2	3	2
gene2	0	0	3	3
gene3	0	9	0	9
gene4	8	0	0	8
gene5	5	5	0	5
gene6	0	6	6	6
gene7	9	9	0	9

# 4  
Old 09-17-2014
You're welcome.

OK, obviously you want to calculate average of floating point numbers, right?
I saw your input sample with whole numbers only and scripted something for that.

>the zeros in the input files can even occur in second column which is your val1
If you think of gname as val0, then val1 *is* the second column, thus the script would also handle a zero in the second column.
However, right now, it would fail if all three columns (col2, 3 and 4) were all zeros.

I suggest that you provide another few "real world" input lines and the expected output.
I'm not sure I can fix my code though Smilie, but it would be very useful for others.
This User Gave Thanks to junior-helper For This Post:
# 5  
Old 09-17-2014
Try this:-
Code:
awk '
        {
                for ( i = 2; i <= NF; i++ )
                {
                        V[$1] = $0
                        A[$1] += $i
                        T[$1] += ( $i == 0 ? 0 : 1 )
                }
        }
        END {
                for ( k in A )
                        print V[k], T[k] ? A[k] / T[k] : 0
        }
' file | sort

These 2 Users Gave Thanks to Yoda For This Post:
# 6  
Old 09-18-2014
Posted by jacobs.smith:
Quote:
Hi Friends,
My input file

Code:
Gene1 10 20 0Gene2 5 0 15Gene3 10 10 10Gene4 5 0 0

If there is a zero for any gene in any column, I don't want that column to be considered which reduces the denominator value during average.

Here is my output

Code:
Gene1 10 20 0 10Gene2 5 0 15 10Gene3 10 10 10 10Gene4 5 0 0 5

The last column is my average. I didn't try anything for this solution because I didn't know where to start. Sorry.
Hello jacobs.smith,

Following may help also.

Code:
 awk '{for(i=2;i<=NF;i++){if($i){A[NR]=A[NR]?A[NR]+$i:$i;B[NR]=B[NR]?B[NR]+1:1;}}} {p=f=NR;{for(r=NR;r<=NR;r++){c[p--]=A[r]/B[r]}}{for(t=f;t<=NR;t++){$0=$0 OFS c[t];print $0}}}'  Input_file

Output will be as follows.

Code:
Gene1 10 20 0 15
Gene2 5 0 15 10
Gene3 10 10 10 10
Gene4 5 0 0 5

EDIT: Adding a clear view of code for same.

Code:
awk '{
for(i=2;i<=NF;i++)
        {if($i)
                {A[NR]=A[NR]?A[NR]+$i:$i;B[NR]=B[NR]?B[NR]+1:1;}
        }
     }
{p=f=NR;
        {for(r=NR;r<=NR;r++)
                            {c[p--]=A[r]/B[r]}
        }
        {for(t=f;t<=NR;t++)
                            {$0=$0 OFS c[t];print $0}
        }
}' Input_file

Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Average select rows

I have no idea how to even get started with this script. I need to average field 3 for each of the unique identifiers found in field 1. However, I only want to average these rows when field 2 is equal to 1506 - 2000 (note that i replaced the values field 2 for security reasons, but the real... (6 Replies)
Discussion started by: ncwxpanther
6 Replies

2. Shell Programming and Scripting

Average values of duplicate rows

I have this file input.txt. I want to take average column-wise for the rows having duplicate gene names. Gene Sample_1 Sample_2 Sample_3 gene_A 2 4 5 gene_B 1 2 3 gene_A 0 5 7 gene_B 4 5 6 gene_A 11 12 13 gene_C 2 3 4 Desired output: gene_A 4.3 7 8.3 gene_B 2.5 3.5 4.5 gene_C 2 3 4... (6 Replies)
Discussion started by: Sanchari
6 Replies

3. UNIX for Dummies Questions & Answers

Writing a script to take the average of two columns every 3 rows

I have a dataset with 120 columns. I would like to write a script, that takes the average of every two columns, starting from columns 2 and 3, and moving consecutively in frames of 3 columns, all the way until the last column. The first column in the output file would be the averages of columns... (1 Reply)
Discussion started by: evelibertine
1 Replies

4. Shell Programming and Scripting

Calculate average of rows between two specific patterns

Hi, I have a file like this: variableStep chrom=chrX span=1 92328 0 92329 0 92330 0 92331 0 92332 0 92333 0 ................ ................ ................ variableStep chrom=chrX span=1 45649610 -0.00386 45649611 1.56 45649612 -2.23 45649613 ... (2 Replies)
Discussion started by: Ranajit_Das
2 Replies

5. Shell Programming and Scripting

average of rows with same value in the first column

Dear All, I have this file tab delimited A 1 12 22 B 3 34 33 C 55 9 32 A 12 81 71 D 11 1 66 E 455 4 2 B 89 4 3 I would like to make the average every column where the first column is the same, for example, A 6,5 46,5 46,5 B 46,0 19,0 18,0 C 55,0 9,0 32,0 D 11,0 1,0 66,0... (8 Replies)
Discussion started by: paolo.kunder
8 Replies

6. UNIX Desktop Questions & Answers

Calculate average for rows in a text file

Dear Gurus, I have tab-delimited text files with matrix containing values. The first column is a identifier and other columns have the corresponding values. I would like to calculate the average value (total number/number of entries) for all entries from 2nd column to the last column in row... (3 Replies)
Discussion started by: Unilearn
3 Replies

7. Shell Programming and Scripting

Average calculation based on number of rows

Dear users, I need your support, I have a file like this: 272134.548 6680572.715 272134.545 6680572.711 272134.546 6680572.713 272134.548 6680572.706 272134.545 6680572.721 272134.543 6680572.710 272134.544 6680572.715 272134.543 6680572.705 272134.540 6680572.720 272134.544... (10 Replies)
Discussion started by: Gery
10 Replies

8. Shell Programming and Scripting

Script to find the average of a given column and also for specified number of rows?

Hi Friends, In continuation to my earlier post https://www.unix.com/shell-programming-scripting/99166-script-find-average-given-column-also-specified-number-rows.html I am extending my problem as follows. Input: Column1 Column2 MAS 1 MAS 4 ... (2 Replies)
Discussion started by: ks_reddy
2 Replies

9. Shell Programming and Scripting

Script to find the average of a given column and also for specified number of rows??

Hi friends I have 100 files in my directory. Each file look like this.. Temp1 Temp2 Temp3 MAS 1 2 3 MAS 4 5 6 MAS 7 8 9 Delhi 10 11 12 Delhi 13 14 15 Delhi 16 17 ... (4 Replies)
Discussion started by: ks_reddy
4 Replies

10. UNIX for Dummies Questions & Answers

Calculating the Number of Rows and Average

Hi All I like to know how can we calculate the number of rows and the average of the values present in the file. I will not know what will be the rowcount, which will be dynamic in nature of the file. eg. 29 33 48 30 28 (6 Replies)
Discussion started by: pk_eee
6 Replies
Login or Register to Ask a Question