Check first column - average second column based on a condition


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Check first column - average second column based on a condition
# 1  
Old 09-16-2014
Check first column - average second column based on a condition

Hi,

My input file

Code:
Gene1 1
Gene1 2
Gene1 3
Gene1 0
Gene2 0
Gene2 0
Gene2 4
Gene2 8
Gene3 9
Gene3 9
Gene4 0

Condition:
Code:
If the first column matches, then look in the second column. If there is a value of zero in the second column, then don't consider that record while averaging.

Output:

Code:
Gene1 2 (3+2+1/3. Because the fourth value is zero)
Gene2 6
Gene3 9
Gene4 0

Thanks
# 2  
Old 09-16-2014
Can you also post your attempts at solving this problem?
# 3  
Old 09-16-2014
Code:
sort infile |
        awk '{arr[$1]+=$2; cnt[$1]+=($2>0 )? 1: 0 ; next}
         END (for(i in arr) { printf( "%s  %f\n", i, arr[i]/cnt[i]) } > newfile

Try that.
This User Gave Thanks to jim mcnamara For This Post:
# 4  
Old 09-16-2014
Quote:
Originally Posted by Yoda
Can you also post your attempts at solving this problem?
Yoda,

Thanks for the response. I was thinking too complex on how to ignore zeros. But I figured it out like this - Smilie

Please suggest on how to calculate the average rather than the sum

Code:
awk '{if($2!=0) {print $0}}' input > temp && mv temp input

cat input

Gene1 1
Gene1 2
Gene1 3
Gene2 4
Gene2 8
Gene3 9
Gene3 9

Code:
awk '{a[$1]+=$2}END{for (i in a) print i,a[i]}' input

Gene1 6
Gene2 12
Gene3 18

Thanks
# 5  
Old 09-16-2014
An awk approach:-
Code:
awk '
        {
                A[$1] += $2
                T[$1] += ( $2 == 0 ? 0 : 1 )
        }
        END {
                for ( k in A )
                        print k, ( T[k] ? A[k] / T[k] : 0 )
        }
' file

Use sort if you want to sort the results by first column:-
Code:
' file | sort

This User Gave Thanks to Yoda For This Post:
# 6  
Old 09-16-2014
Quote:
Originally Posted by Yoda
An awk approach:-
Code:
awk '
        {
                A[$1] += $2
                T[$1] += ( $2 == 0 ? 0 : 1 )
        }
        END {
                for ( k in A )
                        print k, ( T[k] ? A[k] / T[k] : 0 )
        }
' file

Use sort if you want to sort the results by first column:-
Code:
' file | sort

Thank you all, I tried like this

Code:
awk '{a[$1]+=$2; cnt[$1]+=($2==0?0:1)}END{for (i in a) print i,(a[i]?a[i]/cnt[i]:0)}' 1

Login or Register to Ask a Question

Previous Thread | Next Thread

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Add new column based in condition

At begining of column 2 the same block (2000) have 3 lines, in the next block (2336) it have 9 lines and for block (2524) 3 lines. ... (0 Replies)
Discussion started by: jiam912
0 Replies

2. Shell Programming and Scripting

Sum Of Column Based On Column Condition

I have a following inputfile MT,AP,CDM,TTML,MUM,GS,SUCC,3 MT,AP,CDM,TTSL,AP,GS,FAIL,9 MT,AP,CDM,RCom,MAH,GS,SUCC,3 MT,AP,CDM,RTL,HP,GS,SUCC,1 MT,AP,CDM,Uni,UPE,GS,SUCC,2 MT,AP,CDM,Uni,MUM,GS,SUCC,2 TTSL,AP,GS,MT,MAH,CDM,SUCC,20 TTML,AP,GS,MT,MAH,CDM,FAIL,10... (2 Replies)
Discussion started by: siramitsharma
2 Replies

3. Shell Programming and Scripting

Calculate the average of a column based on the value of another column

Hi, I would like to calculate the average of column 'y' based on the value of column 'pos'. For example, here is file1 id pos y c 11 1 220 aa 11 4333 207 f 11 5333 112 ee 11 11116 305 e 11 11117 310 r 11 22228 781 gg 11 ... (2 Replies)
Discussion started by: jackken007
2 Replies

4. UNIX for Dummies Questions & Answers

Find the average based on similar names in the first column

I have a table, say this: name1 num1 num2 num3 num4 name2 num5 num6 num7 num8 name3 num1 num3 num4 num9 name2 num8 num9 num1 num2 name2 num4 num5 num6 num4 name4 num4 num5 num7 num8 name5 num1 num3 num9 num7 name5 num6 num8 num3 num4 I want a code that will sort my data according... (4 Replies)
Discussion started by: FelipeAd
4 Replies

5. Shell Programming and Scripting

Average values in a column based on range

Hi i have data with two columns like below. I want to find average of column values like if the value in column 2 is between 0-250000 the average of column 1 is some xx and average of column2 is ww then if value is 250001-5000000 average of column 1 is yy and average of column 2 is zz. And my... (5 Replies)
Discussion started by: bhargavpbk88
5 Replies

6. Shell Programming and Scripting

to add special tag to a column based on column condition

Hi All, I have following html code <TR><TD>9</TD><TD>AR_TVR_TBS </TD><TD>85000</TD><TD>39938</TD><TD>54212</TD><TD>46</TD></TR> <TR><TD>10</TD><TD>ASCV_SMY_TBS </TD><TD>69880</TD><TD>33316</TD><TD>45698</TD><TD>47</TD></TR> <TR><TD>11</TD><TD>ARC_TBS ... (9 Replies)
Discussion started by: ckwan
9 Replies

7. Shell Programming and Scripting

AWK: how to get average based on certain column

Hi, I'm new to shell programming, can anyone help me on this? I want to do following operations - 1. Average salary for each country 2. Total salary for each city and data that looks like - salary country city 10000 zzz BN 25000 zzz BN 30000 zzz BN 10000 yyy ZN 15000 yyy ZN ... (3 Replies)
Discussion started by: shell123
3 Replies

8. Shell Programming and Scripting

awk to select rows based on condition on column

I have got a file like this 003ABC00281020091005000100042.810001 ... (8 Replies)
Discussion started by: Maruti
8 Replies

9. Shell Programming and Scripting

Filter the column and print the result based on condition

Hi all This is my output of the some SQL Query TABLESPACE_NAME FILE_NAME TOTALSPACE FREESPACE USEDSPACE Free ------------------------- ------------------------------------------------------- ---------- --------- ---------... (2 Replies)
Discussion started by: jhon
2 Replies
Login or Register to Ask a Question