awk to average target and gene


 
# 1  
Old 07-28-2015

I am trying to modify the awk below to include the gene name ($5) for each target, but cannot get it to work. Also, I'm not sure the calculation is right: it should average the values in $7 over all lines that share the same target in $4. Thank you :).

Code:
awk '{if((NR>1)&&($4!=last)){printf("%s\t%f\t%s\n", last,  total/len,pg);total=$7;len=1;}else{total+=$7;len+=1};pg=$5;last=$4;}END{printf("%s\t%f\t%s\n",  last, total/len,pg)}'

output.bam.hist.txt
Code:
chr1    40539722    40539865    chr1:40539722-40539865    PPT1    1    159
chr1    40539722    40539865    chr1:40539722-40539865    PPT1    2    161
chr1    40539722    40539865    chr1:40539722-40539865    PPT1    3    161
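
As a quick sanity check of the averaging, the three sample lines above can be fed through the one-liner. This is only a sketch: the wrapper name avg_by_target is made up for illustration, and it assumes that lines sharing a target in $4 are adjacent in the input, as they are in the sample.

```shell
#!/bin/sh
# Hypothetical wrapper; the awk program is the one-liner from the post.
avg_by_target() {
  awk '{if((NR>1)&&($4!=last)){printf("%s\t%f\t%s\n", last,  total/len,pg);total=$7;len=1;}else{total+=$7;len+=1};pg=$5;last=$4;}END{printf("%s\t%f\t%s\n",  last, total/len,pg)}' "$@"
}

# The three sample lines: (159 + 161 + 161) / 3 = 160.333333
printf '%s\n' \
  'chr1 40539722 40539865 chr1:40539722-40539865 PPT1 1 159' \
  'chr1 40539722 40539865 chr1:40539722-40539865 PPT1 2 161' \
  'chr1 40539722 40539865 chr1:40539722-40539865 PPT1 3 161' |
avg_by_target
# prints: chr1:40539722-40539865	160.333333	PPT1
```

If the same target appears in separate, non-adjacent blocks, this approach would report it once per block, so the input should be grouped (for example sorted on $4) first.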

epilepsy70_average.txt
Code:
chr1:40539722-40539865    72.000000
chr1:40542503-40542595    46.500000
chr1:40544221-40544340    60.000000

Desired epilepsy70_average.txt
Code:
chr1:40539722-40539865    72.000000      PPT1
chr1:40542503-40542595    46.500000      PPT1
chr1:40544221-40544340    60.000000      PPT1

EDIT: I have modified the awk to calculate average using $7 and include $5 in the output below.

epilepsy70_average.txt
Code:
chr1:40539722-40539865    227.776224    PPT1
chr1:40542503-40542595    109.706522    PPT1
chr1:40544221-40544340    61.596639    PPT1

What I cannot figure out is the next step: if the calculated value in $2 is less than or equal to 100, that line should be shown in red, and then the entire file should be sorted in ascending order by $2. Is this possible?

I think the code below will at least print the filtered lines, and maybe that's a start:

Code:
awk '{if((NR>1)&&($4!=last)){printf("%s\t%f\t%s\n", last,  total/len,pg);total=$7;len=1;}else{total+=$7;len+=1};pg=$5;last=$4;}END{printf("%s\t%f\t%s\n",  last, total/len,pg)}' output.bam.hist.txt > epilepsy70_average.txt
awk '{if($2>=100)print;}' epilepsy70_average.txt > sort.txt


Last edited by cmccabe; 07-29-2015 at 11:03 AM. Reason: edited code to calculate from $7 and include $5
# 2  
Old 07-31-2015
I find multi-line scripts more readable.
Delayed printing, which happens both conditionally in the main block and in the END section, is best defined once in a function.
Code:
#!/bin/sh
awk '
function prt(){ printf "%s\t%f\t%s\n", last, total/len, pg }
{
  if (NR>1 && $4!=last) {
    prt()
    total=0; len=0
  }
  total+=$7; len+=1
  pg=$5
  last=$4
}
END { if (len) prt() }  # skip the final print on empty input (avoids division by zero)
' "$@" |

awk '{ if ($2>=100) print }' |

sort -k2,2n

Note the -k2,2n, which sorts numerically on field 2 only.
The "$@" passes the script's arguments on to awk, so you can run the script with
Code:
./scriptname file

Of course you can also do
Code:
./scriptname < file
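
To illustrate what the n in -k2,2n changes, here is a small sketch using the three averages from the edited question. The helper name averages is made up for the example; without n, sort compares field 2 as text, so 109 and 227 come before 61.

```shell
#!/bin/sh
# Hypothetical helper: emits the averages from the question, tab-separated.
averages() {
  printf '%s\t%s\t%s\n' \
    chr1:40539722-40539865 227.776224 PPT1 \
    chr1:40542503-40542595 109.706522 PPT1 \
    chr1:40544221-40544340 61.596639 PPT1
}

# Text comparison on field 2: 109... then 227... then 61...
averages | sort -k2,2

# Numeric comparison on field 2 only: 61... then 109... then 227...
averages | sort -k2,2n
```

The ",2" part ends the key at field 2, so the gene name in field 3 does not take part in the comparison.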

Now if you want to merge the two awk scripts into one, you can change the prt function:
Code:
#!/bin/sh
awk '
function prt(){
  avg=total/len
  if (avg>=100) {
    printf "%s\t%f\t%s\n", last, avg, pg
  }
}
{
  if (NR>1 && $4!=last) {
    prt()
    total=0; len=0
  }
  total+=$7; len+=1
  pg=$5
  last=$4
}
END { if (len) prt() }  # skip the final print on empty input (avoids division by zero)
' "$@" |

sort -k2,2n
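
The scripts above filter lines rather than color them. If the goal is instead what the question describes, keeping every line and showing those with an average of 100 or less in red, one possible sketch follows. It assumes the output goes to a terminal that understands ANSI escape sequences, and the function name highlight_low is made up for illustration. Sorting happens before coloring so the escape codes cannot interfere with the sort key.

```shell
#!/bin/sh
# Hypothetical helper: reads "target<TAB>average<TAB>gene" lines,
# sorts them ascending by the average in field 2, then wraps every
# line whose average is <= 100 in ANSI red (ESC[31m ... ESC[0m).
highlight_low() {
  sort -k2,2n "$@" |
  awk '{
    if ($2 <= 100) printf "\033[31m%s\033[0m\n", $0
    else print
  }'
}

# Example with the averages from the edited question:
printf '%s\t%s\t%s\n' \
  chr1:40539722-40539865 227.776224 PPT1 \
  chr1:40542503-40542595 109.706522 PPT1 \
  chr1:40544221-40544340 61.596639 PPT1 |
highlight_low
```

The color exists only as escape codes in the output stream; when redirected to a plain text file, the codes are stored literally and show up as red only when the file is printed to a terminal, for example with cat.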
