UNIX for Dummies Questions & Answers
#1
Hey guys, I am currently learning different bioinformatics applications, but I do not have much of a computer science background. I have been asked to compute the mean and standard deviation of coverage for each transcript ID in a huge file of about 30 million lines. Basically, for all lines that share the same value in one field (the transcript ID), I want the mean and standard deviation of another field (the coverage) over those lines. My input and desired output are below; just imagine thousands to millions of different transcript IDs. I also want each output line to keep all the other fields from the original line. The other fields do not follow any special pattern.

So far I have been using a lot of awk, so an awk solution would be great. If you could also give me a formula for the number of standard deviations each coverage value lies from the mean, printed in a separate field, that would be even better, but I think I can figure that part out on my own.

Input:
Transcript ID  Other field  Other field  Coverage
1              3            6            1
2              4            8            2
1              5            10           3
2              6            12           6

Output:

Transcript ID  Other field  Other field  Coverage  Mean  Standard deviation
1              3            6            1         2     1
2              4            8            2         4     2
1              5            10           3         2     1
2              6            12           6         4     2

Last edited by Scrutinizer; 06-28-2012 at 03:05 AM.
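For reference, the statistics requested above can be written per transcript ID. The notation here is introduced for illustration and is not from the original post: $t$ is a transcript ID with $n_t$ lines whose coverage values are $x_1, \dots, x_{n_t}$; $\mu_t$ is the mean, $\sigma_t$ the standard deviation, and $z_i$ the number of standard deviations that value $x_i$ lies from its transcript's mean.

$$
\mu_t = \frac{1}{n_t}\sum_{i=1}^{n_t} x_i,
\qquad
\sigma_t = \sqrt{\frac{1}{n_t}\sum_{i=1}^{n_t} x_i^{2} - \mu_t^{2}},
\qquad
z_i = \frac{x_i - \mu_t}{\sigma_t}
$$

Checking against the sample: transcript 1 has coverages 1 and 3, so $\mu_1 = (1+3)/2 = 2$ and $\sigma_1 = \sqrt{(1^2+3^2)/2 - 2^2} = \sqrt{5-4} = 1$, giving z-scores of -1 and 1. Transcript 2 has coverages 2 and 6, so $\mu_2 = 4$ and $\sigma_2 = \sqrt{20-16} = 2$. These match the desired output, which suggests the population standard deviation (dividing by $n_t$) is what is wanted; the sample standard deviation (dividing by $n_t - 1$) would give $\sqrt{2} \approx 1.414$ for transcript 1 instead.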
#2
Hope I got the maths right:
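awk '
BEGIN {
    # Name the input file twice so awk reads it in two passes:
    # pass 1 accumulates per-ID sums, pass 2 prints the results.
    ARGV[ARGC] = ARGV[ARGC-1]
    ARGC++
}
NR == FNR {                        # pass 1
    if (FNR > 1) {                 # skip the header line
        id[$1] += $4               # sum of coverage per transcript ID
        idq[$1] += $4 * $4         # sum of squared coverage per ID
        cid[$1]++                  # number of lines per ID
    }
    next
}
FNR == 1 {                         # pass 2: header line
    print $0, "Mean", "Standard deviation"
    next
}
{
    $1 = $1                        # rebuild the record so fields are joined with OFS
    mean = id[$1] / cid[$1]
    # population variance E[x^2] - E[x]^2 (divide by n, as in the sample output)
    var = idq[$1] / cid[$1] - mean ^ 2
    if (var < 0) var = 0           # clamp tiny negative values caused by rounding
    print $0, mean, sqrt(var)
}' OFS='\t' infile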
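The question also asked for the number of standard deviations each coverage value lies from its transcript's mean (a z-score). Below is a minimal sketch, assuming whitespace-separated input with a single header line, the transcript ID in field 1 and the coverage in field 4, as in the sample. It uses the same two-pass idea as the answer above, but simply names the file twice on the command line instead of editing ARGV, and appends a z-score column computed as (coverage - mean) / sd. The function name `zscore_report` and the "NA" marker for transcripts whose coverage never varies (sd = 0) are hypothetical choices made for this sketch, not something from the thread.

```shell
# zscore_report FILE  (hypothetical helper, not from the original thread)
# Prints every line of FILE plus three tab-separated columns:
# mean, population standard deviation, and z-score of the coverage.
# Assumptions: whitespace-separated fields, one header line,
# transcript ID in $1, coverage in $4.
zscore_report() {
    awk '
    NR == FNR {                       # pass 1: per-ID sums
        if (FNR > 1) {                # skip the header line
            sum[$1] += $4             # sum of coverage
            sumsq[$1] += $4 * $4      # sum of squared coverage
            cnt[$1]++                 # number of lines for this ID
        }
        next
    }
    FNR == 1 {                        # pass 2: header line
        print $0, "Mean", "Standard deviation", "Z-score"
        next
    }
    {
        $1 = $1                       # rebuild record with OFS
        mean = sum[$1] / cnt[$1]
        var = sumsq[$1] / cnt[$1] - mean ^ 2
        if (var < 0) var = 0          # guard against rounding error
        sd = sqrt(var)
        # z-score is undefined when every coverage for this ID is equal;
        # print "NA" in that case (a choice made for this sketch)
        z = (sd > 0) ? ($4 - mean) / sd : "NA"
        print $0, mean, sd, z
    }' OFS='\t' "$1" "$1"
}
```

For example, `zscore_report infile > infile.z` on the sample data gives z-scores of -1 and 1 for both transcripts, since each has two values lying one standard deviation either side of its mean. Memory use grows with the number of distinct transcript IDs, not with the number of lines, because only three running totals are kept per ID.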