Standard deviation of one column when another column has the same value
UNIX for Dummies Questions & Answers, post 302663053 by subtlegrace92 on Wednesday 27th of June 2012 01:22:13 PM

Hey guys, I am currently learning different bioinformatics applications, but I do not have much of a computer science background. I have been asked to calculate the mean and standard deviation of coverage for each transcript ID. The file is huge, about 30 million lines. Basically, for all lines that share the same value in one column/field, I want the mean and standard deviation of another column/field. My input and desired output are below, but imagine there being thousands to millions of different transcript IDs. I also want each output line to keep all the other fields from the original line, followed by the mean and standard deviation for its transcript ID. The other fields do not follow any special pattern.

So far I have been using a lot of awk, so if you have an awk solution that would be great.

Also, if you could give me a formula to then calculate how many standard deviations each coverage value is away from the mean, and put that in a separate field, that would be even better, but I think I can figure this part out on my own.

Input
Code:
Transcript ID   Other field   Other field   Coverage
1               3             6             1
2               4             8             2
1               5             10            3
2               6             12            6

Output
Code:
Transcript ID   Other field   Other field   Coverage   Mean   Standard deviation
1               3             6             1          2      1
2               4             8             2          4      2
1               5             10            3          2      1
2               6             12            6          4      2
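
One way to approach this in awk is to read the file twice. The first pass accumulates the count, sum, and sum of squares of coverage per transcript ID; the second pass prints each original line with that ID's mean and standard deviation appended. This is only a minimal sketch under stated assumptions, not a solution tested on the real data: the file name `coverage.tsv` and the function name `group_stats` are made up for illustration, the file is assumed to be tab-separated with no header line, the transcript ID is assumed to be field 1 and coverage field 4, and the standard deviation is the population form (dividing by n), which is what the desired output above shows.

```shell
# Hypothetical sample file holding the four data rows from the question,
# assumed tab-separated and without a header line.
printf '1\t3\t6\t1\n2\t4\t8\t2\n1\t5\t10\t3\n2\t6\t12\t6\n' > coverage.tsv

# group_stats FILE: append per-ID mean and population SD of field 4.
# The file is passed to awk twice so it can be read in two passes.
group_stats() {
  awk -F'\t' -v OFS='\t' '
    # Pass 1 (NR == FNR only while reading the first copy):
    # accumulate count, sum, and sum of squares per transcript ID.
    NR == FNR { n[$1]++; sum[$1] += $4; sumsq[$1] += $4 * $4; next }

    # Pass 2: print the original line plus mean and SD for its ID.
    {
      mean = sum[$1] / n[$1]
      var  = sumsq[$1] / n[$1] - mean * mean   # population variance
      if (var < 0) var = 0                     # guard against rounding error
      print $0, mean, sqrt(var)
    }
  ' "$1" "$1"
}

group_stats coverage.tsv
```

For the four sample rows this prints the original fields followed by mean 2 and SD 1 for ID 1, and mean 4 and SD 2 for ID 2. Only three numbers per transcript ID are held in memory, so the size of the file matters less than the number of distinct IDs. If the sample standard deviation (dividing by n - 1) is wanted instead, the variance would need to be multiplied by n / (n - 1) for IDs with more than one line.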


Last edited by Scrutinizer; 06-28-2012 at 04:05 AM..
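
For the follow-up about how many standard deviations each coverage value lies from the mean: that quantity is usually called the z-score, z = (coverage - mean) / SD. The sketch below assumes the mean and standard deviation have already been appended as fields 5 and 6, as in the desired output. The file name `with_stats.tsv` and the function name `add_zscore` are hypothetical, the file is assumed to be tab-separated with no header line, and printing `NA` when the standard deviation is zero is just one possible convention.

```shell
# Hypothetical sample file shaped like the desired output:
# ID, two other fields, coverage, mean, standard deviation (tab-separated).
printf '1\t3\t6\t1\t2\t1\n2\t4\t8\t2\t4\t2\n1\t5\t10\t3\t2\t1\n2\t6\t12\t6\t4\t2\n' > with_stats.tsv

# add_zscore FILE: append (coverage - mean) / SD as a new last field.
add_zscore() {
  awk -F'\t' -v OFS='\t' '
    {
      # Avoid dividing by zero when every value for an ID is identical.
      z = ($6 > 0) ? ($4 - $5) / $6 : "NA"
      print $0, z
    }
  ' "$1"
}

add_zscore with_stats.tsv
```

With the sample rows, coverage values 1 and 2 get a z-score of -1, and coverage values 3 and 6 get a z-score of 1.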
 

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.