Visit Our UNIX and Linux User Community

Full Discussion: Get column average using ID
Top Forums Shell Programming and Scripting Get column average using ID Post 302852101 by jwbucha on Tuesday 10th of September 2013 01:45:19 PM
Old 09-10-2013
Not homework, but research problem. I am a PhD student in quantitative animal genomics. I am a geneticist, but working on learning unix programming. I have large output files (500 mb) that are analyzing effects from 50,000 DNA markers in 2000 animals. The program divides the genome into 1,000,000 base pair segments called a 'window'. I need to extract the breeding value (BV) for each window averaged across the 2000 animals. The table above is an example of what my output looks like. I apologize if I am not meeting the formatting requirements for this forum, it is my first post. I have been unable to find an awk solution anywhere else, hence my post.

edit: I should clarify that even though I am a student this is not for a class, but real data that is part of my research.
 
Test Your Knowledge in Computers #849
Difficulty: Medium
Mandatory protocols for all Bluetooth stacks are LMP, L2CAP and SOAP.
True or False?

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

calculate average of column 2

Hi I have fakebook.csv as following: F1(current date) F2(popularity) F3(name of book) F4(release date of book) 2006-06-21,6860,"Harry Potter",2006-12-31 2006-06-22,,"Harry Potter",2006-12-31 2006-06-23,7120,"Harry Potter",2006-12-31 2006-06-24,,"Harry Potter",2006-12-31... (0 Replies)
Discussion started by: onthetopo
0 Replies

2. UNIX for Dummies Questions & Answers

Use awk to calculate average of column 3

Suppose I have 500 files in a directory and I need to Use awk to calculate average of column 3 for each of the file, how would I do that? (6 Replies)
Discussion started by: grossgermany
6 Replies

3. UNIX for Dummies Questions & Answers

average of a column in a table

Hello, Is there a quick way to compute the average of a column data in a numerical tab delimeted file? Thanks, Gussi (2 Replies)
Discussion started by: Gussifinknottle
2 Replies

4. UNIX for Dummies Questions & Answers

Average for repeated elements in a column

I have a file that looks like this 452 025_E3 8 025_E3 82 025_F5 135 025_F5 5 025_F5 23 025_G2 38 025_G2 71 025_G2 9 026_A12 81 026_A12 10 026_A12 some of the elements in column2 are repeated. I want an output file that will extract the... (1 Reply)
Discussion started by: FelipeAd
1 Replies

5. Shell Programming and Scripting

average each column in a file

Hi, I tried to do this in excel but there is a limit to how many rows it can handle. All I need to do is average each column in a file and get the final value. My file looks something like this (obviously a lot larger): Joe HHR + 1 2 3 4 5 6 7 8 Jor HHR - 1 2 3 4 5 6 7 8 the output... (1 Reply)
Discussion started by: kylle345
1 Replies

6. Shell Programming and Scripting

average of rows with same value in the first column

Dear All, I have this file tab delimited A 1 12 22 B 3 34 33 C 55 9 32 A 12 81 71 D 11 1 66 E 455 4 2 B 89 4 3 I would like to make the average every column where the first column is the same, for example, A 6,5 46,5 46,5 B 46,0 19,0 18,0 C 55,0 9,0 32,0 D 11,0 1,0 66,0... (8 Replies)
Discussion started by: paolo.kunder
8 Replies

7. Shell Programming and Scripting

Average of columns with values of other column with same name

I have a lot of input files that have the following form: Sample Cq Sample Cq Sample Cq Sample Cq Sample Cq 1WBIN 23.45 1WBIN 23.45 1CVSIN 23.96 1CVSIN 23.14 S1 31.37 1WBIN 23.53 1WBIN 23.53 1CVSIN 23.81 1CVSIN 23.24 S1 31.49 1WBIN 24.55 1WBIN 24.55 1CVSIN 23.86 1CVSIN 23.24 S1 31.74 ... (3 Replies)
Discussion started by: isildur1234
3 Replies

8. Shell Programming and Scripting

Calculate the average of a column based on the value of another column

Hi, I would like to calculate the average of column 'y' based on the value of column 'pos'. For example, here is file1 id pos y c 11 1 220 aa 11 4333 207 f 11 5333 112 ee 11 11116 305 e 11 11117 310 r 11 22228 781 gg 11 ... (2 Replies)
Discussion started by: jackken007
2 Replies

9. Shell Programming and Scripting

Check first column - average second column based on a condition

Hi, My input file Gene1 1 Gene1 2 Gene1 3 Gene1 0 Gene2 0 Gene2 0 Gene2 4 Gene2 8 Gene3 9 Gene3 9 Gene4 0 Condition: If the first column matches, then look in the second column. If there is a value of zero in the second column, then don't consider that record while averaging. ... (5 Replies)
Discussion started by: jacobs.smith
5 Replies

10. Shell Programming and Scripting

Average each numeric column

Hi all, Does anyone know of an efficient unix script to average each numeric column of a multi-column tab delimited file (with header) with some character columns. Here is an example input file: CHR RS_ID ALLELE POP1 POP2 POP3 POP4 POP5 POP6 POP7 POP8... (7 Replies)
Discussion started by: Geneanalyst
7 Replies
funtbl(1)							SAORD Documentation							 funtbl(1)

NAME
funtbl - extract a table from Funtools ASCII output SYNOPSIS
funtable [-c cols] [-h] [-n table] [-p prog] [-s sep] <iname> DESCRIPTION
[NB: This program has been deprecated in favor of the ASCII text processing support in funtools. You can now perform fundisp on funtools ASCII output files (specifying the table using bracket notation) to extract tables and columns.] The funtbl script extracts a specified table (without the header and comments) from a funtools ASCII output file and writes the result to the standard output. The first non-switch argument is the ASCII input file name (i.e. the saved output from funcnts, fundisp, funhist, etc.). If no filename is specified, stdin is read. The -n switch specifies which table (starting from 1) to extract. The default is to extract the first table. The -c switch is a space-delimited list of column numbers to output, e.g. -c "1 3 5" will extract the first three odd-numbered columns. The default is to extract all columns. The -s switch specifies the separator string to put between columns. The default is a single space. The -h switch specifies that column names should be added in a header line before the data is output. With- out the switch, no header is prepended. The -p program switch allows you to specify an awk-like program to run instead of the default (which is host-specific and is determined at build time). The -T switch will output the data in rdb format (i.e., with a 2-row header of column names and dashes, and with data columns separated by tabs). The -help switch will print out a message describing program usage. For example, consider the output from the following funcnts command: [sh] funcnts -sr snr.ev "ann 512 512 0 9 n=3" # source # data file: /proj/rd/data/snr.ev # arcsec/pixel: 8 # background # constant value: 0.000000 # column units # area: arcsec**2 # surf_bri: cnts/arcsec**2 # surf_err: cnts/arcsec**2 # summed background-subtracted results upto net_counts error background berror area surf_bri surf_err ---- ------------ --------- ------------ --------- --------- --------- --------- 1 147.000 12.124 0.000 0.000 1600.00 0.092 0.008 2 625.000 25.000 0.000 0.000 6976.00 0.090 0.004 3 1442.000 37.974 0.000 0.000 15936.00 0.090 0.002 # background-subtracted results reg net_counts error background berror area surf_bri surf_err ---- ------------ --------- ------------ --------- --------- --------- --------- 1 147.000 12.124 0.000 0.000 1600.00 0.092 0.008 2 478.000 21.863 0.000 0.000 5376.00 0.089 0.004 3 817.000 28.583 0.000 0.000 8960.00 0.091 0.003 # the following source and background components were used: source_region(s) ---------------- ann 512 512 0 9 n=3 reg counts pixels sumcnts sumpix ---- ------------ --------- ------------ --------- 1 147.000 25 147.000 25 2 478.000 84 625.000 109 3 817.000 140 1442.000 249 There are four tables in this output. To extract the last one, you can execute: [sh] funcnts -s snr.ev "ann 512 512 0 9 n=3" | funtbl -n 4 1 147.000 25 147.000 25 2 478.000 84 625.000 109 3 817.000 140 1442.000 249 Note that the output has been re-formatted so that only a single space separates each column, with no extraneous header or comment informa- tion. To extract only columns 1,2, and 4 from the last example (but with a header prepended and tabs between columns), you can execute: [sh] funcnts -s snr.ev "ann 512 512 0 9 n=3" | funtbl -c "1 2 4" -h -n 4 -s " " #reg counts sumcnts 1 147.000 147.000 2 478.000 625.000 3 817.000 1442.000 Of course, if the output has previously been saved in a file named foo.out, the same result can be obtained by executing: [sh] funtbl -c "1 2 4" -h -n 4 -s " " foo.out #reg counts sumcnts 1 147.000 147.000 2 478.000 625.000 3 817.000 1442.000 SEE ALSO
See funtools(7) for a list of Funtools help pages version 1.4.2 January 2, 2008 funtbl(1)

Featured Tech Videos

All times are GMT -4. The time now is 10:04 AM.
Unix & Linux Forums Content Copyright 1993-2020. All Rights Reserved.
Privacy Policy