Post 302360580 by doobedoo on Friday 9th of October 2009 11:00:12 AM
recoding data points using SED??

Hello all,
I have a data file that needs some serious work...I have no idea how to implement the changes that are needed!

The file is a genotypic file with a header line, >64,000 columns representing genetic markers, and >1,100 rows. It looks like this:

Code:
ID     1    2    3    4    ........  64,000
AX65   AA   CT   TT   CC   ........  AT
DF00   AG   CC   AT   CG   ........  AA
HJ34   00   TT   TT   GG   ........  AA
KL98   AA   CC   AA   CG   ........  00
SE00   GG   CT   00   GG   ........  TT

The whole idea is to get each marker (column) recoded as either -10, 0 or 10 with the missing values (00) recoded as the average of each column. This will need to be accomplished in several steps.

*First, I need to recode the missing values that are currently coded as "00" to something else, such as ".". HOWEVER, I do not want anything in the ID column (first column) to be recoded, and some IDs (DF00, SE00) contain "00".
*Second, I need to recode each column as -10, 0, or 10 depending on the alphabetical order. For example, in columns that contain AA, AG, and GG these will be recoded as -10, 0, and 10, respectively. Likewise, columns that contain CC, CG, and GG will be -10, 0, and 10 respectively.
**** There are several combinations of genotypes:
Code:
                AA, AC, CC
                AA, AG, GG
                AA, AT, TT
                CC, CG, GG
                CC, CT, TT
                GG, GT, TT

*Finally, I need to calculate the average of each marker (each column) and replace the missing values "." with this average value, which will be different for every column.

I am so sorry to have such a long grocery list of changes to implement, but like I said I have no idea how to do any of this...any help you can provide with any of these steps would be greatly appreciated!!
Thank you in advance,
Doob
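
The three steps in the post above could be sketched roughly as follows. This is only an illustration, written in Python rather than sed, and it rests on several assumptions that are not stated in the post: the file is whitespace-delimited, every genotype is exactly two letters, each column holds at most two alleles, the column average is taken over the non-missing values only, and a column with no non-missing values is filled with 0. A column where only one allele ever appears has its homozygote coded as -10, which is also a guess. The function names `column_codes` and `recode_file` are made up for this sketch. Because the missing values are filled directly with the column average, the intermediate "." step is skipped.

```python
from collections import Counter

MISSING = "00"  # missing-genotype code used in the sample data


def column_codes(counts):
    """Map each genotype seen in one column to -10, 0 or 10.

    Heterozygotes (two different letters) become 0. The homozygote of the
    alphabetically first allele becomes -10 and the other homozygote 10.
    """
    alleles = sorted({a for g in counts if g != MISSING for a in g})
    codes = {}
    for g in counts:
        if g == MISSING:
            continue
        if g[0] != g[1]:
            codes[g] = 0
        else:
            codes[g] = -10 if g[0] == alleles[0] else 10
    return codes


def recode_file(src, dst):
    """Read the genotype file `src` twice and write the recoded file `dst`."""
    # Pass 1: count the genotypes seen in every marker column.
    with open(src) as fh:
        header = next(fh).split()
        counts = [Counter() for _ in header[1:]]
        for line in fh:
            fields = line.split()
            if not fields:
                continue
            for c, g in zip(counts, fields[1:]):
                c[g] += 1

    # Per column: genotype -> code, plus the mean of the non-missing codes.
    codes = [column_codes(c) for c in counts]
    means = []
    for c, code in zip(counts, codes):
        n = sum(k for g, k in c.items() if g != MISSING)
        total = sum(code[g] * k for g, k in c.items() if g != MISSING)
        means.append(total / n if n else 0.0)  # all-missing column: assumed 0

    # Pass 2: rewrite each row, leaving the ID column (first field) untouched.
    with open(src) as fh, open(dst, "w") as out:
        out.write(next(fh))  # header line is copied as-is
        for line in fh:
            fields = line.split()
            if not fields:
                continue
            recoded = [
                f"{m:g}" if g == MISSING else str(code[g])
                for g, code, m in zip(fields[1:], codes, means)
            ]
            out.write(" ".join([fields[0]] + recoded) + "\n")
```

It would be called as `recode_file("genotypes.txt", "recoded.txt")`, where both file names are placeholders. Reading the file twice keeps only one small counter per column in memory, which seemed safer than loading >64,000 columns by >1,100 rows at once. Only fields after the first are ever compared with "00", so IDs such as DF00 and SE00 are left alone.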
 
