recoding data points using SED??
Post 302360805 by doobedoo on Saturday 10th of October 2009 12:58:15 PM
Hello again,
Again, I apologize for the confusion. I made a mistake in the first post: the letters should be recoded to -1, 0, 1. This is the tricky part: I need to recode the letters on a per-column basis, in alphabetical order. Several different combinations can occur within a column:
AA, AC, CC = -1, 0, 1
AA, AG, GG = -1, 0, 1
AA, AT, TT = -1, 0, 1
CC, CG, GG = -1, 0, 1
CC, CT, TT = -1, 0, 1
GG, GT, TT = -1, 0, 1

Therefore any mixed data point (AC, AG, AT, CG, CT, GT) will ALWAYS = 0, AA will ALWAYS = -1, and TT will ALWAYS = 1. The problem comes when recoding CC and GG. As you can see, in some columns CC comes first in the alphabet and is recoded as -1 (when the combo is CC, CG, GG). However, in other columns CC does not come first in the alphabet and is coded as 1 (when the combo is AA, AC, CC). The same problem occurs with GG. Is there any solution to this issue? I hope I explained it better this time!!

Thank you so much for your patience!!
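
One possible way to handle the CC/GG ambiguity described above is to look at each column as a whole before recoding anything: collect the letters that occur in that column, sort them, and treat the alphabetically first letter as the -1 homozygote and the second as the 1 homozygote. That needs the whole column in view, which is awkward in sed, so the sketch below uses Python instead. Several things here are assumptions, not details from the post: the file is whitespace-delimited with a header row and an ID in the first field, every data point is a two-letter code made of A, C, G, T, and anything else (for example a missing-value marker) is passed through unchanged. The names `recode_column` and `recode_table` are made up for illustration. A column that contains only CC or only GG cannot be decided from the column alone, so this sketch leaves it untouched.

```python
VALID = set("ACGT")


def is_genotype(value):
    """True for two-letter codes such as AA, CG, TT (assumed input format)."""
    return len(value) == 2 and set(value) <= VALID


def recode_column(values):
    """Recode one column of genotypes to "-1", "0", "1" by alphabetical order."""
    # Letters seen anywhere in this column, in alphabetical order.
    letters = sorted({ch for v in values if is_genotype(v) for ch in v})
    if len(letters) > 2:
        raise ValueError(f"more than two letters in one column: {letters}")

    if len(letters) == 2:
        low, high = letters
    elif letters == ["A"]:
        low, high = "A", None   # AA is always -1
    elif letters == ["T"]:
        low, high = None, "T"   # TT is always 1
    else:
        low = high = None       # only CC, only GG, or no genotypes: undecidable

    recoded = []
    for v in values:
        if not is_genotype(v):
            recoded.append(v)       # pass unknown values through unchanged
        elif v[0] != v[1]:
            recoded.append("0")     # mixed data point
        elif v[0] == low:
            recoded.append("-1")
        elif v[0] == high:
            recoded.append("1")
        else:
            recoded.append(v)       # ambiguous homozygote, left as-is
    return recoded


def recode_table(lines):
    """Recode every column after the ID field; the header row is kept."""
    rows = [line.split() for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    # Recode column by column, then stitch the rows back together.
    columns = [recode_column([r[i] for r in body]) for i in range(1, len(header))]
    out = [" ".join(header)]
    for j, row in enumerate(body):
        out.append(" ".join([row[0]] + [col[j] for col in columns]))
    return out


if __name__ == "__main__":
    sample = [
        "ID 1 2",
        "s1 CC CC",
        "s2 AC CG",
        "s3 AA GG",
    ]
    print("\n".join(recode_table(sample)))
```

With the made-up sample above, CC becomes 1 in the first column (the combo is AA, AC, CC) and -1 in the second (the combo is CC, CG, GG), so the rows come out as `s1 1 -1`, `s2 0 0`, and `s3 -1 1`.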
 

gd_alter_encoding(3)                GETDATA                gd_alter_encoding(3)

NAME
       gd_alter_encoding -- modify the binary encoding of data in a dirfile

SYNOPSIS
       #include <getdata.h>

       int gd_alter_encoding(DIRFILE *dirfile, unsigned int encoding, int fragment_index, int recode);

DESCRIPTION
       The gd_alter_encoding() function sets the binary encoding of the format specification fragment given by fragment_index to encoding in the dirfile(5) database specified by dirfile. The binary encoding of a fragment indicates the encoding of data stored in binary files associated with RAW fields defined in the specified fragment. The binary encoding of a fragment containing no RAW fields is ignored.

       The encoding argument should be one of the following:

              GD_UNENCODED, GD_BZIP2_ENCODED, GD_GZIP_ENCODED, GD_LZMA_ENCODED, GD_SLIM_ENCODED, GD_TEXT_ENCODED.

       See gd_cbopen(3) and dirfile-encoding(5) for the meanings of these symbols and details on the supported encoding schemes.

       In addition to being simply a valid fragment index, fragment_index may also be the special value GD_ALL_FRAGMENTS, which indicates that the encoding of all fragments in the database should be changed.

       If the recode argument is non-zero, this call will recode the binary data of affected RAW fields to account for the change in binary encoding. If the encoding of the fragment is encoding insensitive, or if the data type is only one byte in size, no change is made. If recode is zero, affected binary files are left untouched.

RETURN VALUE
       Upon successful completion, gd_alter_encoding() returns zero. On error, it returns -1 and sets the dirfile error to a non-zero error value. Possible error values are:

       GD_E_ACCMODE
              The specified dirfile was opened read-only.

       GD_E_ALLOC
              The library was unable to allocate memory.

       GD_E_BAD_DIRFILE
              The supplied dirfile was invalid.

       GD_E_BAD_INDEX
              The supplied index was out of range.

       GD_E_PROTECTED
              The metadata of the given format specification fragment was protected from change, or the binary data of the fragment was protected from change and binary file recoding was requested.

       GD_E_RAW_IO
              An I/O error occurred while attempting to recode a binary file.

       GD_E_UNCLEAN_DB
              An error occurred while moving the recoded file into place. As a result, the database may be in an unclean state. See the NOTES section below for recovery instructions. In this case, the dirfile will be flagged as invalid, to prevent further database corruption. It should be immediately closed.

       GD_E_UNKNOWN_ENCODING
              The encoding scheme of the fragment is unknown.

       GD_E_UNSUPPORTED
              The encoding scheme of the fragment does not support binary file recoding.

       The dirfile error may be retrieved by calling gd_error(3). A descriptive error string for the last error encountered can be obtained from a call to gd_error_string(3).

NOTES
       A binary file recoding occurs out-of-place. As a result, sufficient space must be present on the filesystem for the binary files of all RAW fields in the fragment both before and after translation. If all fragments are updated by specifying GD_ALL_FRAGMENTS, the recoding occurs one fragment at a time.

       An error code of GD_E_UNCLEAN_DB indicates a system error occurred while moving the re-encoded binary data into place or when deleting the old data. If this happens, the database may be left in an unclean state. The caller should check the filesystem directly to ascertain the state of the dirfile data before continuing. For recovery instructions, see the file /usr/share/doc/getdata/unclean_database_recovery.txt.

SEE ALSO
       gd_cbopen(3), gd_error(3), gd_error_string(3), gd_encoding(3), dirfile(5), dirfile-format(5)

Version 0.7.0                       20 July 2010                       gd_alter_encoding(3)