Sponsored Content
Top Forums Shell Programming and Scripting recoding data points using SED?? Post 302360805 by doobedoo on Saturday 10th of October 2009 12:58:15 PM
Old 10-10-2009
Hello again,
Again, I apologize for the confsion. I made a mistake in the first post, the letters should be recoded to -1, 0, 1. This is the tricky part. I need to recode the letters on a per column, alphabetical order basis. There are several different combinations that can occur within a column:
AA, AC, CC = -1, 0, 1
AA, AG, GG = -1, 0, 1
AA, AT, TT = -1, 0, 1
CC, CG, GG = -1, 0, 1
CC, CT, TT = -1, 0, 1
GG, GT, TT = -1, 0, 1

Therefore anything with a mixed data point (AC, AG, AT, CG, CT, GT) will ALWAYS = 0, AA will ALWAYS = -1, and TT will ALWAYS = 1. The problem come when recoding CC and GG. As you can see, in some rows CC will come first in the alphabet and will be recoded as -1 (When the combo is CC, CG, GG) . However, in some columns CC does not come first in the alphabet and will be coded as 1 (when the combo is AA, AC, CC). The same problem occurs with GG. IS there any solution to this issue? I hope I explained it better this time!!

Thank you so much for your patience!!
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

to extarct data points

suppose u have a file which consist of many data points separated by asterisk Question is to extract third part in each line . 0.0002*0.003*-0.93939*0.0202*0.322*0.3332*0.2222*0.22020 0.003*0.3333*0.33322*-0.2220*0.3030*0.2222*0.3331*-0.3030 0.0393*0.3039*-0.03038*0.033*0.4033*0.30384*0.4048... (5 Replies)
Discussion started by: cdfd123
5 Replies

2. Shell Programming and Scripting

Writing an algorithm to recode data points

I have a file that has been partially recoded so that data points that were formerly letter combinations are now -1, 0, or 1. I need to finish recoding the GG and CC data points. The file looks like this: ID 1 2 3 4 5 6 7 8 83845676 0 0 0 0 CC -1 CC CC 838469. -1 -1 1 GG CC 0 CC 1 83847041... (10 Replies)
Discussion started by: doobedoo
10 Replies

3. Shell Programming and Scripting

Group search (multiple data points) in Linux

Hi All I have a data set like this tab delimited: weft fgr-1 345 -1 fgrythdgd weft fgr-3 456 -2 ghjdklflllff weft fgr-11 456 -3 ghtjuffl weft fgr-1 213 -2 ghtyjdkl weft fgr-34 567 -5 fghytkflf frgt fgr-36 567 -1 ghrjufjf frgt fgr-45 678 -2 ghjruir frgt fgr-34 546 -5 gjjjgkldlld frgt... (4 Replies)
Discussion started by: Lucky Ali
4 Replies

4. UNIX for Dummies Questions & Answers

How to get data only inside polygon created by points which is part of whole data from file?

hiii, Help me out..i have a huge set of data stored in a file.This file has has 2 columns which is latitude & longitude of a region. Now i have a program which asks for the number of points & based on this number it asks the user to enter that latitude & longitude values which are in the same... (7 Replies)
Discussion started by: reva
7 Replies

5. Programming

GNUPLOT- how to change the style of data points

Hi, I am trying to arrange my graphs with GNUPLOT. Although it looked like simple at the beginning, I could not figure out an answer for the following: I want to change the style of my data points (not the line, just exact data points) The terminal assigns first + and then x to them but what I... (0 Replies)
Discussion started by: natasha
0 Replies

6. Shell Programming and Scripting

Calculate difference between consecutive data points in a column from a file

Hi, I have a file with one column data (sample below) and I am trying to write a shell script to calculate the difference between consecutive data valuse i.e Var = Ni -N(i-1) 0.3141 -3.6595 0.9171 5.2001 3.5331 3.7022 -6.1087 -5.1039 -9.8144 1.6516 -2.725 3.982 7.769 8.88 (5 Replies)
Discussion started by: malandisa
5 Replies

7. UNIX for Dummies Questions & Answers

Finding data value that contains x% of points

Hi, I need help on finding the value of my data that encompasses certain percentage of my total data points (n). Attached is an example of my data, n=30. What I want to do is for instance is find the minimum threshold that still encompasses 60% (n=18), 70% (n=21) and 80% (n=24). manually to... (4 Replies)
Discussion started by: ida1215
4 Replies

8. Shell Programming and Scripting

Grabbing data between 2 points in text file

I have a text file that shows the output of my solar inverters. I want to separate this into sections. overview , device 1 , device 2 , device 3. Each device has different number of lines. but they all have unique starting points. Overview starts with 6 #'s, Devices have 4#'s and their data starts... (6 Replies)
Discussion started by: Mikey
6 Replies

9. Shell Programming and Scripting

Recoding data in a matrix from an existing file

Hi, I was wondering if someone would be able to help with extrapolating information from a file and filling an existing matrix with that information. I have made a matrix like this (file 1): A B C D 1 2 3 4 I have another file with data like this (file 2): 1 A 1 C 3 C 4 B... (1 Reply)
Discussion started by: hubleo
1 Replies

10. Shell Programming and Scripting

Ranking data points from multiple files

I need to rank a large number of data points that exist in multiple files. My data points (Column 3) are based on unique values in columns 1 and 2. I need to rank the values that are in File 1, Column 3. For instance: Input File 1 AAA BBB 10 CCC DDD 16 EEE FFF 20 Input File 2 ... (47 Replies)
Discussion started by: ncwxpanther
47 Replies
RECODE(1)								FSF								 RECODE(1)

NAME
recode - converts files between character sets SYNOPSIS
recode [OPTION]... [ [CHARSET] | REQUEST [FILE]... ] DESCRIPTION
Free `recode' converts files between various character sets and surfaces. If a long option shows an argument as mandatory, then it is mandatory for the equivalent short option also. Similarly for optional argu- ments. Listings: -l, --list[=FORMAT] list one or all known charsets and aliases -k, --known=PAIRS restrict charsets according to known PAIRS list -h, --header[=[LN/]NAME] write table NAME on stdout using LN, then exit -F, --freeze-tables write out a C module holding all tables -T, --find-subsets report all charsets being subset of others -C, --copyright display Copyright and copying conditions --help display this help and exit --version output version information and exit Operation modes: -v, --verbose explain sequence of steps and report progress -q, --quiet, --silent inhibit messages about irreversible recodings -f, --force force recodings even when not reversible -t, --touch touch the recoded files after replacement -i, --sequence=files use intermediate files for sequencing passes --sequence=memory use memory buffers for sequencing passes -p, --sequence=pipe use pipe machinery for sequencing passes Fine tuning: -s, --strict use strict mappings, even loose characters -d, --diacritics convert only diacritics or alike for HTML/LaTeX -S, --source[=LN] limit recoding to strings and comments as for LN -c, --colons use colons instead of double quotes for diaeresis -g, --graphics approximate IBMPC rulers by ASCII graphics -x, --ignore=CHARSET ignore CHARSET while choosing a recoding path Option -l with no FORMAT nor CHARSET list available charsets and surfaces. FORMAT is `decimal', `octal', `hexadecimal' or `full' (or one of `dohf'). Unless DEFAULT_CHARSET is set in environment, CHARSET defaults to the locale dependent encoding, determined by LC_ALL, LC_CTYPE, LANG. With -k, possible before charsets are listed for the given after CHARSET, both being tabular charsets, with PAIRS of the form `BEF1:AFT1,BEF2:AFT2,...' and BEFs and AFTs being codes are given as decimal numbers. LN is some language, it may be `c', `perl' or `po'; `c' is the default. REQUEST is SUBREQUEST[,SUBREQUEST]...; SUBREQUEST is ENCODING[..ENCODING]... ENCODING is [CHARSET][/[SURFACE]]...; REQUEST often looks like BEFORE..AFTER, with BEFORE and AFTER being charsets. An omitted CHARSET implies the usual charset; an omitted [/SURFACE]... means the implied surfaces for CHARSET; a / with an empty surface name means no surfaces at all. See the manual. If none of -i and -p are given, presume -p if no FILE, else -i. Each FILE is recoded over itself, destroying the original. If no FILE is specified, then act as a filter and recode stdin to stdout. AUTHOR
Written by Franc,ois Pinard <pinard@iro.umontreal.ca>. REPORTING BUGS
Report bugs to <recode-bugs@iro.umontreal.ca>. COPYRIGHT
Copyright (C) 1990, 92, 93, 94, 96, 97, 99 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICU- LAR PURPOSE. SEE ALSO
The full documentation for recode is maintained as a Texinfo manual. If the info and recode programs are properly installed at your site, the command info recode should give you access to the complete manual. Free recode 3.6 June 2012 RECODE(1)
All times are GMT -4. The time now is 09:04 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy