10-10-2009
Hello again,
Again, I apologize for the confusion. I made a mistake in the first post: the letters should be recoded to -1, 0, 1. This is the tricky part. I need to recode the letters on a per-column, alphabetical-order basis. There are several different combinations that can occur within a column:
AA, AC, CC = -1, 0, 1
AA, AG, GG = -1, 0, 1
AA, AT, TT = -1, 0, 1
CC, CG, GG = -1, 0, 1
CC, CT, TT = -1, 0, 1
GG, GT, TT = -1, 0, 1
Therefore anything with a mixed data point (AC, AG, AT, CG, CT, GT) will ALWAYS = 0, AA will ALWAYS = -1, and TT will ALWAYS = 1. The problem comes when recoding CC and GG. As you can see, in some columns CC comes first in the alphabet and will be recoded as -1 (when the combo is CC, CG, GG). However, in other columns CC does not come first in the alphabet and will be coded as 1 (when the combo is AA, AC, CC). The same problem occurs with GG. Is there any solution to this issue? I hope I explained it better this time!!
Thank you so much for your patience!!
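One possible way to handle the per-column rule described above is sketched here in Python. It is only an illustration, not a tested solution for the real file: the function names `recode_column` and `recode_table` are made up for this example, and it assumes every cell is exactly two letters from A, C, G, T, and that the ID column has already been set aside. The idea is to look at which two letters actually occur in a column, sort them alphabetically, and code the first homozygote as -1, the mixed pair as 0, and the second homozygote as 1.

```python
def recode_column(genotypes):
    """Recode one column of two-letter genotypes to -1, 0, 1."""
    # Collect every allele letter seen in this column, in alphabetical order.
    alleles = sorted({letter for g in genotypes for letter in g})
    if len(alleles) != 2:
        # Assumption: if the column does not show exactly two alleles,
        # the ordering cannot be decided here, so return it unchanged.
        return list(genotypes)
    first, second = alleles
    codes = {
        first * 2: -1,       # e.g. "CC" in a CC/CG/GG column
        first + second: 0,   # mixed pair, e.g. "CG"
        second + first: 0,   # same pair written the other way round
        second * 2: 1,       # e.g. "GG" in a CC/CG/GG column
    }
    return [codes[g] for g in genotypes]


def recode_table(rows):
    """Apply recode_column to every column of a list of rows."""
    columns = [recode_column(col) for col in zip(*rows)]
    # Transpose back so the result is row-oriented again.
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    rows = [
        ["AA", "CC", "GT"],
        ["AC", "CG", "TT"],
        ["CC", "GG", "GG"],
    ]
    for row in recode_table(rows):
        print(row)
```

With this approach CC becomes 1 in the first column (AA, AC, CC) and -1 in the second column (CC, CG, GG), because the decision is made separately for each column. A column that contains only CC or only GG is left as-is, since the data alone does not say which end of the scale it belongs to.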