10-23-2009
Writing an algorithm to recode data points
I have a file that has been partially recoded so that data points that were formerly letter combinations are now -1, 0, or 1. I need to finish recoding the GG and CC data points. The file looks like this:
ID 1 2 3 4 5 6 7 8
83845676 0 0 0 0 CC -1 CC CC
838469. -1 -1 1 GG CC 0 CC 1
83847041 -1 . 0 0 . 0 0 0
83847.4 0 -1 1 1 CC 0 0 0
83847085 0 CC 0 0 0 0 0 0
83847118 . -1 1 GG . GG CC 0
83847162 GG -1 1 0 0 0 0 0
83847165 -1 -1 . GG CC 0 GG 0
The problem with the GG and CC is that in either case they can be a -1 or a 1, depending on what has already been recoded. If a GG is in a column that already contains 1's then GG must = -1. If the GG is in a column that already contains -1's, then the GG must be a 1. This is also true for the CC columns. I have a total of >64,000 columns so I can not go through and list which column is which. It has been suggested that I need to write an algorithm to do this but I am not very familiar with programming. Can anyone help me?
Thanks!
Last edited by Franklin52; 10-27-2009 at 04:42 AM..
Reason: IMG tag process.gif removed
10 More Discussions You Might Find Interesting
1. UNIX and Linux Applications
I have a simple gnuplot question. I have a set of points (list of x,y,z values; irregularly spaced, i.e. no grid) that I want to plot. I want the plot to look like this:
- points in map view (no 3D view)
- color of each point should depend on its z-value.
- I want to define my own color scale
-... (0 Replies)
Discussion started by: karman
0 Replies
2. Shell Programming and Scripting
suppose u have a file which consist of many data points separated by asterisk
Question is to extract third part in each line .
0.0002*0.003*-0.93939*0.0202*0.322*0.3332*0.2222*0.22020
0.003*0.3333*0.33322*-0.2220*0.3030*0.2222*0.3331*-0.3030
0.0393*0.3039*-0.03038*0.033*0.4033*0.30384*0.4048... (5 Replies)
Discussion started by: cdfd123
5 Replies
3. Shell Programming and Scripting
Hello all,
I have a data file that needs some serious work...I have no idea how to implement the changes that are needed!
The file is a genotypic file with >64,000 columns representing genetic markers, a header line, and >1100 rows that looks like this:
ID 1 2 3 4 ... (7 Replies)
Discussion started by: doobedoo
7 Replies
4. Shell Programming and Scripting
Hi All I have a data set like this tab delimited:
weft fgr-1 345 -1 fgrythdgd
weft fgr-3 456 -2 ghjdklflllff
weft fgr-11 456 -3 ghtjuffl
weft fgr-1 213 -2 ghtyjdkl
weft fgr-34 567 -5 fghytkflf
frgt fgr-36 567 -1 ghrjufjf
frgt fgr-45 678 -2 ghjruir
frgt fgr-34 546 -5 gjjjgkldlld
frgt... (4 Replies)
Discussion started by: Lucky Ali
4 Replies
5. UNIX for Dummies Questions & Answers
hiii, Help me out..i have a huge set of data stored in a file.This file has has 2 columns which is latitude & longitude of a region. Now i have a program which asks for the number of points & based on this number it asks the user to enter that latitude & longitude values which are in the same... (7 Replies)
Discussion started by: reva
7 Replies
6. Programming
Hi,
I am trying to arrange my graphs with GNUPLOT. Although it looked like simple at the beginning, I could not figure out an answer for the following: I want to change the style of my data points (not the line, just exact data points) The terminal assigns first + and then x to them but what I... (0 Replies)
Discussion started by: natasha
0 Replies
7. Shell Programming and Scripting
Hi,
I have a file with one column data (sample below) and I am trying to write a shell script to calculate the difference between consecutive data valuse i.e
Var = Ni -N(i-1)
0.3141
-3.6595
0.9171
5.2001
3.5331
3.7022
-6.1087
-5.1039
-9.8144
1.6516
-2.725
3.982
7.769
8.88 (5 Replies)
Discussion started by: malandisa
5 Replies
8. UNIX for Dummies Questions & Answers
Hi, I need help on finding the value of my data that encompasses certain percentage of my total data points (n). Attached is an example of my data, n=30. What I want to do is for instance is find the minimum threshold that still encompasses 60% (n=18), 70% (n=21) and 80% (n=24).
manually to... (4 Replies)
Discussion started by: ida1215
4 Replies
9. Shell Programming and Scripting
I have a text file that shows the output of my solar inverters. I want to separate this into sections. overview , device 1 , device 2 , device 3. Each device has different number of lines. but they all have unique starting points. Overview starts with 6 #'s, Devices have 4#'s and their data starts... (6 Replies)
Discussion started by: Mikey
6 Replies
10. Shell Programming and Scripting
I need to rank a large number of data points that exist in multiple files. My data points (Column 3) are based on unique values in columns 1 and 2. I need to rank the values that are in File 1, Column 3.
For instance:
Input File 1
AAA BBB 10
CCC DDD 16
EEE FFF 20
Input File 2
... (47 Replies)
Discussion started by: ncwxpanther
47 Replies
LEARN ABOUT FREEBSD
column
COLUMN(1) BSD General Commands Manual COLUMN(1)
NAME
column -- columnate lists
SYNOPSIS
column [-tx] [-c columns] [-s sep] [file ...]
DESCRIPTION
The column utility formats its input into multiple columns. Rows are filled before columns. Input is taken from file operands, or, by
default, from the standard input. Empty lines are ignored.
The options are as follows:
-c Output is formatted for a display columns wide.
-s Specify a set of characters to be used to delimit columns for the -t option.
-t Determine the number of columns the input contains and create a table. Columns are delimited with whitespace, by default, or with
the characters supplied using the -s option. Useful for pretty-printing displays.
-x Fill columns before filling rows.
ENVIRONMENT
The COLUMNS, LANG, LC_ALL and LC_CTYPE environment variables affect the execution of column as described in environ(7).
EXIT STATUS
The column utility exits 0 on success, and >0 if an error occurs.
EXAMPLES
(printf "PERM LINKS OWNER GROUP SIZE MONTH DAY " ;
printf "HH:MM/YEAR NAME
" ;
ls -l | sed 1d) | column -t
SEE ALSO
colrm(1), ls(1), paste(1), sort(1)
HISTORY
The column command appeared in 4.3BSD-Reno.
BUGS
Input lines are limited to LINE_MAX (2048) bytes in length.
BSD
July 29, 2004 BSD