10-09-2009
I apologize for the confusion! I understand where I need to go with this, but I have no clue how to tell the computer to do it, so it is hard for me to explain it to others as well. Let me try again.
My input file currently looks like this:
ID 1 2 3 4 5 6 7 8
83845676 AG AC AT GT CC AA CC CC
83846900 AA AA TT GG CC AG CC TT
83847041 AA 00 AT GT 00 AG CG CT
83847004 AG AA TT TT CC AG CG CT
83847085 AG CC AT GT CG AG CG CT
83847118 00 AA TT GG 00 GG CC CT
83847162 GG AA TT GT CG AG CG CT
83847165 AA AA 00 GG CC AG GG CT
I want to recode the missing values (00) so that each is just a period, and save an output file like this:
ID 1 2 3 4 5 6 7 8
83845676 AG AC AT GT CC AA CC CC
83846900 AA AA TT GG CC AG CC TT
83847041 AA . AT GT . AG CG CT
83847004 AG AA TT TT CC AG CG CT
83847085 AG CC AT GT CG AG CG CT
83847118 . AA TT GG . GG CC CT
83847162 GG AA TT GT CG AG CG CT
83847165 AA AA . GG CC AG GG CT
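For illustration only, this first step could be sketched in Python as below. The thread itself gives no code, so the function name `mark_missing` and the constants are hypothetical, and the sketch assumes whitespace-separated fields with the ID in the first column and `00` as the only missing-value code.

```python
MISSING_IN = "00"   # assumed missing-genotype code, as in the sample input
MISSING_OUT = "."   # replacement requested in the post


def mark_missing(lines):
    """Return the lines with every genotype field equal to "00" replaced by "."."""
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        first, rest = fields[0], fields[1:]
        # Only whole fields are compared, so an ID that merely contains
        # the digits "00" (for example 83846900) is left untouched.
        rest = [MISSING_OUT if f == MISSING_IN else f for f in rest]
        out.append(" ".join([first] + rest))
    return out


sample = [
    "ID 1 2 3 4 5 6 7 8",
    "83846900 AA AA TT GG CC AG CC TT",
    "83847041 AA 00 AT GT 00 AG CG CT",
    "83847118 00 AA TT GG 00 GG CC CT",
]
result = mark_missing(sample)
print("\n".join(result))
```

Comparing whole fields rather than doing a plain text substitution is the point of the sketch: a global replace of `00` would also damage IDs such as 83846900.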
Then I need to create an output file in which all of the letter pairs are recoded as -1, 0, or 1. This should be done in alphabetical order and on a per-column basis (for example, in column 1, AA becomes -1, AG becomes 0, and GG becomes 1), so that:
ID 1 2 3 4 5 6 7 8
83845676 0 0 0 0 -1 -1 -1 -1
83846900 -1 -1 1 -1 -1 0 -1 1
83847041 -1 . 0 0 . 0 0 0
83847004 0 -1 1 1 -1 0 0 0
83847085 0 1 0 0 0 0 0 0
83847118 . -1 1 -1 . 1 -1 0
83847162 1 -1 1 0 0 0 0 0
83847165 -1 -1 . -1 -1 0 1 0
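One possible reading of the recoding rule, inferred from the sample tables rather than stated in the post, is: within a column, a pair made of the alphabetically first letter twice becomes -1, a mixed pair becomes 0, and a pair made of the later letter twice becomes 1. The Python sketch below follows that reading. The name `recode_column` is hypothetical, and the code assumes two-letter values, at most two distinct letters per column, and `.` for missing values.

```python
def recode_column(values, missing="."):
    """Recode one column of two-letter genotypes as -1, 0, or 1."""
    # Distinct letters seen in this column, in alphabetical order.
    alleles = sorted({ch for v in values if v != missing for ch in v})
    if len(alleles) > 2:
        raise ValueError("more than two distinct letters in column: %r" % alleles)
    coded = []
    for v in values:
        if v == missing:
            coded.append(missing)   # leave missing values for the next step
        elif v[0] != v[1]:
            coded.append(0)         # mixed pair, e.g. AG
        elif v[0] == alleles[0]:
            coded.append(-1)        # alphabetically first letter twice, e.g. AA
        else:
            coded.append(1)         # later letter twice, e.g. GG
    return coded


# Columns 1, 3, and 5 of the sample, after 00 has been replaced by "."
col1 = ["AG", "AA", "AA", "AG", "AG", ".", "GG", "AA"]
col3 = ["AT", "TT", "AT", "TT", "AT", "TT", "TT", "."]
col5 = ["CC", "CC", ".", "CC", "CG", ".", "CG", "CC"]
for col in (col1, col3, col5):
    print(recode_column(col))
```

Deriving the letters from the whole column, instead of ranking the pairs that happen to occur, is what makes column 3 come out as 0 and 1 (AT and TT) even though AA never appears there.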
Finally, I need to calculate the average of each column and replace the missing values in that column with that average:
ID 1 2 3 4 5 6 7 8
83845676 0 0 0 0 -1 -1 -1 -1
83846900 -1 -1 1 -1 -1 0 -1 1
83847041 -1 -0.5 0 0 -0.5 0 0 0
83847004 0 -1 1 1 -1 0 0 0
83847085 0 1 0 0 0 0 0 0
83847118 -0.25 -1 1 -1 -0.5 1 -1 0
83847162 1 -1 1 0 0 0 0 0
83847165 -1 -1 0.5 -1 -1 0 1 0
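A short Python sketch of this last step follows. One detail is worth flagging: the fill values in the table above (-0.25, -0.5, 0.5) match the column sum divided by all 8 rows, whereas the mean of only the non-missing values would be different (for column 1, -2/7, about -0.286). The sketch therefore makes the denominator an explicit option. The name `impute_column` and the flag `divide_by_all_rows` are hypothetical, and which behaviour is actually wanted is an assumption to confirm.

```python
def impute_column(values, missing=".", divide_by_all_rows=True):
    """Replace missing entries in one numeric column with the column average."""
    observed = [v for v in values if v != missing]
    # divide_by_all_rows=True reproduces the sample table (sum / 8);
    # False gives the usual mean over the non-missing values only.
    denom = len(values) if divide_by_all_rows else len(observed)
    if denom == 0:
        return list(values)  # nothing to average; leave the column unchanged
    fill = sum(observed) / denom
    return [fill if v == missing else v for v in values]


# Columns 1 and 5 of the recoded sample
col1 = [0, -1, -1, 0, 0, ".", 1, -1]
col5 = [-1, -1, ".", -1, 0, ".", 0, -1]

filled1 = impute_column(col1)
filled5 = impute_column(col5)
mean_of_observed = impute_column(col1, divide_by_all_rows=False)
print(filled1)
print(filled5)
print(mean_of_observed)
```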
This will be the final file. Does this make more sense, or have I confused you more?
Thanks