Sponsored Content
Top Forums Shell Programming and Scripting recoding data points using SED?? Post 302360680 by doobedoo on Friday 9th of October 2009 03:56:41 PM
Old 10-09-2009
I apologize for the confusion! I understand where I need to go with this but I have no clue how to tell the computer to do it so it is hard for me to explain it to others as well...let me try again...

My input file currently looks like this:

ID 1 2 3 4 5 6 7 8
83845676 AG AC AT GT CC AA CC CC
83846900 AA AA TT GG CC AG CC TT
83847041 AA 00 AT GT 00 AG CG CT
83847004 AG AA TT TT CC AG CG CT
83847085 AG CC AT GT CG AG CG CT
83847118 00 AA TT GG 00 GG CC CT
83847162 GG AA TT GT CG AG CG CT
83847165 AA AA 00 GG CC AG GG CT

I want to rename the missing values so they are just a period and save an output file like this:

ID 1 2 3 4 5 6 7 8
83845676 AG AC AT GT CC AA CC CC
83846900 AA AA TT GG CC AG CC TT
83847041 AA . AT GT . AG CG CT
83847004 AG AA TT TT CC AG CG CT
83847085 AG CC AT GT CG AG CG CT
83847118 . AA TT GG . GG CC CT
83847162 GG AA TT GT CG AG CG CT
83847165 AA AA . GG CC AG GG CT

Then I need to create an output file that has all of the letters recoded as -1, 0, or 1. This should be done in alphabetical order and on a per column basis so that:

ID 1 2 3 4 5 6 7 8
83845676 0 0 0 0 -1 -1 -1 -1
83846900 -1 -1 1 -1 -1 0 -1 1
83847041 -1 . 0 0 . 0 0 0
83847004 0 -1 1 1 -1 0 0 0
83847085 0 1 0 0 0 0 0 0
83847118 . -1 1 -1 . 1 -1 0
83847162 1 -1 1 0 0 0 0 0
83847165 -1 -1 . -1 -1 0 1 0

Finally I need to calculate the average of each column and replace the missing values from that column with the average:

ID 1 2 3 4 5 6 7 8
83845676 0 0 0 0 -1 -1 -1 -1
83846900 -1 -1 1 -1 -1 0 -1 1
83847041 -1 -0.5 0 0 -0.5 0 0 0
83847004 0 -1 1 1 -1 0 0 0
83847085 0 1 0 0 0 0 0 0
83847118 -0.25 -1 1 -1 -0.5 1 -1 0
83847162 1 -1 1 0 0 0 0 0
83847165 -1 -1 0.5 -1 -1 0 1 0

This will be the final file. Does this make more since or have I confused you more??

Thanks
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

to extarct data points

suppose u have a file which consist of many data points separated by asterisk Question is to extract third part in each line . 0.0002*0.003*-0.93939*0.0202*0.322*0.3332*0.2222*0.22020 0.003*0.3333*0.33322*-0.2220*0.3030*0.2222*0.3331*-0.3030 0.0393*0.3039*-0.03038*0.033*0.4033*0.30384*0.4048... (5 Replies)
Discussion started by: cdfd123
5 Replies

2. Shell Programming and Scripting

Writing an algorithm to recode data points

I have a file that has been partially recoded so that data points that were formerly letter combinations are now -1, 0, or 1. I need to finish recoding the GG and CC data points. The file looks like this: ID 1 2 3 4 5 6 7 8 83845676 0 0 0 0 CC -1 CC CC 838469. -1 -1 1 GG CC 0 CC 1 83847041... (10 Replies)
Discussion started by: doobedoo
10 Replies

3. Shell Programming and Scripting

Group search (multiple data points) in Linux

Hi All I have a data set like this tab delimited: weft fgr-1 345 -1 fgrythdgd weft fgr-3 456 -2 ghjdklflllff weft fgr-11 456 -3 ghtjuffl weft fgr-1 213 -2 ghtyjdkl weft fgr-34 567 -5 fghytkflf frgt fgr-36 567 -1 ghrjufjf frgt fgr-45 678 -2 ghjruir frgt fgr-34 546 -5 gjjjgkldlld frgt... (4 Replies)
Discussion started by: Lucky Ali
4 Replies

4. UNIX for Dummies Questions & Answers

How to get data only inside polygon created by points which is part of whole data from file?

hiii, Help me out..i have a huge set of data stored in a file.This file has has 2 columns which is latitude & longitude of a region. Now i have a program which asks for the number of points & based on this number it asks the user to enter that latitude & longitude values which are in the same... (7 Replies)
Discussion started by: reva
7 Replies

5. Programming

GNUPLOT- how to change the style of data points

Hi, I am trying to arrange my graphs with GNUPLOT. Although it looked like simple at the beginning, I could not figure out an answer for the following: I want to change the style of my data points (not the line, just exact data points) The terminal assigns first + and then x to them but what I... (0 Replies)
Discussion started by: natasha
0 Replies

6. Shell Programming and Scripting

Calculate difference between consecutive data points in a column from a file

Hi, I have a file with one column data (sample below) and I am trying to write a shell script to calculate the difference between consecutive data valuse i.e Var = Ni -N(i-1) 0.3141 -3.6595 0.9171 5.2001 3.5331 3.7022 -6.1087 -5.1039 -9.8144 1.6516 -2.725 3.982 7.769 8.88 (5 Replies)
Discussion started by: malandisa
5 Replies

7. UNIX for Dummies Questions & Answers

Finding data value that contains x% of points

Hi, I need help on finding the value of my data that encompasses certain percentage of my total data points (n). Attached is an example of my data, n=30. What I want to do is for instance is find the minimum threshold that still encompasses 60% (n=18), 70% (n=21) and 80% (n=24). manually to... (4 Replies)
Discussion started by: ida1215
4 Replies

8. Shell Programming and Scripting

Grabbing data between 2 points in text file

I have a text file that shows the output of my solar inverters. I want to separate this into sections. overview , device 1 , device 2 , device 3. Each device has different number of lines. but they all have unique starting points. Overview starts with 6 #'s, Devices have 4#'s and their data starts... (6 Replies)
Discussion started by: Mikey
6 Replies

9. Shell Programming and Scripting

Recoding data in a matrix from an existing file

Hi, I was wondering if someone would be able to help with extrapolating information from a file and filling an existing matrix with that information. I have made a matrix like this (file 1): A B C D 1 2 3 4 I have another file with data like this (file 2): 1 A 1 C 3 C 4 B... (1 Reply)
Discussion started by: hubleo
1 Replies

10. Shell Programming and Scripting

Ranking data points from multiple files

I need to rank a large number of data points that exist in multiple files. My data points (Column 3) are based on unique values in columns 1 and 2. I need to rank the values that are in File 1, Column 3. For instance: Input File 1 AAA BBB 10 CCC DDD 16 EEE FFF 20 Input File 2 ... (47 Replies)
Discussion started by: ncwxpanther
47 Replies
Temporary(4)						     Kernel Interfaces Manual						      Temporary(4)

NAME
Temporary - Stores data files during transfers to remote systems SYNOPSIS
/usr/spool/uucp/SystemName/TM.xxPID.000 DESCRIPTION
The uucp Temporary (TM.*) files store data files during transfers to remote systems. After a Data (D.*) file is transferred to a remote system by the uucico daemon, the uucp program places it in a subdirectory of the uucp spooling directory named /usr/spool/uucp/SystemName, where the SystemName directory is named for the computer that is transmitting the file. The uucp program creates a temporary data file to hold the original data file. The full pathname of the temporary data file is in the following format: /usr/spool/uucp/SystemName/TM.xxPID.000 where the SystemName directory is named for the computer that is sending the file, and TM.xxPID.000 is the name of the file; for example, TM.00451.000. The PID variable is the process ID of the job. FILES
Describes accessible remote systems Contains uucp command, data, and execute files Contain data to be transferred. Contain files that uucp has transferred RELATED INFORMATION
Daemons: uucico(8) Commands: uucp(1), uudemon.cleanu(4), uupick(1), uuto(1), uux(1) delim off Temporary(4)
All times are GMT -4. The time now is 10:49 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy