recoding data points using SED?? Post 302360817 by danmero on Saturday 10th of October 2009 01:42:24 PM
Quote:
Originally Posted by doobedoo

ID 1 2 3 4 5 6 7 8
83845676 AG AC AT GT CC AA CC CC
83846900 AA AA TT GG CC AG CC TT
83847041 AA . AT GT . AG CG CT
83847004 AG AA TT TT CC AG CG CT
83847085 AG CC AT GT CG AG CG CT
83847118 . AA TT GG . GG CC CT
83847162 GG AA TT GT CG AG CG CT
83847165 AA AA . GG CC AG GG CT

Then I need to create an output file that has all of the letters recoded as -1, 0, or 1. This should be done in alphabetical order and on a per column basis so that:

ID 1 2 3 4 5 6 7 8
83845676 0 0 0 0 -1 -1 -1 -1
83846900 -1 -1 1 -1 -1 0 -1 1
83847041 -1 . 0 0 . 0 0 0
83847004 0 -1 1 1 -1 0 0 0
83847085 0 1 0 0 0 0 0 0
83847118 . -1 1 -1 . 1 -1 0
83847162 1 -1 1 0 0 0 0 0
83847165 -1 -1 . -1 -1 0 1 0
Your example is not consistent.


Code:
# Recode only the data fields (2..NF), matching whole fields, so the ID column is never altered
awk '{for(i=2;i<=NF;i++){sub(/^00$/,".",$i);sub(/^(A[CGT]|C[GT]|GT)$/,"0",$i);sub(/^AA$/,"-1",$i);sub(/^TT$/,"1",$i)}}1' file
ID 1 2 3 4 5 6 7 8
83845676 0 0 0 0 CC -1 CC CC
83846900 -1 -1 1 GG CC 0 CC 1
83847041 -1 . 0 0 . 0 0 0
83847004 0 -1 1 1 CC 0 0 0
83847085 0 CC 0 0 0 0 0 0
83847118 . -1 1 GG . GG CC 0
83847162 GG -1 1 0 0 0 0 0
83847165 -1 -1 . GG CC 0 GG 0

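Now try to solve/elaborate on the CC & GG problem: these pairs cannot be recoded with one fixed substitution, because in your expected output CC is 1 in column 2 but -1 in column 5, and GG is 1 in column 1 but -1 in column 4.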
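
One possible way to handle CC and GG is to decide per column which letter comes first alphabetically. This is only a sketch under assumptions the thread does not confirm: every data column holds at most two distinct letters, each data field is either a two-letter pair or a missing marker, and the file is small enough to read twice. The function name recode is made up for illustration.

```shell
# recode FILE: hypothetical helper, not from the original thread.
# Reads FILE twice: pass 1 finds the alphabetically first letter per column,
# pass 2 recodes each pair relative to that letter.
recode() {
    awk '
    # Pass 1 (first read of the file): track the smallest letter seen in each column.
    NR == FNR {
        if (FNR > 1)
            for (i = 2; i <= NF; i++) {
                if ($i !~ /^[ACGT][ACGT]$/) continue   # skip "." and anything unexpected
                for (k = 1; k <= 2; k++) {
                    c = substr($i, k, 1)
                    if (!(i in lo) || c < lo[i]) lo[i] = c
                }
            }
        next
    }
    # Pass 2 (second read): the header line is printed unchanged.
    FNR > 1 {
        for (i = 2; i <= NF; i++) {
            if ($i !~ /^[ACGT][ACGT]$/) continue
            a = substr($i, 1, 1); b = substr($i, 2, 1)
            if (a != b)          $i = 0    # two different letters
            else if (a == lo[i]) $i = -1   # pair of the first letter (assumes <= 2 letters per column)
            else                 $i = 1    # pair of the other letter
        }
    }
    { print }
    ' "$1" "$1"
}
```

Called as recode file, this should give -1 for CC in column 5 and 1 for CC in column 2 of the sample data, since the first letter is C in column 5 and A in column 2. A column with three or more distinct letters would need a different rule.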