[SOLVED] Converting data from one format to the other
Hi All,
I need to convert an Excel spreadsheet into a SAS dataset,
and the following format change is needed. Please help, this is too complex
for a biologist.
Let me describe the input.
The 1st row is the generation. Its 1st column is the keyword 'generation'; the generations start in the 2nd column. There are 5 generations in my actual data;
I have shown 3 here, namely G1, G2 and PAR.
The 2nd row is the family name. Its 1st column is the keyword 'gene'; from the 2nd column on, the format is NameParent1-NameParent2-RandomNumber_Replicate.
If 'Parent' is present in the name, then the format is NameParent-Parent-RandomNumber_Replicate, and it should be included in the output.
The characters up to the second '-' are the family name; the replicate (can be 1, 2 or 3) is the number after '_'.
The random number (between the second '-' and '_') can be ignored.
Starting from the 3rd row are gene names (column 1) and values (column 2 through 9).
There are 30000 genes in my actual data set.
The gene values need to repeat according to the nested format shown below.
Only if the generation is PAR (stands for Parent), replace the generation 'PAR' with the name of the parent
specified by the characters up to the first '-'. For example, if P1-Parent-9_1 is the column name and the
corresponding generation is PAR, the output should have 'P1' instead of 'PAR' in the generations column.
Sample input
Code:
generation G1 G1 G1 G2 G2 G2 PAR PAR PAR PAR PAR PAR
gene P1-P2-24_3 P1-H-56_2 P1-P2-84_1 P1-P2-34_3 P1-P2-33_1 P1-P2-99_2 P1-Parent-9_1 P1-Parent-43_2 P1-Parent-45_3 P2-Parent-62_1 P2-Parent-43_2 P2-Parent-11_3
gene1 0 0 0 0 0 0 0 0 0 3 3 7
gene2 2 1 1 2 6 4 6 7 8 67 6 66
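The column-name rules described above can be sketched in Python as a starting point. This is only a sketch under assumptions: the target layout is not visible in this excerpt, so the one-record-per-gene-per-column shape, the function names `parse_column` and `to_long`, and the field order are all hypothetical, and the spreadsheet is assumed to have been exported as whitespace-separated text like the sample.

```python
def parse_column(generation, name):
    """Split one column label into (generation, family, replicate)."""
    label, replicate = name.rsplit("_", 1)   # replicate is the number after '_'
    parts = label.split("-")
    family = "-".join(parts[:2])             # characters up to the second '-'
    if generation == "PAR":
        generation = parts[0]                # parent name: characters up to the first '-'
    return generation, family, int(replicate)


def to_long(rows):
    """Turn the wide table (a list of split rows) into one record per gene and column.

    The record layout (gene, generation, family, replicate, value) is an assumption.
    """
    generations = rows[0][1:]   # row 1: keyword 'generation', then one label per column
    names = rows[1][1:]         # row 2: keyword 'gene', then the family names
    meta = [parse_column(g, n) for g, n in zip(generations, names)]
    records = []
    for row in rows[2:]:
        gene, values = row[0], row[1:]
        for (gen, family, rep), value in zip(meta, values):
            records.append((gene, gen, family, rep, value))
    return records


sample = """\
generation G1 G1 G1 G2 G2 G2 PAR PAR PAR PAR PAR PAR
gene P1-P2-24_3 P1-H-56_2 P1-P2-84_1 P1-P2-34_3 P1-P2-33_1 P1-P2-99_2 P1-Parent-9_1 P1-Parent-43_2 P1-Parent-45_3 P2-Parent-62_1 P2-Parent-43_2 P2-Parent-11_3
gene1 0 0 0 0 0 0 0 0 0 3 3 7
gene2 2 1 1 2 6 4 6 7 8 67 6 66
"""
records = to_long([line.split() for line in sample.splitlines()])
for record in records[:3]:
    print(*record)
```

The records could then be written out as a delimited file for import into SAS. Values are left as strings so nothing is lost before the import step.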
Hi,
Could anyone please help me change tabular output into a comma-separated file in ksh? It's very urgent.
E.g : username empid
------------------------
sri 123
to
username,empid
sri,123
Thanks,
Hema (2 Replies)
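The poster asked for ksh; purely as an illustration of the transformation, here is a minimal Python sketch. It assumes fields never contain spaces and that rows made only of dashes are separators to drop. The function name `table_to_csv` is hypothetical.

```python
import csv
import io


def table_to_csv(text):
    """Convert a whitespace-aligned table to CSV, dropping dashed separator rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and rules such as '--------'
        if not fields or set(line.strip()) == {"-"}:
            continue
        writer.writerow(fields)
    return out.getvalue()


print(table_to_csv("username empid\n------------------------\nsri 123\n"), end="")
```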
Hi,
I have a couple of files that I copy from Windows to Linux, and in the text files a carriage return (CTRL^M) appears at the end of each line. I know I can convert these Windows-format files to Unix format by running dos2unix.
My requirement here is that I want to do it automatically using a... (5 Replies)
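The rest of this question is cut off, so the automation the poster had in mind is unknown. As a rough sketch of the line-ending conversion itself (the part dos2unix handles), assuming the whole file fits in memory; `dos_to_unix` is a hypothetical name:

```python
def dos_to_unix(data: bytes) -> bytes:
    """Replace Windows CRLF line endings with Unix LF; lone CR bytes are left alone."""
    return data.replace(b"\r\n", b"\n")


print(dos_to_unix(b"first line\r\nsecond line\r\n"))
```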
Hi All,
I am new to this forum. Could anyone help me resolve the following issue?
The input flat file contains several lines of text, for example:
5022090,2,4,7154,88,,,,,4/1/2011 0:00,Z,L,2
5022090,3,1,6648,88,,,,,4/1/2011 0:00,Z,,1
5022090,4,1,6648,88,,,,,4/1/2011... (6 Replies)
Hi All,
I was wondering how I can convert each line in an input file, where fields are separated by variable-width spaces, into a CSV file. Below is the scenario I am looking for.
My Input data in inputfile.txt
19 15657 15685 Sr2dReader 107.88 105.51... (4 Replies)
I have a file that contains 2 columns tag,pos
cat input_file
tag pos
atg 10
ata 16
agt 15
agg 19
atg 17
agg 14
I have used the following command to sort the file based on the second column
sort -k 2 input_file
tag pos
atg 10
agg 14
agt 15
ata 16
agg 19
atg 17 (2 Replies)
Dear Friends,
I am in urgent need of an awk/sed/sh script for converting a specific data format (.txt) to .xls.
The input is as follows:
>gi|1234|ref|
Query = 1 - 65, Target = 1677 - 1733
Score = 8.38, E = 0.6529, P = 0.0001513, GC = 46
fd sdfsdfsdfsdf
fsdfdsfdfdfdfdfdf... (6 Replies)
Hi All,
I need help in converting the mentioned file format into desired output format using awk. Could anyone help me in this?
Below is the input..
Date Account Campaign AdGroup Keyword Conversion Revenue Var1 Var2 Var3 Var4 Var5 10 20 30 ... (8 Replies)
Here is the code that I am working with. I have tried several other things. Any suggestions?
Lbl_Cost_Output.Text = (dDistance * dCostPerMile).ToString("C")
The label is formatted correctly in terms of value 0.00 but no dollar sign appears. Please let me know if you have any questions. (1 Reply)
Hi there,
How can I shorten this:
grep -ri "Password must meet complexity requirements" "$line" | sed 's/\t/<\/td><td>/' | sed 's/^.*:/<tr><td>/'| sed 's/$/<\/td><\/tr>/'
I am looking for a shorter alternative of sed. What I was trying to do is to change the string output format from
... (3 Replies)
Hi,
Can someone help in converting the below unstructured data to a CSV format please.
{
"branchId" : "BNSFGDJNSJG-73264HB-132131BNHJFSDG",
"branchName" : "NEWYORK-SSDF",
"branchProductId" : "72Y5HFHSF7H3RUNAWEF",
"PreferenceId" : "BASDBVcbzcYHcb",
"emailId" :... (9 Replies)
Discussion started by: naveen.kuppili
LEARN ABOUT OSF1
telecode
telecode(5) File Formats Manual telecode(5)
NAME
telecode - A character encoding system (codeset) for Traditional Chinese
DESCRIPTION
The Telecode codeset (called Mitac Telex in early versions of the operating system) consists of 2 character planes. Each character plane
has 8836 character positions. In plane 1, standard characters occupy positions 0001 to 8045; the remaining 791 positions are for
user-defined characters. In plane 2, standard characters occupy positions 0001 to 8489; the remaining 346 positions are for user-defined
characters. Telecode uses 2-byte values to represent characters on both planes.
Plane 1 Character Encoding
To differentiate plane 1 code from plane 2 code, the most significant bit (MSB) is set on in both bytes of a plane 1 character code. The
following formula calculates the value of a plane 1 character from its position on the plane:
1st byte = M + 161
2nd byte = N + 161 - M x 94
In this formula, N is the position of the character and M = N / 94 (integer division; the remainder is discarded).
For example, if a character is at position 2502 on plane 1, its encoding value is BBDB, which is calculated as follows:
N = 2502, M = 2502/94 = 26
1st byte = 26 + 161 = 187 (hex BB)
2nd byte = 2502 + 161 - 26 x 94 = 219 (hex DB)
Plane 2 Character Encoding
To differentiate plane 2 code from plane 1 code, the MSB of the first byte is set on and that of the second byte is set off for each plane
2 character code. The following formula calculates the value of a plane 2 character from its position:
1st byte = M + 161
2nd byte = N + 33 - M x 94
In this formula, N is the position of the character on the plane and M = N / 94 (integer division; the remainder is discarded).
For example, if a character is at position 2502 on plane 2, its encoding value is BB5B, which is calculated as follows:
N = 2502, M = 2502/94 = 26
1st byte = 26 + 161 = 187 (hex BB)
2nd byte = 2502 + 33 - 26 x 94 = 91 (hex 5B)
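The two formulas differ only in the offset added to the second byte (161 for plane 1, 33 for plane 2), so both can be checked with one small Python sketch. The function name `telecode_bytes` is hypothetical and not part of any system library. The sketch does not check the position range beyond requiring each result to fit in one byte.

```python
def telecode_bytes(position: int, plane: int) -> bytes:
    """Compute the 2-byte Telecode value for a character position on plane 1 or 2."""
    if plane not in (1, 2):
        raise ValueError("plane must be 1 or 2")
    m = position // 94                     # M = N / 94 with the remainder discarded
    first = m + 161                        # MSB set on both planes
    offset = 161 if plane == 1 else 33     # plane 2 keeps the MSB of the 2nd byte off
    second = position + offset - m * 94
    return bytes([first, second])          # raises ValueError if a value exceeds one byte


print(telecode_bytes(2502, 1).hex().upper())  # BBDB
print(telecode_bytes(2502, 2).hex().upper())  # BB5B
```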
Codeset Conversion
The following codeset converter pairs are available for converting Traditional Chinese characters between telecode and other encoding
formats. Refer to iconv_intro(5) for an introduction to codeset conversion. For more information about the other codeset for which telecode
is the input or output, see the reference page specified in the list item.
big5_telecode, telecode_big5
    Converting from and to the Big-5 codeset: big5(5).
    Note that Big-5 encoding is equivalent to the Microsoft code-page format used on PCs for Traditional Chinese. You can therefore use
    these converters to convert Traditional Chinese characters between PC code page format and Telecode encoding format. For more
    information on how the operating system supports PC code pages, see code_page(5).
dechanyu_telecode, telecode_dechanyu
    Converting from and to the DEC Hanyu codeset: dechanyu(5).
eucTW_telecode, telecode_eucTW
    Converting from and to Taiwanese Extended UNIX Code: eucTW(5).
Font Support for Telecode
The operating system supports Telecode only through conversion to another codeset.
SEE ALSO
Commands: locale(1)
Others: ascii(5), big5(5), Chinese(5), code_page(5), dechanyu(5), dechanzi(5), eucTW(5), GBK(5), i18n_intro(5), i18n_printing(5),
iconv_intro(5), l10n_intro(5), sbig5(5)
telecode(5)