[SOLVED] Converting data from one format to the other Post: 302742723

Shell Programming and Scripting: post 302742723 by newbie83 on Tuesday 11th of December 2012 01:21:38 PM

Hi All,

I need to convert an Excel spreadsheet into a SAS dataset,
and the following format change is needed. Please help, this is too complex
for a biologist.

Let me describe the input.

The 1st row is the generation. The 1st column holds the keyword 'generation', and the generation labels start in the 2nd column. My actual data has 5 generations;
I have shown 3 here, namely G1, G2 and PAR.

The 2nd row is the family name. The 1st column holds the keyword 'gene'; from the 2nd column on, the format is NameParent1-NameParent2-RandomNumber_Replicate.
If 'Parent' is present in the name, the format is NameParent-Parent-RandomNumber_Replicate, and that column should be included in the output.
The characters up to the second '-' are the family name; the replicate (1, 2 or 3) is the number after the '_'.
The random number (between the second '-' and the '_') can be ignored.

From the 3rd row on come the gene names (column 1) and the values (column 2 onward).
There are 30000 genes in my actual data set.

The gene values need to repeat according to the nested format shown below.
Only if the generation is PAR (which stands for Parent), replace the generation 'PAR' with the name of the parent,
given by the characters up to the first '-'. For example, if P1-Parent-9_1 is the column name and the
corresponding generation is PAR, the output should have 'P1' instead of 'PAR' in the generation column.

Sample input

Code:

generation G1    G1    G1    G2    G2    G2    PAR    PAR    PAR    PAR    PAR    PAR
gene    P1-P2-24_3    P1-H-56_2    P1-P2-84_1    P1-P2-34_3    P1-P2-33_1    P1-P2-99_2 P1-Parent-9_1 P1-Parent-43_2 P1-Parent-45_3 P2-Parent-62_1 P2-Parent-43_2 P2-Parent-11_3
gene1    0    0    0    0    0    0    0    0    0    3    3    7
gene2    2    1    1    2    6    4    6    7    8    67    6    66

Expected Output

Code:

family   rep  generation  gene   value
P1-P2    3    G1          gene1  0
P1-P2    2    G1          gene1  0
P1-P2    1    G1          gene1  0
P1-P2    3    G2          gene1  0
P1-P2    1    G2          gene1  0
P1-P2    2    G2          gene1  0
P1-P2    1    P1          gene1  0
P1-P2    2    P1          gene1  0
P1-P2    3    P1          gene1  0
P1-P2    1    P2          gene1  3
P1-P2    2    P2          gene1  3
P1-P2    3    P2          gene1  7
P1-P2    3    G1          gene2  2
P1-P2    2    G1          gene2  1
P1-P2    1    G1          gene2  1
P1-P2    3    G2          gene2  2
P1-P2    1    G2          gene2  6
P1-P2    2    G2          gene2  4
P1-P2    1    P1          gene2  6
P1-P2    2    P1          gene2  7
P1-P2    3    P1          gene2  8
P1-P2    1    P2          gene2  67
P1-P2    2    P2          gene2  6
P1-P2    3    P2          gene2  66

Thanks

newbie83
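
For readers following along, here is a minimal Python sketch of one way the reshaping described in the post could be done. It is not taken from the thread, and it rests on several assumptions: the spreadsheet has been exported as whitespace-delimited text; a PAR column is repeated under every family whose name contains that parent (which is how the expected output appears to be built); and, for the demo only, the sample column `P1-H-56_2` is written as `P1-P2-56_2`, since the expected output lists that replicate under family `P1-P2`. The function name `reshape` is hypothetical.

```python
def reshape(lines):
    """Yield (family, rep, generation, gene, value) tuples from the wide table."""
    rows = [line.split() for line in lines if line.strip()]
    generations = rows[0][1:]  # row 1: keyword 'generation', then the labels
    names = rows[1][1:]        # row 2: keyword 'gene', then the column names

    crosses = []   # (column index, family, rep, generation) for non-PAR columns
    parents = []   # (column index, parent name, rep) for PAR columns
    families = []  # family names in order of first appearance
    for i, (gen, name) in enumerate(zip(generations, names)):
        stem, rep = name.rsplit("_", 1)              # replicate follows '_'
        first, second, _random = stem.split("-", 2)  # random number is ignored
        if gen == "PAR":
            parents.append((i, first, rep))          # parent name precedes first '-'
        else:
            family = f"{first}-{second}"             # text up to the second '-'
            if family not in families:
                families.append(family)
            crosses.append((i, family, rep, gen))

    for gene, *values in rows[2:]:
        for family in families:
            for i, fam, rep, gen in crosses:
                if fam == family:
                    yield family, rep, gen, gene, values[i]
            # Assumption: each parent's columns repeat under every family naming it.
            members = family.split("-")
            for i, parent, rep in parents:
                if parent in members:
                    yield family, rep, parent, gene, values[i]


# Demo data: the sample from the post, with P1-H-56_2 written as P1-P2-56_2.
sample = """\
generation G1 G1 G1 G2 G2 G2 PAR PAR PAR PAR PAR PAR
gene P1-P2-24_3 P1-P2-56_2 P1-P2-84_1 P1-P2-34_3 P1-P2-33_1 P1-P2-99_2 P1-Parent-9_1 P1-Parent-43_2 P1-Parent-45_3 P2-Parent-62_1 P2-Parent-43_2 P2-Parent-11_3
gene1 0 0 0 0 0 0 0 0 0 3 3 7
gene2 2 1 1 2 6 4 6 7 8 67 6 66
"""

print("\t".join(["family", "rep", "generation", "gene", "value"]))
for row in reshape(sample.splitlines()):
    print("\t".join(row))
```

With the demo data this prints the 24 rows of the expected output in the same order. If `P1-H-56_2` is left unchanged, the sketch treats `P1-H` as a second family and repeats the P1 parent columns under it, so the rule for such columns would need to be confirmed before using this on the real data.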

LEARN ABOUT OSF1

telecode

telecode(5)							File Formats Manual						       telecode(5)

NAME

       telecode - A character encoding system (codeset) for Traditional Chinese

DESCRIPTION

       The Telecode codeset (called Mitac Telex in early versions of the operating system) consists of 2 character planes. Each character plane
       has 8836 character positions. In plane 1, standard characters occupy positions 0001 to 8045; the remaining 791 positions are for
       user-defined characters. In plane 2, standard characters occupy positions 0001 to 8489; the remaining 346 positions are for user-defined
       characters. Telecode uses 2-byte values to represent characters on both planes.

   Plane 1 Character Encoding
       To differentiate plane 1 code from plane 2 code, the most significant bit (MSB) is set on in both bytes of a plane 1 character code. The
       following formula calculates the value of a plane 1 character from its position on the plane:

       1st byte = M + 161
       2nd byte = N + 161 - M x 94

       In this formula, N is the position of the character and M = N / 94, using integer division (the remainder is discarded).

       For example, if a character is at position 2502 on plane 1, its encoding value is BBDB, which is calculated as follows:

       N = 2502, M = 2502/94 = 26
       1st byte = 26 + 161 = 187 (hexadecimal BB)
       2nd byte = 2502 + 161 - 26 x 94 = 219 (hexadecimal DB)
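
       As an illustration only (this is not part of the reference page), the plane 1 formula can be sketched in Python. The function name
       plane1_bytes is hypothetical, and the range check assumes positions are numbered 1 to 8836.

```python
def plane1_bytes(position: int) -> bytes:
    """Encode a plane 1 character position as two bytes, per the formula above."""
    if not 1 <= position <= 8836:      # assumed valid range for a plane
        raise ValueError("position out of range")
    m = position // 94                 # M = N / 94, integer division
    first = m + 161                    # 1st byte = M + 161
    second = position + 161 - m * 94   # 2nd byte = N + 161 - M x 94
    return bytes([first, second])


print(plane1_bytes(2502).hex().upper())  # BBDB, matching the worked example
```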

   Plane 2 Character Encoding
       To  differentiate plane 2 code from plane 1 code, the MSB of the first byte is set on and that of the second byte is set off for each plane
       2 character code. The following formula calculates the value of a plane 2 character from its position:

       1st byte = M + 161
       2nd byte = N + 33 - M x 94

       In this formula, N is the position of the character on the plane and M = N / 94, using integer division (the remainder is discarded).

       For example, if a character is at position 2502 on plane 2, its encoding value is BB5B, which is calculated as follows:

       N = 2502, M = 2502/94 = 26
       1st byte = 26 + 161 = 187 (hexadecimal BB)
       2nd byte = 2502 + 33 - 26 x 94 = 91 (hexadecimal 5B)
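
       As an illustration only (this is not part of the reference page), the sketch below encodes a plane 2 position and uses the MSB of the
       second byte to tell the two planes apart when decoding. The function names are hypothetical, and the decoding arithmetic is simply the
       two formulas solved for N, not something the page itself states.

```python
def plane2_bytes(position: int) -> bytes:
    """Encode a plane 2 character position as two bytes, per the formula above."""
    m = position // 94                            # M = N / 94, integer division
    return bytes([m + 161, position + 33 - m * 94])


def decode(code: bytes) -> tuple[int, int]:
    """Return (plane, position) for a 2-byte Telecode value."""
    first, second = code
    m = first - 161                               # invert 1st byte = M + 161
    if second & 0x80:                             # MSB set in 2nd byte: plane 1
        return 1, second - 161 + m * 94
    return 2, second - 33 + m * 94                # MSB clear in 2nd byte: plane 2


print(plane2_bytes(2502).hex().upper())  # BB5B, matching the worked example
print(decode(bytes.fromhex("BB5B")))     # (2, 2502)
print(decode(bytes.fromhex("BBDB")))     # (1, 2502)
```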

   Codeset Conversion
       The following codeset converter pairs are available for converting Traditional Chinese characters between telecode and other encoding
       formats. Refer to iconv_intro(5) for an introduction to codeset conversion. For more information about the other codeset for which
       telecode is the input or output, see the reference page specified in the list item.

       big5_telecode, telecode_big5
	      Converting from and to the Big-5 codeset: big5(5).

	      Note that Big-5 encoding is equivalent to the Microsoft code-page format used on PCs for Traditional Chinese. You can therefore use
	      these converters to convert Traditional Chinese characters between PC code page format and Telecode encoding format. For more
	      information on how the operating system supports PC code pages, see code_page(5).

       dechanyu_telecode, telecode_dechanyu
	      Converting from and to the DEC Hanyu codeset: dechanyu(5).

       eucTW_telecode, telecode_eucTW
	      Converting from and to Taiwanese Extended UNIX Code: eucTW(5).

   Font Support for Telecode
       The operating system supports Telecode only through conversion to another codeset.

SEE ALSO

       Commands: locale(1)

       Others: ascii(5), big5(5), Chinese(5), code_page(5), dechanyu(5), dechanzi(5), eucTW(5), GBK(5), i18n_intro(5), i18n_printing(5),
       iconv_intro(5), l10n_intro(5), sbig5(5)

																       telecode(5)
