Sponsored Content
Top Forums Shell Programming and Scripting [SOLVED] Converting data from one format to the other Post 302742723 by newbie83 on Tuesday 11th of December 2012 01:21:38 PM
Old 12-11-2012
[SOLVED] Converting data from one format to the other

Hi All,

I need to convert an exel spreadsheet into a SAS dataset,
and the following format change is needed. Please help, this is too complex
for a biologist.


Let me describe the input.

1st row is generation.1st column in keyword 'generation', starting 2nd column there are 5 generations in my actual data,
I have shown 3 here, namely G1,G2 and PAR.

2nd row is family name,1st column is keyword 'gene', from 2nd column format is NameParent1-NameParent2-RandomNumber_Replicate.
If 'Parent' is present in the name, then format is NameParent-Parent-RandomNumber_Replicate and it should be included in the output.
The characters upto the second '-' is the family name, replicate (can be 1,2 or 3) is the number after '_'.
Random number (between second '-' and '_') can be ignored.

Starting 3rd row are gene names(column 1) and values(column 2 through 9).
There are 30000 genes in my actual data set.

The gene values need to repeat according to nested format shown below,
Only if the generation is PAR (stands for Parent), replace the generation 'PAR' by name of the parent
specified by the characters upto first '-'. For example if P1-Parent-9_1 is the column name and the
corresponding generation is PAR, the output should have 'P1' instead of 'PAR' in the generations column.


Sample input

Code:
generation G1    G1    G1    G2    G2    G2    PAR    PAR    PAR    PAR    PAR    PAR
gene    P1-P2-24_3    P1-H-56_2    P1-P2-84_1    P1-P2-34_3    P1-P2-33_1    P1-P2-99_2 P1-Parent-9_1 P1-Parent-43_2 P1-Parent-45_3 P2-Parent-62_1 P2-Parent-43_2 P2-Parent-11_3
gene1    0    0    0    0    0    0    0    0    0    3    3    7
gene2    2    1    1    2    6    4    6    7    8    67    6    66

Expected Output

Code:
family    rep    generation    gene     value        
P1-P2    3    G1        gene1    0
P1-P2    2    G1        gene1    0    
P1-P2    1    G1        gene1    0
P1-P2    3    G2        gene1    0
P1-P2    1    G2        gene1    0    
P1-P2    2    G2        gene1     0
P1-P2    1    P1        gene1    0
P1-P2    2    P1        gene1     0
P1-P2    3    P1        gene1    0
P1-P2    1    P2        gene1    3
P1-P2    2    P2        gene1    3
P1-P2    3    P2        gene1    7
P1-P2    3    G1        gene2    2
P1-P2    2    G1        gene2    1    
P1-P2    1    G1        gene2    1
P1-P2    3    G2        gene2    2
P1-P2    1    G2        gene2    6    
P1-P2    2    G2        gene2     4
P1-P2    1    P1        gene2     6
P1-P2    2    P1        gene2     7
P1-P2    3    P1        gene2    8
P1-P2    1    P2        gene2    67
P1-P2    2    P2        gene2    6
P1-P2    3    P2        gene2    66

Thanks
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

converting a tabular format data to comma seperated data in KSH

Hi, Could anyone help me in changing a tabular format output to comma seperated file pls in K-sh. Its very urgent. E.g : username empid ------------------------ sri 123 to username,empid sri,123 Thanks, Hema:confused: (2 Replies)
Discussion started by: Hemamalini
2 Replies

2. Shell Programming and Scripting

Converting windows format file to unix format using script

Hi, I am having couple of files which i used to copy from windows to Linux, so now in case of text files (CTRL^M) appears at end of line. I know i can convert this windows format file to unix format file by running dos2unix. My requirement here is that i want to do it automatically using a... (5 Replies)
Discussion started by: sarbjit
5 Replies

3. Shell Programming and Scripting

Converting the date format

Hi All, I am new to this forum. Could anyone help me to resolve the following issue. Input of the flat file contains several lines of text for example find below: 5022090,2,4,7154,88,,,,,4/1/2011 0:00,Z,L,2 5022090,3,1,6648,88,,,,,4/1/2011 0:00,Z,,1 5022090,4,1,6648,88,,,,,4/1/2011... (6 Replies)
Discussion started by: av_sagar
6 Replies

4. Shell Programming and Scripting

Converting variable space width data into CSV data in bash

Hi All, I was wondering how I can convert each line in an input file where fields are separated by variable width spaces into a CSV file. Below is the scenario what I am looking for. My Input data in inputfile.txt 19 15657 15685 Sr2dReader 107.88 105.51... (4 Replies)
Discussion started by: vharsha
4 Replies

5. Shell Programming and Scripting

[Solved] Converting the data into matrix with 0's and 1's

I have a file that contains 2 columns tag,pos cat input_file tag pos atg 10 ata 16 agt 15 agg 19 atg 17 agg 14 I have used following command to sort the file based on second column sort -k 2 input_file tag pos atg 10 agg 14 agt 15 ata 16 agg 19 atg 17 (2 Replies)
Discussion started by: raj_k
2 Replies

6. Shell Programming and Scripting

Converting text files to xls through awk script for specific data format

Dear Friends, I am in urgent need for awk/sed/sh script for converting a specific data format (.txt) to .xls. The input is as follows: >gi|1234|ref| Query = 1 - 65, Target = 1677 - 1733 Score = 8.38, E = 0.6529, P = 0.0001513, GC = 46 fd sdfsdfsdfsdf fsdfdsfdfdfdfdfdf... (6 Replies)
Discussion started by: Amit1
6 Replies

7. Shell Programming and Scripting

Need help in converting the file format

Hi All, I need help in converting the mentioned file format into desired output format using awk. Could anyone help me in this? Below is the input.. Date Account Campaign AdGroup Keyword Conversion Revenue Var1 Var2 Var3 Var4 Var5 10 20 30 ... (8 Replies)
Discussion started by: Ravi S M
8 Replies

8. Programming

Visual Basic converting a decimal data type to a label with currency format

Here is the code that I am working with. I have tried several other things. any suggestions? Lbl_Cost_Output.Text = (dDistance * dCostPerMile).ToString("C") The label is formatted correctly in terms of value 0.00 but no dollar sign appears. Please let me know if you have any questions. (1 Reply)
Discussion started by: briandanielz
1 Replies

9. Shell Programming and Scripting

Converting another line to another format

Hi there, How can i shorten this: grep -ri "Password must meet complexity requirements" "$line" | sed 's/\t/<\/td><td>/' | sed 's/^.*:/<tr><td>/'| sed 's/$/<\/td><\/tr>/' I am looking for a shorter alternative of sed. What I was trying to do is to change the string output format from ... (3 Replies)
Discussion started by: alvinoo
3 Replies

10. UNIX for Dummies Questions & Answers

Converting unstructured data to structured data

Hi, Can someone help in converting the below unstructured data to a CSV format please. { "branchId" : "BNSFGDJNSJG-73264HB-132131BNHJFSDG", "branchName" : "NEWYORK-SSDF", "branchProductId" : "72Y5HFHSF7H3RUNAWEF", "PreferenceId" : "BASDBVcbzcYHcb", "emailId" :... (9 Replies)
Discussion started by: naveen.kuppili
9 Replies
Spreadsheet::ParseExcel::Simple(3pm)			User Contributed Perl Documentation		      Spreadsheet::ParseExcel::Simple(3pm)

NAME
Spreadsheet::ParseExcel::Simple - A simple interface to Excel data SYNOPSIS
my $xls = Spreadsheet::ParseExcel::Simple->read('spreadsheet.xls'); foreach my $sheet ($xls->sheets) { while ($sheet->has_data) { my @data = $sheet->next_row; } } DESCRIPTION
This provides an abstraction to the Spreadsheet::ParseExcel module for simple reading of values. You simply loop over the sheets, and fetch rows to arrays. For anything more complex, you probably want to use Spreadsheet::ParseExcel directly. BOOK METHODS
read my $xls = Spreadsheet::ParseExcel::Simple->read('spreadsheet.xls'); This opens the spreadsheet specified for you. Returns undef if we cannot read the book. sheets @sheets = $xls->sheets; Each spreadsheet can contain one or more worksheets. This fetches them all back. You can then iterate over them, or jump straight to the one you wish to play with. book my $book = $xls->book; The Spreadsheet::ParseExcel object we are working with. You can use this if you need to manipulate it in ways that this interface doesn't allow. SHEET METHODS
These methods can be called on each sheet returned from $xls->sheets: has_data if ($sheet->has_data) { ... } This lets us know if there are more rows in this sheet that we haven't read yet. This allows us to differentiate between an empty row, and the end of the sheet. next_row my @data = $sheet->next_row; Fetch the next row of data back. sheet my $obj = $sheet->sheet; The underlying Spreadsheet::ParseExcel object for the worksheet. You can use this if you need to manipulate it in ways that this interface doesn't allow (e.g. asking it for the sheet's name). AUTHOR
Tony Bowden BUGS and QUERIES Please direct all correspondence regarding this module to: bug-Spreadsheet-ParseExcel-Simple@rt.cpan.org COPYRIGHT AND LICENSE
Copyright (C) 2001-2005 Tony Bowden. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. SEE ALSO
Spreadsheet::ParseExcel. perl v5.8.8 2008-03-12 Spreadsheet::ParseExcel::Simple(3pm)
All times are GMT -4. The time now is 04:57 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy