[SOLVED] Converting data from one format to the other


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting [SOLVED] Converting data from one format to the other
# 1  
Old 12-11-2012
[SOLVED] Converting data from one format to the other

Hi All,

I need to convert an exel spreadsheet into a SAS dataset,
and the following format change is needed. Please help, this is too complex
for a biologist.


Let me describe the input.

1st row is generation.1st column in keyword 'generation', starting 2nd column there are 5 generations in my actual data,
I have shown 3 here, namely G1,G2 and PAR.

2nd row is family name,1st column is keyword 'gene', from 2nd column format is NameParent1-NameParent2-RandomNumber_Replicate.
If 'Parent' is present in the name, then format is NameParent-Parent-RandomNumber_Replicate and it should be included in the output.
The characters upto the second '-' is the family name, replicate (can be 1,2 or 3) is the number after '_'.
Random number (between second '-' and '_') can be ignored.

Starting 3rd row are gene names(column 1) and values(column 2 through 9).
There are 30000 genes in my actual data set.

The gene values need to repeat according to nested format shown below,
Only if the generation is PAR (stands for Parent), replace the generation 'PAR' by name of the parent
specified by the characters upto first '-'. For example if P1-Parent-9_1 is the column name and the
corresponding generation is PAR, the output should have 'P1' instead of 'PAR' in the generations column.


Sample input

Code:
generation G1    G1    G1    G2    G2    G2    PAR    PAR    PAR    PAR    PAR    PAR
gene    P1-P2-24_3    P1-H-56_2    P1-P2-84_1    P1-P2-34_3    P1-P2-33_1    P1-P2-99_2 P1-Parent-9_1 P1-Parent-43_2 P1-Parent-45_3 P2-Parent-62_1 P2-Parent-43_2 P2-Parent-11_3
gene1    0    0    0    0    0    0    0    0    0    3    3    7
gene2    2    1    1    2    6    4    6    7    8    67    6    66

Expected Output

Code:
family    rep    generation    gene     value        
P1-P2    3    G1        gene1    0
P1-P2    2    G1        gene1    0    
P1-P2    1    G1        gene1    0
P1-P2    3    G2        gene1    0
P1-P2    1    G2        gene1    0    
P1-P2    2    G2        gene1     0
P1-P2    1    P1        gene1    0
P1-P2    2    P1        gene1     0
P1-P2    3    P1        gene1    0
P1-P2    1    P2        gene1    3
P1-P2    2    P2        gene1    3
P1-P2    3    P2        gene1    7
P1-P2    3    G1        gene2    2
P1-P2    2    G1        gene2    1    
P1-P2    1    G1        gene2    1
P1-P2    3    G2        gene2    2
P1-P2    1    G2        gene2    6    
P1-P2    2    G2        gene2     4
P1-P2    1    P1        gene2     6
P1-P2    2    P1        gene2     7
P1-P2    3    P1        gene2    8
P1-P2    1    P2        gene2    67
P1-P2    2    P2        gene2    6
P1-P2    3    P2        gene2    66

Thanks
# 2  
Old 12-11-2012
Is this excel spreadsheet actually an excel spreadsheet, or a csv, or a flatfile?

If a flatfile, what separator does it use?
# 3  
Old 12-11-2012
I have excel spreadsheet but I can convert that into flat text file with tabbed separators.

Thanks
# 4  
Old 12-11-2012
Partly working, but one thing I don't understand, where do you get P1-P2 from P2-Parent-11_3 ?
This User Gave Thanks to Corona688 For This Post:
# 5  
Old 12-11-2012
Just because both P1-Parent and P2-Parent belongs to the P1-P2 family, maybe the keyword 'P2' from P2-Parent-11_3 can be matched with 'P2' in 'P1-P2' ??

I have a different file for each family, so if you force both P1 and P2 to be a part of the P1-P2 family, that will be fine for now.
# 6  
Old 12-11-2012
How do they belong to the P1-P2 family?
# 7  
Old 12-11-2012
try:
Code:
awk '
BEGIN {
  OFS="\t";
  print "family","rep","generation","gene","value";
}
/generation/ {for (i=2; i<=NF; i++) gn[i]=$i; next;}
$1 == "gene" {fm=$2; sub("-[^-]*$","",fm);
  par=fm;
  sub("-.*","", par);
  for (i=2; i<=NF; i++) {
    if ($i ~ /Par/) {gn[i]=$i; sub("-.*","",gn[i]);}
    sub(".*_","",$i);rp[i]=$i
  }; next;
}
{ for (i=2; i<=NF; i++ ) {
   print fm, rp[i], gn[i], $1, $i;
  }
}
' OFS="\t" input

This User Gave Thanks to rdrtx1 For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Converting unstructured data to structured data

Hi, Can someone help in converting the below unstructured data to a CSV format please. { "branchId" : "BNSFGDJNSJG-73264HB-132131BNHJFSDG", "branchName" : "NEWYORK-SSDF", "branchProductId" : "72Y5HFHSF7H3RUNAWEF", "PreferenceId" : "BASDBVcbzcYHcb", "emailId" :... (9 Replies)
Discussion started by: naveen.kuppili
9 Replies

2. Shell Programming and Scripting

Converting another line to another format

Hi there, How can i shorten this: grep -ri "Password must meet complexity requirements" "$line" | sed 's/\t/<\/td><td>/' | sed 's/^.*:/<tr><td>/'| sed 's/$/<\/td><\/tr>/' I am looking for a shorter alternative of sed. What I was trying to do is to change the string output format from ... (3 Replies)
Discussion started by: alvinoo
3 Replies

3. Programming

Visual Basic converting a decimal data type to a label with currency format

Here is the code that I am working with. I have tried several other things. any suggestions? Lbl_Cost_Output.Text = (dDistance * dCostPerMile).ToString("C") The label is formatted correctly in terms of value 0.00 but no dollar sign appears. Please let me know if you have any questions. (1 Reply)
Discussion started by: briandanielz
1 Replies

4. Shell Programming and Scripting

Need help in converting the file format

Hi All, I need help in converting the mentioned file format into desired output format using awk. Could anyone help me in this? Below is the input.. Date Account Campaign AdGroup Keyword Conversion Revenue Var1 Var2 Var3 Var4 Var5 10 20 30 ... (8 Replies)
Discussion started by: Ravi S M
8 Replies

5. Shell Programming and Scripting

Converting text files to xls through awk script for specific data format

Dear Friends, I am in urgent need for awk/sed/sh script for converting a specific data format (.txt) to .xls. The input is as follows: >gi|1234|ref| Query = 1 - 65, Target = 1677 - 1733 Score = 8.38, E = 0.6529, P = 0.0001513, GC = 46 fd sdfsdfsdfsdf fsdfdsfdfdfdfdfdf... (6 Replies)
Discussion started by: Amit1
6 Replies

6. Shell Programming and Scripting

[Solved] Converting the data into matrix with 0's and 1's

I have a file that contains 2 columns tag,pos cat input_file tag pos atg 10 ata 16 agt 15 agg 19 atg 17 agg 14 I have used following command to sort the file based on second column sort -k 2 input_file tag pos atg 10 agg 14 agt 15 ata 16 agg 19 atg 17 (2 Replies)
Discussion started by: raj_k
2 Replies

7. Shell Programming and Scripting

Converting variable space width data into CSV data in bash

Hi All, I was wondering how I can convert each line in an input file where fields are separated by variable width spaces into a CSV file. Below is the scenario what I am looking for. My Input data in inputfile.txt 19 15657 15685 Sr2dReader 107.88 105.51... (4 Replies)
Discussion started by: vharsha
4 Replies

8. Shell Programming and Scripting

Converting the date format

Hi All, I am new to this forum. Could anyone help me to resolve the following issue. Input of the flat file contains several lines of text for example find below: 5022090,2,4,7154,88,,,,,4/1/2011 0:00,Z,L,2 5022090,3,1,6648,88,,,,,4/1/2011 0:00,Z,,1 5022090,4,1,6648,88,,,,,4/1/2011... (6 Replies)
Discussion started by: av_sagar
6 Replies

9. Shell Programming and Scripting

Converting windows format file to unix format using script

Hi, I am having couple of files which i used to copy from windows to Linux, so now in case of text files (CTRL^M) appears at end of line. I know i can convert this windows format file to unix format file by running dos2unix. My requirement here is that i want to do it automatically using a... (5 Replies)
Discussion started by: sarbjit
5 Replies

10. UNIX for Dummies Questions & Answers

converting a tabular format data to comma seperated data in KSH

Hi, Could anyone help me in changing a tabular format output to comma seperated file pls in K-sh. Its very urgent. E.g : username empid ------------------------ sri 123 to username,empid sri,123 Thanks, Hema:confused: (2 Replies)
Discussion started by: Hemamalini
2 Replies
Login or Register to Ask a Question