Converting text files to xls through awk script for specific data format


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Converting text files to xls through awk script for specific data format
# 1  
Old 01-08-2014
Converting text files to xls through awk script for specific data format

Dear Friends,
I am in urgent need for awk/sed/sh script for converting a specific data format (.txt) to .xls.
The input is as follows:
Code:
>gi|1234|ref|
  
 Query = 1 - 65, Target = 1677 - 1733
 Score = 8.38, E = 0.6529, P = 0.0001513, GC =  46
  fd sdfsdfsdfsdf
fsdfdsfdfdfdfdfdf
dfdfdfdfedfsdfsdfdsf
dfdfdfsdfsdfsdfsdfdsf
sdfdsfdfdfdfdsfdfdfdfd
  
>gi|54367|ref|


  fd sdfsdfsdfsdf
fsdfdsfdfdfdfdfdf
dfdfdfdfedfsdfsdfdsf
dfdfdfsdfsdfsdfsdfdsf
sdfdsfdfdfdfdsfdfdfdfd

 Query = 1 - 65, Target = 6780 - 6812
 Score = 8.13, E = 0.7692, P = 0.0001782, GC =  36

//
................EOF................................

The desired output EXCEL table will the contain the headers and data in the following order:
Code:
            Serial   Query start    Query End   target start   target end   Score   E   P   GC
       >gi|1234|ref|   1   65   1677   1733   8.38   0.6529   0.000151   46
       >gi|54367|ref|   1   65   6780   6812   8.13   0.7692   0.000178   36

I have to iteratively perform this conversion in several (1000+ ) .txt files.
Code:
awk '/gi/ {print ;}'
awk '/Query/{print $3,$5,$8,$10;}'
awk '/Score/ {print $3, $6, $9, $12;}'

I was trying with the preliminary awk one-liners and trying to use "CAT" for concatenating all 3 .xls output which did not fulfiling the purpose.
I need urgent help!!!!!!!!!, looking forward for your responses (new script/ modified version of the one-liners).
P.S: Input file and output file is attached for more clarity.
Best regards,
Amit
Moderator's Comments:
Mod Comment Although attaching files sometimes helps, using CODE tags (as required by forum rules) makes it much easier for those of us trying to help you see spacing between fields and to be able to see field alignments. Please use CODE tags when including sample code, sample input file data, and sample output file data.

Last edited by Don Cragun; 01-10-2014 at 05:37 AM..
# 2  
Old 01-08-2014
It will look something like this, I leave it to you to debug it

awk -f x.awk inputfile

where x.awk is...

(then you import it as a comma-delimited file)

Code:
BEGIN{print "Serial,Query Start,Query End, Target Sart, Target End, E,P,GC"}
/^\>g/ { serial=$0 }
/Query =/ {
        split($0,a," ")
        query_start=a[3]
        query_end=a[5]
        query_end=gsub(/,/,"",query_end)
        target_start=a[8]
        target_end=a[10]

/Score =/{
        split($0,a," ")
        score=a[3]
        score=gsub(/,/,"",score)
        e=a[6]
        e=gsub(/,/,"",e)
        p=a[1]
        p=gsub(/,/,"",p)
        gc=a[12]

        printf("%s,%s,%s,%s,%s,%s,%s,%s\n",query_start, query_end,target_start,target_end,score,e,p,gc)
        }

This User Gave Thanks to blackrageous For This Post:
# 3  
Old 01-10-2014
Thanks for the help, i will try it out.
Best regards,
Amit
# 4  
Old 01-10-2014
A closing } is missing for the /Query =/
# 5  
Old 01-10-2014
Note also that gsub(ERE, replacement, string) returns the number of substitutions performed, not the modified string. And, since there are multiple commas on the Score line, it might make more sense to get rid of all of the commas on the input line in one shot before splitting out the fields than to perform gsub() calls on the split fields. In fact, if we set the field separator differently, we can avoid the need to perform any substitutions. The following creates a comma separated values file that can easily be loaded into Excel:
Code:
awk '
BEGIN { FS = "[ ,]*"
        OFS = ","
        print "Serial", "Query start", "Query End", "target start", "target end", "Score", "E", "P", "GC"
}
/^>/ {  S = $1 }
/Query/ {
        QS = $4
        QE = $6
        TS = $9
        TE = $11
}
/Score/ {
        print S, QS, QE, TS, TE, $4, $7, $10, $13
}' "Input file.txt"

which, with your sample input produces:
Code:
Serial,Query start,Query End,target start,target end,Score,E,P,GC
>gi|1234|ref|,1,65,1677,1733,8.38,0.6529,0.0001513,46
>gi|54367|ref|,1,65,6780,6812,8.13,0.7692,0.0001782,36

If you save this output in a file named input.csv and open it with Excel, you'll get your desired spreadsheet. Then, if .csv format isn't suitable, you can have Excel save it to a file in .xlsx format.

Note that you can give this script multiple input files which will still produce a single .csv output file. You might, however, run into ARG_MAX limitations on your system if you replace the last line of the script with:
Code:
}' *.txt

with over a thousand .txt files. I'm sure you can figure out how to group sets of .txt files into suitable chunks and combine the resulting output files (stripping off the header in all output files except the first one) to get a single, large .csv file to feed to Excel.

Last edited by Don Cragun; 01-12-2014 at 01:48 PM.. Reason: Fix typos, add note.
This User Gave Thanks to Don Cragun For This Post:
# 6  
Old 01-10-2014
Assuming you want to extract numeric value from line with a pattern Query and Score..

Code:
awk '
function parse(){
                   gsub(/[[:alpha:]]|[-,=]/,x)
                   $0=$0;$1=$1
                   q = q ? q OFS $0 : $0
                } 
           NR==1{
                 print "Serial", "Query start", "Query End", "target start", "target end", "Score", "E", "P", "GC"
                }      
            /^>/{
                   s=$0
                   next
                }
/Query/||/Score/{
                   parse()
                if(++i==2){
                                print s,q
                                q=s=i=""
                          }          
                }
    ' OFS="\t" file

Code:
Serial    Query start    Query End    target start    target end    Score    E    P    GC
>gi|1234|ref|    1    65    1677    1733    8.38    0.6529    0.0001513    46
>gi|54367|ref|    1    65    6780    6812    8.13    0.7692    0.0001782    36

This User Gave Thanks to Akshay Hegde For This Post:
# 7  
Old 01-12-2014
Dear DON and Akshay,
Thanks for your help, this will definitely ease my work.

Best regards,
Amit
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

Converting xls file to xlsx on UNIX script / command line.

Hi All, Am needing advise on how to convert xls file to xlsx format on Solaris unix command line or scripting. I tried searching online but it looks like I need to either use Perl packages of Excel or Python packages or some other 3rd party tool. Problem is to install any of these will require... (2 Replies)
Discussion started by: arvindshukla81
2 Replies

2. Shell Programming and Scripting

How to convert Text data to xls?

Hi Team, I have created a script to output DB Query data to a text file. It displays output as follows for 2 different queries appended to same file. I want help to convert this data to xls format so that the output of first query is in 1 tab and the other in second tab. Please help. ... (14 Replies)
Discussion started by: srkmish
14 Replies

3. Shell Programming and Scripting

Script for converting to xls and to mail the same

i have installed CENTOS in VMware. I want the linux command to run in CENTOS. The command is to select data's from Event table(our table name) convert into .xls and mail every 2hrs. Can you please help me in writing this script. Thanks, Shobana (1 Reply)
Discussion started by: shobana praveen
1 Replies

4. Shell Programming and Scripting

Need script for transferring bulk files from one format to text format

"Help Me" Need script for transferring bulk files from one format to text format in a unix server. Please suggest (2 Replies)
Discussion started by: Kranthi Kumar
2 Replies

5. Shell Programming and Scripting

[SOLVED] Converting data from one format to the other

Hi All, I need to convert an exel spreadsheet into a SAS dataset, and the following format change is needed. Please help, this is too complex for a biologist. Let me describe the input. 1st row is generation.1st column in keyword 'generation', starting 2nd column there are 5... (9 Replies)
Discussion started by: newbie83
9 Replies

6. Shell Programming and Scripting

Need help in writing a script to create a new text file with specific data from existing two files

Hi, I have two text files. Need to create a third text file extracting specific data from first two existing files.. Text File 1: Format contains: SQL*Loader: Release 10.2.0.1.0 - Production on Wed Aug 4 21:06:34 2010 some text ............so on...and somwhere text like: Record 1:... (1 Reply)
Discussion started by: shashi143ibm
1 Replies

7. Shell Programming and Scripting

Using Awk for extracting data in specific format

please help me writing a awk script 001_r.pdb 0.0265185 001_r.pdb 0.0437049 001_r.pdb 0.0240642 001_r.pdb 0.0310264 001_r.pdb 0.0200482 001_r.pdb 0.0146746 001_r.pdb 0.0351344 001_r.pdb 0.0347856 001_r.pdb 0.036119 001_r.pdb 1.49 002_r.pdb 0.0281011 002_r.pdb 0.0319908 002_r.pdb... (5 Replies)
Discussion started by: phoenix_nebula
5 Replies

8. Shell Programming and Scripting

Assigning a specific format to a specific column in a text file using awk and printf

Hi, I have the following text file: 8 T1mapping_flip02 ok 128 108 30 1 665000-000008-000001.dcm 9 T1mapping_flip05 ok 128 108 30 1 665000-000009-000001.dcm 10 T1mapping_flip10 ok 128 108 30 1 665000-000010-000001.dcm 11 T1mapping_flip15 ok 128 108 30... (2 Replies)
Discussion started by: goodbenito
2 Replies

9. UNIX for Dummies Questions & Answers

converting xls,xlsx files ??

I think I know the answer to this :rolleyes: but thought I'd ask anyway. You never know. Does anyone know of a program or utility that will run on any unix platform and convert Microsoft Excel files to ascii/plain text files that unix can understand ? Thanks in advance. Floyd (3 Replies)
Discussion started by: fwellers
3 Replies

10. UNIX for Dummies Questions & Answers

converting a tabular format data to comma seperated data in KSH

Hi, Could anyone help me in changing a tabular format output to comma seperated file pls in K-sh. Its very urgent. E.g : username empid ------------------------ sri 123 to username,empid sri,123 Thanks, Hema:confused: (2 Replies)
Discussion started by: Hemamalini
2 Replies
Login or Register to Ask a Question