parsing a portion of Data from a text file


 
# 1  
Old 09-27-2010

Hi All,
I need some help to effectively parse out a subset of results from a big results file.

Below is an example of the text file. Each block that I need to parse starts with "Output of GENE for sequence file 100.fasta" (the next block starts with a different number). I have included only the lines of each block that are needed for parsing; the rest of each block is omitted.


Code:
# Output of GENE for sequence file 100.fasta
#
#
#
#
# 
# 
# Maximum BLAST-like scores:
# Inner      Max         Sim     S.D.s above     S.D. of
#  frags    Score      P-value    sim. mean        sims
# SCORE     4.145      0.6043        -0.01       0.0274
# OuterSeq
#  frags    0.125      1.0000         0.00       0.0000
#
#
#
#Output of GENE for sequence file 101.fasta
#
#
#
#
#
## Maximum BLAST-like scores:
# Inner      Max         Sim     S.D.s above     S.D. of
#  frags    Score      P-value    sim. mean        sims
# SCORE     2.665      0.8360         0.44       0.0439
# OuterSeq
#  frags  Not found      0.0000         0.00       0.0000
#
#
#
#
#Output of GENE for sequence file 103.fasta
#
#
#
#
#
## Maximum BLAST-like scores:
# Inner      Max         Sim     S.D.s above     S.D. of
#  frags    Score      P-value    sim. mean        sims
# SCORE     3.665      0.8705         1.44       0.0039
# OuterSeq
#  frags  Not found      1.0000         2.00       0.0000

I would like to extract the number from each block header (for example, 100 from "Output of GENE for sequence file 100.fasta") together with the Sim P-value of that block, formatted like this:

Code:
100 0.6043
101 0.8360
103 0.8705

Please let me know the simplest way to parse this out using awk or sed.

LA
# 2  
Old 09-27-2010
Code:
$ ruby -ane 'num=$_.scan(/^.*\b(\d+)\.fasta/)[0] if  /Output/; print "#{num[0]} #{$F[3]}\n" if /SCORE/  ' file
100 0.6043
101 0.8360
103 0.8705

# 3  
Old 09-27-2010
Sorry, I don't have Ruby on my computer.
# 4  
Old 09-28-2010
Here's a custom AWK script

I just tested using GNU gawk, and it worked for me.
Code:
awk -f awk_parser.awk the_file

I'm new to this forum and editor, so it may not tab-align properly, but this worked for me:
Code:
BEGIN {

        # this_num denotes which sequence file we're currently handling
        # it's used as an index into the associative array called "pval"
        this_num = 0
}

/Output of GENE/ , /SCORE/ {

        # capture the fasta number
        if( $0 ~ /Output of GENE/ ) {

                where = match( $0, /[0-9]+\.fasta/ )
                fasta_str = substr( $0, where, RLENGTH )

                where = match( fasta_str, /^[0-9]+/ )
                num = substr( fasta_str, where, RLENGTH )

                # print "Located gene sequence file: " num
                this_num = num

        }
        else if( $0 ~ /SCORE/ ) {
                # print "\thandling Sim p-value for SCORE row"
                pval[this_num] = $4
        }
}

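END {
        # note: "for (key in array)" visits keys in an unspecified order,
        # so pipe the output through "sort -n" if the order matters
        for( seq_file in pval ) {
                print seq_file, pval[seq_file]
        }
}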

=====
My output:
Code:
100 0.6043
101 0.8360
103 0.8705

=====

This was "quick and dirty" and as such requires that the SCORE line be output by your utility JUST as you posted here (i.e., a pound-sign, a space, the SCORE word, etc.)
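
One way to loosen that dependence on the exact layout, as a rough sketch only: look for the SCORE token wherever it falls on the line and take the second field after it, instead of hard-coding $4. The function name extract_pvalues is hypothetical, and the sketch assumes the Sim P-value is always the second column after SCORE, as in the sample posted above. The output goes through sort -n so the order does not depend on how awk walks its arrays.

```shell
#!/bin/sh
# extract_pvalues is a hypothetical helper name. It reads the results file
# given as $1 and prints "<sequence number> <Sim P-value>" for each block.
extract_pvalues() {
    awk '
        # header line: remember the digits that precede ".fasta"
        /Output of GENE/ && match($0, /[0-9]+\.fasta/) {
            num = substr($0, RSTART, RLENGTH - 6)   # drop the 6 chars of ".fasta"
            next
        }
        # inside a block: find the SCORE token, skip Max Score, take Sim P-value
        num != "" {
            for (i = 1; i <= NF - 2; i++)
                if ($i ~ /^#*SCORE$/) { print num, $(i + 2); num = ""; break }
        }
    ' "$1" | sort -n
}
```

It would be called as extract_pvalues the_file. Because the match on SCORE is case-sensitive, the column header row containing "Score" is ignored.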

Last edited by Franklin52; 09-28-2010 at 03:24 AM.. Reason: Please use code tags, thank you!
# 5  
Old 09-28-2010
Code:
awk '/fasta$/{split($NF,m,".");printf m[1]}/SCORE/{printf " %s\n",$3}'  file

# 6  
Old 09-28-2010
??

I ran that:
Code:
$ awk '/fasta$/{split($NF,m,".");printf m[1]}/SCORE/{printf " %s\n",$3}'  the_file

100 4.145
101 2.665
103 3.665
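
The values printed there are the Max Score column. awk splits each line on whitespace, so the leading "#" counts as field 1, SCORE is field 2, Max Score is field 3, and the Sim P-value is field 4. A small sketch for checking field numbers on any line (show_fields is a hypothetical helper name):

```shell
#!/bin/sh
# show_fields is a hypothetical helper: it prints every whitespace-separated
# field of each input line together with its awk field number.
show_fields() {
    awk '{ for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i }'
}

# one SCORE row copied from the sample data in the first post
printf '%s\n' '# SCORE     4.145      0.6043        -0.01       0.0274' | show_fields
```

For that row it lists $3=4.145 and $4=0.6043, among the other fields.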


Last edited by Franklin52; 09-28-2010 at 03:24 AM.. Reason: adding code tags
# 7  
Old 09-28-2010
Try with $4:
Code:
awk '/fasta$/{split($NF,m,".");printf m[1]}/SCORE/{printf " %s\n",$4}'  file
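
Since the original question also mentioned sed, here is a possible sed variant, offered as a sketch rather than a tested replacement for the awk one-liner. It assumes every block contains exactly one "Output of GENE" header and exactly one SCORE row; if a block is missing either, the pairing done by paste will drift. The function name pvalues_sed is hypothetical.

```shell
#!/bin/sh
# pvalues_sed is a hypothetical helper name. $1 is the results file.
pvalues_sed() {
    # first expression: reduce a header line to its sequence number
    # second expression: reduce a SCORE row to its second value (Sim P-value)
    # paste then joins each number with the P-value on the following line
    sed -n \
        -e 's/.*Output of GENE for sequence file \([0-9][0-9]*\)\.fasta.*/\1/p' \
        -e 's/^#* *SCORE  *[^ ]*  *\([^ ]*\).*/\1/p' \
        "$1" | paste -d ' ' - -
}
```

It would be called as pvalues_sed file. Unlike the awk versions, it matches the header by its full wording, so a change in that wording would make it print nothing for the block.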
