Help parsing a subset of text from a big results file

04-13-2010

Hi All,
I need some help parsing a subset of results out of a big results file.

Below is an example of the text file. Each block that I need to parse starts with "Reading sequence file 10.codon" (the next block starts with a different number) and ends with the **p-Value(s)** section. I have given the first block in full below; the remaining blocks are abbreviated to show only the text that is needed for parsing.

Code:

Reading sequence file 10.codon
Found 8 sequences of length 3081
Alignment looks like a valid DNA alignment.
Estimated diversity is (pairwise deletion - ignoring missing/ambig):  6.1%
Found 51 informative sites.
Writing alignment of informative sites to: Phi.inf.sites
Writing list of informative sites to:      Phi.inf.list
Calculating all pairwise incompatibilities...
Done:   0.0%^H^H^H^H^H^H100.0%

Distribution of scaled incompatibility scores:
Score (%):
 0   (84.9): ooooooooooooooooooooooooooooooooooooooooooo
 1   (15.1): oooooooo

Using a window size of 100 with k as 2

Calculating analytical mean and variance

Doing permutation test for PHI

Doing permutation test for NSS

The Neighbour Similarity score is 8.0000e-01

Doing Permutation test for MAXCHI

Number of umabiguous polymorphic sites is 678
Writing  alignment of polymorphic unambig sites to: Phi.poly.sites
Window size is 452 polymorphic sites

Best breakpoint for Max Chi found with sequences CtBTz and CtSwe. r and s are 24 and 26
Value of maximum breakpoint is:    19.8

Coordinates of breakpoint with only polymorphic sites (start,breakpoint,end) = (0, 226, 452)
Coordinates of breakpoint with all sites (start,breakpoint,end)=(98, 1260, 2265)

                      PHI Values
                      ----------
              Analytical    (1000) Permutations

Mean:          1.51e-01          1.51e-01
Variance:      7.16e-04          7.21e-04
Observed:      1.11e-01          1.11e-01


     **p-Value(s)**
       ----------

NSS:                 4.16e-01  (1000 permutations)
Max Chi^2:           0.00e+00  (1000 permutations)
PHI (Permutation):   9.50e-02  (1000 permutations)
PHI (Normal):        6.63e-02

Reading sequence file 100.codon
....
.....
.....
.....

     **p-Value(s)**
       ----------

NSS:                 1.00e+00  (1000 permutations)
Max Chi^2:           9.93e-01  (1000 permutations)
PHI (Permutation):   1.00e+00  (1000 permutations)
PHI (Normal):        1.00e+00

Reading sequence file 102.codon
.....
.....
....
....
....
     **p-Value(s)**
       ----------

NSS:                 1.26e-01  (1000 permutations)
Max Chi^2:           4.38e-01  (1000 permutations)
PHI (Permutation):   3.82e-01  (1000 permutations)
PHI (Normal):        3.82e-01

I would like to extract the number from each block's first line (for example, 10 from "Reading sequence file 10.codon") followed by the four p-values of that block, as one tab-delimited line per block:

Code:

10 4.16e-01 0.00e+00 9.50e-02 6.63e-02
100 1.00e+00 9.93e-01 1.00e+00 1.00e+00
102 1.26e-01 4.38e-01 3.82e-01 3.82e-01

Please let me know the simplest way to do this using awk or sed.

Lucky Ali

04-14-2010

Basically:

Code:

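awk -v OFS='\t' '
# Block header: keep the number before ".codon"
/^Reading sequence file/ {r1=$4; sub(/\..*/,"",r1)}
# The four p-value lines, in the order they appear in a block
/^NSS:/                  {r2=$2}
/^Max Chi/               {r3=$3}
/^PHI \(Permutation\)/   {r4=$3}
# Last line of the block: print the tab-delimited record
/^PHI \(Normal\)/        {r5=$3; print r1,r2,r3,r4,r5}' file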

This works for me on the given sample.

I am assuming that every block contains all four p-value lines, in the order shown. If a line is missing from a block, awk keeps the value left over from the previous block, so this code may give wrong results.
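
If that assumption might not hold for the whole file, a more defensive variant is possible. The following is only a sketch, not something tested on the real results file: it resets every field to "NA" at each "Reading sequence file" line and prints the record when the next block starts (or at end of input), so a missing p-value line shows up as "NA" instead of a stale value. The function name extract_pvalues and the "NA" placeholder are my own choices for illustration.

```shell
#!/bin/sh
# extract_pvalues is a hypothetical helper name, not from the thread.
# It reads the results from the files given as arguments, or from stdin.
extract_pvalues() {
    awk -v OFS='\t' '
    # Print the record collected for the current block, if there is one.
    function emit() {
        if (id != "") print id, nss, chi, perm, norm
    }
    # A new block starts: print the previous one, then reset every field
    # to "NA" so a missing line cannot inherit an earlier value.
    /^Reading sequence file/ {
        emit()
        id = $4; sub(/\..*/, "", id)
        nss = chi = perm = norm = "NA"
    }
    /^NSS:/                { nss  = $2 }
    /^Max Chi/             { chi  = $3 }
    /^PHI \(Permutation\)/ { perm = $3 }
    /^PHI \(Normal\)/      { norm = $3 }
    # Print the last block at end of input.
    END { emit() }
    ' "$@"
}
```

It would be called as "extract_pvalues file > pvalues.tsv" (the output name is just an example). Printing at the next header rather than at the "PHI (Normal)" line means a block still produces a row even when that last line is absent.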

clx
