Help parsing a subset of text from a big results file


 
# 1  
Old 04-13-2010

Hi All,
I need some help to effectively parse out a subset of results from a big results file.

Below is an example of the text file. Each block that I need to parse starts with "Reading sequence file 10.codon" (the next block starts with another number) and ends with the **p-Value(s)** section. The first block below is complete; the remaining blocks are abbreviated and show only the text needed for parsing.

Code:
Reading sequence file 10.codon
Found 8 sequences of length 3081
Alignment looks like a valid DNA alignment.
Estimated diversity is (pairwise deletion - ignoring missing/ambig):  6.1%
Found 51 informative sites.
Writing alignment of informative sites to: Phi.inf.sites
Writing list of informative sites to:      Phi.inf.list
Calculating all pairwise incompatibilities...
Done:   0.0%^H^H^H^H^H^H100.0%

Distribution of scaled incompatibility scores:
Score (%):
 0   (84.9): ooooooooooooooooooooooooooooooooooooooooooo
 1   (15.1): oooooooo

Using a window size of 100 with k as 2

Calculating analytical mean and variance

Doing permutation test for PHI

Doing permutation test for NSS

The Neighbour Similarity score is 8.0000e-01

Doing Permutation test for MAXCHI

Number of umabiguous polymorphic sites is 678
Writing  alignment of polymorphic unambig sites to: Phi.poly.sites
Window size is 452 polymorphic sites

Best breakpoint for Max Chi found with sequences CtBTz and CtSwe. r and s are 24 and 26
Value of maximum breakpoint is:    19.8

Coordinates of breakpoint with only polymorphic sites (start,breakpoint,end) = (0, 226, 452)
Coordinates of breakpoint with all sites (start,breakpoint,end)=(98, 1260, 2265)

                      PHI Values
                      ----------
              Analytical    (1000) Permutations

Mean:          1.51e-01          1.51e-01
Variance:      7.16e-04          7.21e-04
Observed:      1.11e-01          1.11e-01


     **p-Value(s)**
       ----------

NSS:                 4.16e-01  (1000 permutations)
Max Chi^2:           0.00e+00  (1000 permutations)
PHI (Permutation):   9.50e-02  (1000 permutations)
PHI (Normal):        6.63e-02

Reading sequence file 100.codon
....
.....
.....
.....

     **p-Value(s)**
       ----------

NSS:                 1.00e+00  (1000 permutations)
Max Chi^2:           9.93e-01  (1000 permutations)
PHI (Permutation):   1.00e+00  (1000 permutations)
PHI (Normal):        1.00e+00

Reading sequence file 102.codon
.....
.....
....
....
....
     **p-Value(s)**
       ----------

NSS:                 1.26e-01  (1000 permutations)
Max Chi^2:           4.38e-01  (1000 permutations)
PHI (Permutation):   3.82e-01  (1000 permutations)
PHI (Normal):        3.82e-01

I would like to extract the number from each block header (for example, 10 from "Reading sequence file 10.codon"), followed by the four p-values of that block, as one tab-delimited line per block:

Code:
10 4.16e-01 0.00e+00 9.50e-02 6.63e-02
100 1.00e+00 9.93e-01 1.00e+00 1.00e+00
102 1.26e-01 4.38e-01 3.82e-01 3.82e-01

Please let me know the simplest way to do this using awk or sed.

LA
# 2  
Old 04-14-2010
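Basically:
Code:
awk -v OFS='\t' '
# Block header: keep the number in front of ".codon"
/^Reading sequence file/ {r1=$4;gsub(/\..*/,"",r1)}
# p-value lines: the value is the first field after the label
/^NSS:/ {r2=$2}
/^Max Chi/ {r3=$3}
/^PHI \(Permutation\)/ {r4=$3}
# Last line of a block: print the collected row, tab-delimited
/^PHI \(Normal\)/ {r5=$3;print r1,r2,r3,r4,r5}' file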

This works for me on the given sample.

I am assuming that every block contains all four p-value lines, in the order shown. Otherwise this code may give wrong results, because the values are not reset between blocks.
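
If some blocks might be missing a p-value line, a slightly more defensive variant could reset the values at each block header and print a row when the next block starts (or at end of input). This is only a sketch under that assumption: the function name parse_phi and the "NA" placeholder are hypothetical choices, not part of the original program's output.

```shell
# Hypothetical helper: parse_phi FILE prints one tab-delimited row per block.
parse_phi() {
  awk -v OFS='\t' '
    # Print the row collected for the current block, if there is one.
    function emit() {
      if (id != "") print id, nss, chi, perm, norm
    }
    /^Reading sequence file/ {
      emit()                            # finish the previous block
      id = $4; sub(/\..*/, "", id)      # "10.codon" -> "10"
      nss = chi = perm = norm = "NA"    # reset so old values cannot leak
    }
    /^NSS:/                { nss  = $2 }
    /^Max Chi/             { chi  = $3 }
    /^PHI \(Permutation\)/ { perm = $3 }
    /^PHI \(Normal\)/      { norm = $3 }
    END { emit() }                      # finish the last block
  ' "$@"
}
```

Usage would be something like parse_phi results.txt > pvalues.tsv (both file names are placeholders). A block with a missing line then shows NA in that column instead of silently repeating the previous block's value.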