Help with an (easy) parser


 
# 1  
Old 07-14-2008

Hello, I'm working with a file containing structural information about biological macromolecules (proteins etc.).

In this file, the info is structured like this:

@<TRIPOS>MOLECULE
blah 1
blah 2
blah 3
@<TRIPOS>MOLECULE
foo 1
foo 2
foo 3
@<TRIPOS>MOLECULE
mmm 1
mmm 2
mmm 3

I would like to split the info into one file for each @<TRIPOS>MOLECULE entry:

File 1:
@<TRIPOS>MOLECULE
blah 1
blah 2
blah 3

File 2:
@<TRIPOS>MOLECULE
foo 1
foo 2
foo 3

and so on.

What would be the best way to implement this parser? Could anybody show me an example of something similar, please, so that I can write my own parser?

Thank you in advance! Best regards.
# 2  
Old 07-14-2008
Question preferences?

1) What should the files be called? Is the name part of the text already in the file, or simply file_0001, then file_0002, and so on?
2) Will there always be three lines of data for each file?
# 3  
Old 07-14-2008
Quote:
Originally Posted by joeyg
1) What should the files be called? Is the name part of the text already in the file, or simply file_0001, then file_0002, and so on?
2) Will there always be three lines of data for each file?
1) OK. If the original file is named
my_file.mol2

the new files should be named
new_file1.mol2
new_file2.mol2
and so on.

2) No, the information is dynamic; the number of lines between each @<TRIPOS>MOLECULE header is not constant.

A "real" example:

@<TRIPOS>MOLECULE
core; [Isoc-015]
41 42 2
SMALL
NO_CHARGES
(...)
@<TRIPOS>MOLECULE
core; [Isoc-167]
41 41 2
SMALL
NO_CHARGES
(...)

Thanks!
# 4  
Old 07-14-2008
Code:
awk '/@<TRIPOS>MOLECULE/{close(k);k="new_file"n+1"."FILENAME;f=0;print "@<TRIPOS>MOLECULE" > k} f{print > k} /@<TRIPOS>MOLECULE/{f=1;n++}' my_file.mol2

# 5  
Old 07-14-2008
Many ways to do it, but here is one:

Code:
> cat in_file
@<TRIPOS>MOLECULE
core; [Isoc-015]
41 42 2
SMALL
NO_CHARGES
(...)
@<TRIPOS>MOLECULE
core; [Isoc-167]
41 41 2
SMALL
NO_CHARGES
(...)
@<TRIPOS>MOLECULE
core; [Isoc-085]
41 42 2
SMALL
NO_CHARGES
(...)
@<TRIPOS>MOLECULE
core; [Isoc-267]
41 41 2
SMALL
NO_CHARGES
(...)

Code:
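> cat mk_file
#! /usr/bin/bash
#
# reads in_file
# creates sequentially numbered output files

count=0
out_f="molecule"
out_file=""
# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r zf
   do
   # only a MOLECULE header starts a new file; other records
   # beginning with "@" stay in the current molecule's file
   if [[ $zf == "@<TRIPOS>MOLECULE"* ]]
      then
      count=$((count+1))
      countt=$(printf "%.4d" "$count")
      out_file="$out_f$countt"
      : > "$out_file"   # start empty so a rerun does not append to old output
   fi
   # ignore anything that precedes the first header
   [[ -n $out_file ]] || continue
   echo "$countt $zf"
   printf '%s\n' "$zf" >> "$out_file"
done < in_file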

Program execution displays progress and builds the files:
Code:
> ls molecule*
molecule0001  molecule0002  molecule0003  molecule0004

Note that I designed it to number the molecule filenames from 0001 to 9999.
# 6  
Old 07-14-2008
I'm trying to use csplit; LibDav.mol2 is the original file:

csplit -k -f LibDav LibDav.mol2 %@\<TRIPOS\>MOLECULE%

but LibDav00 and LibDav.mol2 come out identical. Any suggestions? Thank you in advance.
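
A likely explanation, offered as a sketch rather than a definitive answer: in csplit a %regex% pattern skips ahead to the first match without writing a piece, and without a repeat count the pattern is applied only once. Because the file begins with @<TRIPOS>MOLECULE, nothing is skipped and the whole input lands in LibDav00. The variant below uses /regex/ with a repeat instead. It assumes GNU csplit (the '{*}' repeat and the -z option are GNU extensions), and the sample data and temporary directory are stand-ins, not part of the original post.

```shell
#!/bin/sh
# Sketch only: the sample data and the temporary directory are stand-ins.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > LibDav.mol2 <<'EOF'
@<TRIPOS>MOLECULE
core; [Isoc-015]
41 42 2
@<TRIPOS>MOLECULE
core; [Isoc-167]
41 41 2
EOF

# /regex/ starts a new piece at each matching line; '{*}' repeats the
# pattern until the input runs out (GNU extension).
# -z drops the empty piece that would otherwise precede the first header,
# -s suppresses the byte counts csplit normally prints.
csplit -z -s -f LibDav LibDav.mol2 '/^@<TRIPOS>MOLECULE/' '{*}'

# List the pieces (the pattern excludes LibDav.mol2 itself).
ls LibDav[0-9]*
```

The pieces keep csplit's two-digit numbering (LibDav00, LibDav01, ...); renaming them to carry a .mol2 suffix would be a separate step.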
# 7  
Old 07-14-2008
Quote:
Originally Posted by rubin
Code:
awk '/@<TRIPOS>MOLECULE/{close(k);k="new_file"n+1"."FILENAME;f=0;print "@<TRIPOS>MOLECULE" > k} f{print > k} /@<TRIPOS>MOLECULE/{f=1;n++}' my_file.mol2

If it's too long, break it up and indent it, for the benefit of readability.
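
Spread over several lines, the same idea might look like the sketch below. Several things here are assumptions rather than part of the quoted one-liner: the header is anchored at the start of the line, the previous output file is closed before the next one is opened, the variable name out is made up for illustration, and the output names are built as new_file1.mol2, new_file2.mol2, ... to match the naming requested earlier in the thread (the one-liner derives its names from FILENAME instead). The sample input is a stand-in.

```shell
#!/bin/sh
# Sketch only: the sample data and the temporary directory are stand-ins.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > my_file.mol2 <<'EOF'
@<TRIPOS>MOLECULE
core; [Isoc-015]
41 42 2
SMALL
@<TRIPOS>MOLECULE
core; [Isoc-167]
41 41 2
EOF

awk '
    # A molecule header starts a new output file.
    /^@<TRIPOS>MOLECULE/ {
        if (out != "") close(out)        # finish the previous file first
        out = "new_file" (++n) ".mol2"   # new_file1.mol2, new_file2.mol2, ...
    }
    # Every line from the first header onward goes to the current file.
    out != "" { print > out }
' my_file.mol2

ls new_file*.mol2
```

Closing each file as soon as the next header appears keeps the number of simultaneously open files at one, which matters for inputs with many molecules.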