extract blocks of text from a file


 
# 1  
Old 05-10-2009

Hi,
This is part of a large text file I need to separate out.
I'd like some help to build a shell script that will extract the text between sets of dashed lines, write that to a new file using the whole or part of the first text string as the new file name, then move on to the next one and repeat.
The amount of text between the dashes is variable - might be just a couple of lines of text or many lines.
There's one line of space between the dashed line and the first line of text.
Doesn't matter to me if the new output file contains the dashes or not.
It would be nice to flag the ones with "No errors found" by appending that to the filename also, but not necessary.
Thanks!

Input file:

-----------------------------------------------------------------------

3D Survey MBST_BASIN/M93upd05_htti2_TTIvol2_Z (storage m93up5)
No errors found

-----------------------------------------------------------------------

3D Survey m93up5_ip/M93upd05_htti2_TTIvol2_Z (storage m93up5)
No errors found

-----------------------------------------------------------------------

3D Survey MARS_B/Mars-B (storage mars_b)
Seismic files referenced in Oracle not present on disk
This is an ERROR. Files listed below will not open in SeisWorks:

mars_b/mars_b01.3dv

-----------------------------------------------------------------------

3D Survey mars_b_ip/Mars-B (storage mars_b)
Seismic files referenced in Oracle not present on disk
This is an ERROR. Files listed below will not open in SeisWorks:

mars_b/mars_b01.3dv

-----------------------------------------------------------------------

3D Survey AUGER_123DI/szwauger (storage szwauger)
Seismic files referenced in Oracle not present on disk
This is an ERROR. Files listed below will not open in SeisWorks:

szwauger/S_AUGER_123DI_30601.3dh
szwauger/S_AUGER_123DI_30701.3dh
szwauger/S_AUGER_123DI_30801.3dh
szwauger/S_AUGER_123DI_30901.3dh
szwauger/S_AUGER_123DI_31001.3dh
szwauger/S_AUGER_123DI_31101.3dh
szwauger/S_AUGER_123DI_31201.3dh
szwauger/S_AUGER_123DI_31301.3dh
szwauger/S_AUGER_123DI_31401.3dh
szwauger/S_AUGER_123DI_31501.3dh
szwauger/S_AUGER_123DI_31601.3dh

-----------------------------------------------------------------------

2D Project szwauger_1p

-----------------------------------------------------------------------


Desired output :

file 1, named "3D Survey MBST_BASIN"

3D Survey MBST_BASIN/M93upd05_htti2_TTIvol2_Z (storage m93up5)
No errors found


file 2, named "3D Survey m93up5_ip"

3D Survey m93up5_ip/M93upd05_htti2_TTIvol2_Z (storage m93up5)
No errors found


file 3, named "3D Survey MARS_B"

3D Survey MARS_B/Mars-B (storage mars_b)
Seismic files referenced in Oracle not present on disk
This is an ERROR. Files listed below will not open in SeisWorks:

mars_b/mars_b01.3dv


and so on...
# 2  
Old 05-11-2009
if you have Python, here's an alternative
Code:
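out = None
with open("file") as src:
    for line in src:
        line = line.strip()
        if "---" in line:
            continue  # skip the dashed separator lines
        elif "3D Survey" in line:
            if out:
                out.close()
            # name the output file after the text before the first slash
            filename = line.split("/")[0]
            out = open(filename.replace(" ", "."), "w")
        if out:
            print(line, file=out)
if out:
    out.close()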

output:
Code:
# ls -1 3D*
3D.Survey.AUGER_123DI
3D.Survey.MARS_B
3D.Survey.MBST_BASIN
3D.Survey.m93up5_ip
3D.Survey.mars_b_ip

# more 3D.Survey.mars_b_ip
3D Survey mars_b_ip/Mars-B (storage mars_b)
Seismic files referenced in Oracle not present on disk
This is an ERROR. Files listed below will not open in SeisWorks:

mars_b/mars_b01.3dv

# more 3D.Survey.MARS_B
3D Survey MARS_B/Mars-B (storage mars_b)
Seismic files referenced in Oracle not present on disk
This is an ERROR. Files listed below will not open in SeisWorks:

mars_b/mars_b01.3dv

# 3  
Old 05-11-2009
Thanks - although I don't know what Python is, I might be able to adapt your code in a shell script.
# 4  
Old 05-11-2009
Quote:
Originally Posted by cajunfries
Thanks - although I don't know what Python is, I might be able to adapt your code in a shell script.
Python is a scripting/programming language (much like Perl). Anyway, that Python code is self-explanatory, so I don't think you will have much trouble "converting" it to shell.
# 5  
Old 05-11-2009
Hi,
Thanks again, but just briefly skimming over your code, I see one small problem which will prevent it from working like I need - the line "elif "3D Survey" in line:" cannot be so specific for that text string.

I'll need some way to capture any and all text between the dashed lines, then use whatever comes up in the first line of text (before the slash) as the output filename. It's not always going to write that text "3D Survey" in the first text line - it might be anything. It's just an accident that my example made it look consistent (sorry).
It changes later on in the file...

basically I need something like this

open file for reading
if dashed lines, then skip one line
read next line - extract text up to slash and store as filename
read lines and write to file
if dashed line encountered, close file
repeat

I'll have another look at the code later on today and see what I can figure out.
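
The steps above could be sketched in Python roughly like this. It is only a sketch under a few assumptions: a separator is any line made up of three or more dashes, the file name is the first text line up to the first slash (the whole line if there is no slash), and spaces become dots as in the earlier reply. The names split_blocks and make_entry are made up for illustration.

```python
def split_blocks(lines):
    """Yield (filename, text) for each block found between dashed lines.

    Assumption: a separator is a line consisting only of 3 or more dashes.
    """
    block = []
    for raw in lines:
        line = raw.rstrip("\n")
        stripped = line.strip()
        if len(stripped) >= 3 and set(stripped) == {"-"}:
            # dashed line: finish the block collected so far
            if block:
                yield make_entry(block)
            block = []
        elif stripped or block:
            # skip blank lines until the first line of text shows up
            block.append(line)
    if block:  # text after the last dashed line, if any
        yield make_entry(block)


def make_entry(block):
    # drop trailing blank lines, then name the file after the text
    # before the first slash on the first line (whole line if no slash)
    while block and not block[-1].strip():
        block.pop()
    filename = block[0].split("/")[0].strip().replace(" ", ".")
    return filename, "\n".join(block) + "\n"


# small in-memory sample taken from the input shown in the first post
sample = [
    "-----------\n", "\n",
    "3D Survey MARS_B/Mars-B (storage mars_b)\n",
    "Seismic files referenced in Oracle not present on disk\n",
    "\n",
    "-----------\n", "\n",
    "2D Project szwauger_1p\n", "\n",
    "-----------\n",
]
for name, text in split_blocks(sample):
    print(name)
```

To write real files, loop over split_blocks(open("file")) and write each text to open(name, "w"). Because nothing is keyed on "3D Survey", blocks such as "2D Project szwauger_1p" get their own file as well.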
# 6  
Old 05-11-2009
another way, if your file is not too big, is to read everything into memory, then do a split on dashes+newline. after splitting, the array will contain all the data you need. iterate over the array to get the filenames, and write each block to its output file accordingly.
Code:
import re
# split the whole file on a run of dashes followed by \n
pat = re.compile(r"---*\n")
with open("file") as f:
    data = pat.split(f.read())
# strip each block and drop the ones that are empty (blanks, newlines)
data = [i.strip() for i in data if i.strip()]
for items in data:
    try:
        index_of_slash = items.index("/")  # position of the first "/"
    except ValueError:
        continue  # no slash in this block, so no filename to build
    filename = items[:index_of_slash]  # text before the first slash
    with open(filename.replace(" ", "."), "w") as out:
        out.write(items + "\n")

output:
Code:
# ls -1 3D*
3D.Survey.AUGER_123DI
3D.Survey.MARS_B
3D.Survey.MBST_BASIN
3D.Survey.m93up5_ip
3D.Survey.mars_b_ip

# more 3D.Survey.AUGER_123DI
3D Survey AUGER_123DI/szwauger (storage szwauger)
Seismic files referenced in Oracle not present on disk
This is an ERROR. Files listed below will not open in SeisWorks:

szwauger/S_AUGER_123DI_30601.3dh
szwauger/S_AUGER_123DI_30701.3dh
szwauger/S_AUGER_123DI_30801.3dh
szwauger/S_AUGER_123DI_30901.3dh
szwauger/S_AUGER_123DI_31001.3dh
szwauger/S_AUGER_123DI_31101.3dh
szwauger/S_AUGER_123DI_31201.3dh
szwauger/S_AUGER_123DI_31301.3dh
szwauger/S_AUGER_123DI_31401.3dh
szwauger/S_AUGER_123DI_31501.3dh
szwauger/S_AUGER_123DI_31601.3dh

with the shell, you can use awk to get the same results....(incomplete code)
Code:
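awk 'BEGIN{
 RS="---*\n\n"   # regex record separator: a gawk extension, not POSIX awk
}{
 split($0,lines,"\n")
 filename=lines[1]          # first line of the block
 sub(/\/.*/,"",filename)    # keep only the text before the first slash
 if(filename !=""){
    print $0 > filename
    close(filename)         # avoid running out of open files
 }
}' file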
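
The optional request from the first post (flagging blocks that contain "No errors found" in the file name) is not covered by the code above. A possible helper for it, in Python, might look like the following. The function name output_name and the ".No_errors_found" suffix format are assumptions made for illustration, not something specified in the thread.

```python
def output_name(block, flag="No errors found"):
    """Build an output file name for one block of text.

    The name is the text before the first slash on the first non-blank
    line, with spaces turned into dots. Blocks that contain `flag` get
    a suffix such as ".No_errors_found" (suffix format is an assumption).
    Returns None for a block with no text at all.
    """
    lines = [l for l in block.splitlines() if l.strip()]
    if not lines:
        return None
    name = lines[0].split("/")[0].strip().replace(" ", ".")
    if flag in block:
        name += "." + flag.replace(" ", "_")
    return name


clean = ("3D Survey MBST_BASIN/M93upd05_htti2_TTIvol2_Z (storage m93up5)\n"
         "No errors found\n")
broken = ("3D Survey MARS_B/Mars-B (storage mars_b)\n"
          "Seismic files referenced in Oracle not present on disk\n")
print(output_name(clean))
print(output_name(broken))
```

In the split-based script, the result of output_name(items) could replace the filename built from index_of_slash, which would also give blocks without a slash a usable name.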


Last edited by ghostdog74; 05-11-2009 at 01:50 PM..
# 7  
Old 05-11-2009
OK - thanks!
I'll sift thru all this and see what I can do. I imagine the awk might be the best solution. Don't know arrays at all.