extract blocks of text from a file Post: 302315088


Top Forums Shell Programming and Scripting extract blocks of text from a file Post 302315088 by ghostdog74 on Monday 11th of May 2009 12:44:56 PM


Another way, if your file is not too big, is to read everything into memory and then split it on dashes followed by a newline. After splitting, the array will contain all the data you need. Iterate over the array to get the filenames, and write each block to its output file accordingly.

Code:

import re

# split the whole file on runs of dashes that end a line
pat = re.compile(r"-+\n")
with open("file") as f:
    data = pat.split(f.read())
# strip each piece and drop the empty ones (blanks, newlines)
data = [i.strip() for i in data if i.strip()]
for items in data:
    try:
        index_of_slash = items.index("/")  # position of the first "/"
    except ValueError:
        continue  # no "/" in this block, so there is no filename to build
    filename = items[:index_of_slash]  # e.g. "3D Survey AUGER_123DI"
    with open(filename.replace(" ", "."), "w") as out:  # -> 3D.Survey.AUGER_123DI
        out.write(items)

Output:

Code:

# ls -1 3D*
3D.Survey.AUGER_123DI
3D.Survey.MARS_B
3D.Survey.MBST_BASIN
3D.Survey.m93up5_ip
3D.Survey.mars_b_ip

# more 3D.Survey.AUGER_123DI
3D Survey AUGER_123DI/szwauger (storage szwauger)
Seismic files referenced in Oracle not present on disk
This is an ERROR. Files listed below will not open in SeisWorks:

szwauger/S_AUGER_123DI_30601.3dh
szwauger/S_AUGER_123DI_30701.3dh
szwauger/S_AUGER_123DI_30801.3dh
szwauger/S_AUGER_123DI_30901.3dh
szwauger/S_AUGER_123DI_31001.3dh
szwauger/S_AUGER_123DI_31101.3dh
szwauger/S_AUGER_123DI_31201.3dh
szwauger/S_AUGER_123DI_31301.3dh
szwauger/S_AUGER_123DI_31401.3dh
szwauger/S_AUGER_123DI_31501.3dh
szwauger/S_AUGER_123DI_31601.3dh
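To check the splitting logic without creating any files, the same steps can be wrapped in a function that returns the blocks keyed by output filename. This is a minimal sketch that assumes, as the code above does, that blocks are separated by a run of dashes ending a line and that the filename comes from the text before the first "/". The function name `split_blocks` and the second sample block are made up for illustration.

```python
import re

# A separator is a run of one or more dashes that ends a line, as in the code above.
SEPARATOR = re.compile(r"-+\n")


def split_blocks(text):
    """Return {output_filename: block_text} for each block that contains a "/".

    The filename is the text before the first "/", with spaces replaced by
    dots, e.g. "3D Survey AUGER_123DI" becomes "3D.Survey.AUGER_123DI".
    """
    blocks = {}
    for piece in SEPARATOR.split(text):
        block = piece.strip()
        slash = block.find("/")
        if slash <= 0:  # no "/", or nothing before it: no filename to build
            continue
        blocks[block[:slash].replace(" ", ".")] = block
    return blocks


# Made-up input: the first block reuses lines shown in the output above,
# the second block's text after "/" is invented for illustration.
sample = (
    "----------\n"
    "3D Survey AUGER_123DI/szwauger (storage szwauger)\n"
    "Seismic files referenced in Oracle not present on disk\n"
    "----------\n"
    "\n"
    "----------\n"
    "3D Survey MARS_B/example (made-up line)\n"
)
for name, block in split_blocks(sample).items():
    print(name, "->", block.splitlines()[0])
```

Writing the files is then a matter of looping over the returned dictionary and opening each key for writing.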

With the shell, you can use awk to get similar results (the code is only a sketch):

Code:

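awk 'BEGIN{
 RS="---*\n\n"   # regex RS (2+ dashes, newline, blank line): not POSIX, works in gawk
 FS="/"
}{
 filename=$1
 gsub(/^\n+/, "", filename)   # drop newlines left at the start of the record
 gsub(/ /, ".", filename)     # "3D Survey X" -> "3D.Survey.X", as in the Python version
 if(filename !=""){
    print $0 > filename
    close(filename)           # avoid running out of open file descriptors
 }
}' file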
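The Python version above loads the whole file into memory, which the post notes is only suitable when the file is not too big. For larger files, one option is to read line by line and hold only the current block. This is a sketch under an assumption the post does not state: each separator is a line made up only of dashes (optionally followed by whitespace). The generator name `iter_blocks` is hypothetical.

```python
import io
import re

# Assumption: a separator is a line consisting only of dashes (trailing whitespace allowed).
SEPARATOR_LINE = re.compile(r"^-+\s*$")


def iter_blocks(lines):
    """Yield each non-empty block of text found between separator lines.

    Reads one line at a time, so only the current block is kept in memory.
    """
    current = []
    for line in lines:
        if SEPARATOR_LINE.match(line):
            block = "".join(current).strip()
            if block:
                yield block
            current = []
        else:
            current.append(line)
    block = "".join(current).strip()  # last block, which may lack a closing separator
    if block:
        yield block


# With a real file: with open("file") as f: for block in iter_blocks(f): ...
demo = io.StringIO("-----\nfirst/a\nx\n-----\n\n-----\nsecond/b\n")
for block in iter_blocks(demo):
    print(repr(block))
```

Each yielded block can then be named and written exactly as in the in-memory version.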

Last edited by ghostdog74; 05-11-2009 at 01:50 PM..


10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Delete blocks of lines from text file

Hello friends, I have a file like the one below. I want to remove selected blocks, say abc, pqr, lst. How can I remove those blocks from the file? zone abc { blah blah blah } zone xyz { blah blah blah } zone pqr { blah blah blah }

2. Programming

c program to extract text between two delimiters from some text file

Need a C program to extract text between two delimiters from a text file and then store the pieces in different variables. The text file looks like: 0: abc.txt ========= aaaaaa|11111111|sssssssssss|333333|ddddddddd|34343454564|asass aaaaaa|11111111|sssssssssss|333333|ddddddddd|34343454564|asass...

3. Shell Programming and Scripting

Extract sequence blocks

Hi, I have a one-line file consisting of a sequence of 660 letters. I would like to extract 9-letter blocks iteratively: ASDFGHJKLQWERTYUIOPZXCVBNM first block: ASDFGHJKL 2nd block: SDFGHJKLQ What I have so far only gives me the first block. Can anyone please explain why? cat...

4. Shell Programming and Scripting

How to read text in blocks

Hi, I have a file which contains information written in blocks (every block is different). Is it possible to read every block, one by one, into another file (one block per file)? The input is something like this <block1> <empty line> <block2> <empty line> ... ... ... <block25> <empty...

5. Shell Programming and Scripting

how to split this file into blocks and then send these blocks as input to the tool called Yices?

Hello, I have a file like this: FILE.TXT: (define argc :: int) (assert ( > argc 1)) (assert ( = argc 1)) <check> # (define c :: float) (assert ( > c 0)) (assert ( = c 0)) <check> # Now I want to separate each block ('#' is the delimiter), make them separate files, and then send them as...

6. Shell Programming and Scripting

Working with individual blocks of text using awk

Hi, I am working with CVS log data and have some data as follows. RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointListener.java,v head: 1.14 branch: locks: strict access list: keyword substitution: o total revisions: 15; selected...

7. Shell Programming and Scripting

Extract sequences of bytes from binary for differents blocks

Hello to all, I would like to search for sequences of bytes inside a big binary file. The bin file contains blocks of information; each block is structured as follows: 1- Each block begins with the hex 32 (1 byte) and ends with FF. After the FF of the last block, 33 follows. 2- Next...

8. Shell Programming and Scripting

Adding and removing blocks of text from file

Hello all, short story: I'm writing a script to add and remove DNS records in DNS files. It's on RHEL 5.5. So far I've locked up the basic operations in a couple of functions: - validate the parameters - search for an existing IP in the file when adding - search for existing name records in...

9. Shell Programming and Scripting

Blocks of text in a file - extract when matches...

I sat down yesterday to write this script and have just realised that my methodology is broken........ In essence I have..... ----------------------------------------------------------------- (This line really is in the file) Service ID: 12345 ...

10. Shell Programming and Scripting

How to remove duplicate text blocks from a file?

Hi all, I have a list of files which contain duplicate blocks of text. Following is a sample of the file; I have removed the sensitive information from it. All the code samples start with <TR BGCOLOR="white"> and end with an IP address and two HTML tags like this. 10.14.22.22...

LEARN ABOUT DEBIAN

llsearch

LLSEARCH(1)						      General Commands Manual						       LLSEARCH(1)

NAME

       llsearch - Search a GNIS file for place names within a given block of latitude/longitude

SYNOPSIS

       llsearch [-L] | [latitude_low longitude_low latitude_high longitude_high]

DESCRIPTION

       The U.S. Geological Survey supports sites on the Internet with Geographic Names Information System (GNIS) files. These files contain lists
       of place names, complete with their latitude/longitude and other information. There are separate files for each of the U.S. states, and
       each file contains many, many, many place names. If you want to use this data with drawmap, it is useful to reduce the data to only the
       items that you need. Llsearch lets you filter a GNIS file and winnow out only those place names that fall within the latitude/longitude
       boundaries that you specify. (You may want to specify boundaries that are a tiny bit larger than what you are interested in, so that
       numerical quantization doesn't eliminate locales that fall exactly on your boundaries.)
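The boundary test described above amounts to checking each place's coordinates against a latitude/longitude box. The following is a minimal sketch, not llsearch's actual source: the helper names `in_box` and `widen` and the default margin of 0.001 degrees are assumptions, the margin following the advice to make the box a tiny bit larger. It uses the argument order from the SYNOPSIS, takes decimal degrees with north/east positive, and does not handle boxes that cross the 180° meridian.

```python
def in_box(lat, lon, lat_low, lon_low, lat_high, lon_high):
    """True if (lat, lon) lies inside the box, edges included.

    All values are decimal degrees: north/east positive, south/west negative.
    """
    return lat_low <= lat <= lat_high and lon_low <= lon <= lon_high


def widen(lat_low, lon_low, lat_high, lon_high, margin=0.001):
    """Grow the box by `margin` degrees on every side (hypothetical helper)."""
    return lat_low - margin, lon_low - margin, lat_high + margin, lon_high + margin


# Boundaries taken from the typical-usage example in this man page.
box = widen(33, -118, 34, -117)
print(in_box(33.5, -117.5, *box))  # well inside the box
print(in_box(34.0, -117.0, *box))  # exactly on a corner, kept thanks to the margin
print(in_box(35.0, -117.5, *box))  # north of the box
```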

       Latitudes and longitudes are positive for north latitude and east longitude, and negative for south latitude and west longitude. Llsearch
       expects you to enter them in decimal degrees. (The latitudes and longitudes in the GNIS file are in degrees-minutes-seconds format,
       followed by 'N', 'S', 'E', or 'W'. However, there are two available file formats, and one of the formats also contains the
       latitudes/longitudes in decimal degrees.) Typical usage is as follows:

       gunzip -c california.gz | llsearch 33 -118 34 -117 > gnis_santa_ana_west
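Because llsearch takes decimal degrees while the GNIS file stores degrees-minutes-seconds followed by a hemisphere letter, a conversion step is needed somewhere. Here is a sketch of one. The exact field layout (for example `334512N` or `1174530W`, with two-digit minutes and seconds) is an assumption, so check it against your file before relying on it. The function name `dms_to_decimal` is hypothetical.

```python
import re

# Assumed layout: 2 or 3 degree digits, 2 minute digits, 2 second digits,
# then a hemisphere letter, e.g. "334512N" or "1174530W".
DMS = re.compile(r"^(\d{2,3})(\d{2})(\d{2})([NSEW])$")


def dms_to_decimal(text):
    """Convert a DMS string to signed decimal degrees (S and W are negative)."""
    m = DMS.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized DMS value: {text!r}")
    degrees, minutes, seconds, hemisphere = m.groups()
    value = int(degrees) + int(minutes) / 60 + int(seconds) / 3600
    return -value if hemisphere in "SW" else value


print(dms_to_decimal("334512N"))   # 33 + 45/60 + 12/3600
print(dms_to_decimal("1174530W"))  # negative, because W is west longitude
```

The sign convention matches the one llsearch expects for its arguments, so converted values can be compared directly with the boundaries given on the command line.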

       If you enter the "-L" option, the program will print some license information and exit.

       Once you have reduced the data to some subset of interest, you can search for particular items via the grep or perl commands, or other
       search commands, or you can simply edit the results with your favorite text editor. Search commands are useful in reducing the sheer
       volume of data to a more manageable size (by extracting, say, all mountain summits or all streams), but you will probably ultimately end up
       looking through the remaining data manually. The individual records contain codes, such as "ppl" for populated places, and "summit" for
       mountain tops, that can help you pick and choose.
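The grep-style filtering mentioned above can also be done in Python. This sketch keeps lines that contain a code such as "summit" as a whole word, ignoring case, which is roughly what `grep -iw summit` does. The function name `select_by_code` and the sample lines are made up and do not reflect the real GNIS record layout.

```python
import re


def select_by_code(lines, code):
    """Keep lines that contain `code` as a whole word, ignoring case.

    A plain substring test would also match longer words that merely
    contain the code, such as "Summitville" for "summit".
    """
    pattern = re.compile(r"\b" + re.escape(code) + r"\b", re.IGNORECASE)
    return [line for line in lines if pattern.search(line)]


# Made-up sample lines, for illustration only.
lines = ["Baldy Mountain summit", "Summitville ppl", "Beaver Creek stream"]
print(select_by_code(lines, "summit"))
print(select_by_code(lines, "ppl"))
```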

       There is considerable redundancy in place names, and human intelligence is useful in sorting things out. While I was writing drawmap and
       llsearch, I frequently gazed out my office window, where I could spot at least two, and possibly three, Baldy Mountains. There are also
       quite a few Beaver Creeks, Bear Canyons, Saddle Buttes, and Springfields out there. By taking a close look at the information associated
       with each place name, you can find the particular locations that interest you.

SEE ALSO

       drawmap(1)

								   Jul 24, 2001 						       LLSEARCH(1)
