Shell Programming and Scripting: extract blocks of text from a file (Post 302315088 by ghostdog74 on Monday 11th of May 2009 12:44:56 PM)
Another way, if your file is not too big, is to read everything into memory and then split on dashes + newline. After splitting, the array will contain all the data you need. Iterate over the array to get the filenames, and write each block to its output file accordingly.
Code:
import re

# split the whole file on a run of dashes followed by \n
pat = re.compile(r"-+\n")
with open("file") as f:
    data = pat.split(f.read())
# strip each block and drop the ones that are empty or only whitespace
data = [i.strip() for i in data if i.strip()]
for items in data:
    index_of_slash = items.find("/")  # position of the first "/", -1 if there is none
    if index_of_slash == -1:
        continue  # no "/" in this block, so no filename can be built
    filename = items[:index_of_slash]  # e.g. "3D Survey AUGER_123DI"
    with open(filename.replace(" ", "."), "w") as out:
        out.write(items)

output:
Code:
# ls -1 3D*
3D.Survey.AUGER_123DI
3D.Survey.MARS_B
3D.Survey.MBST_BASIN
3D.Survey.m93up5_ip
3D.Survey.mars_b_ip

# more 3D.Survey.AUGER_123DI
3D Survey AUGER_123DI/szwauger (storage szwauger)
Seismic files referenced in Oracle not present on disk
This is an ERROR. Files listed below will not open in SeisWorks:

szwauger/S_AUGER_123DI_30601.3dh
szwauger/S_AUGER_123DI_30701.3dh
szwauger/S_AUGER_123DI_30801.3dh
szwauger/S_AUGER_123DI_30901.3dh
szwauger/S_AUGER_123DI_31001.3dh
szwauger/S_AUGER_123DI_31101.3dh
szwauger/S_AUGER_123DI_31201.3dh
szwauger/S_AUGER_123DI_31301.3dh
szwauger/S_AUGER_123DI_31401.3dh
szwauger/S_AUGER_123DI_31501.3dh
szwauger/S_AUGER_123DI_31601.3dh
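
To see why the empty entries have to be filtered out after the split, here is a small sketch on a made-up sample. The input layout is an assumption (the original input file is not shown in this post), and the second block with `DEMO_B` / `demostore` is invented purely for illustration.

```python
import re

# Made-up sample: blocks separated by lines of dashes (assumed layout).
sample = (
    "----------\n"
    "3D Survey AUGER_123DI/szwauger (storage szwauger)\n"
    "szwauger/S_AUGER_123DI_30601.3dh\n"
    "----------\n"
    "3D Survey DEMO_B/demostore (storage demostore)\n"  # hypothetical block
)

parts = re.split(r"-+\n", sample)
# The text starts with a separator, so the first element is an empty string.
print(repr(parts[0]))

# Strip each block and drop the empty ones.
blocks = [p.strip() for p in parts if p.strip()]

# Filename = text before the first "/", with spaces turned into dots.
names = [b[:b.index("/")].replace(" ", ".") for b in blocks]
print(names)
```

If the file starts with a line of dashes, the split always yields a leading empty string, which is why the filter is there.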

With the shell, you can use awk to get the same results (incomplete code):
Code:
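awk 'BEGIN{
 RS="---*\n\n"   # regex RS (needs gawk or mawk): two or more dashes, a newline, then an empty line
 FS="/"
}{
 filename=$1     # text before the first "/"
 if(filename !=""){
    gsub(/ /,".",filename)   # same naming as the python version
    print $0 > filename
    close(filename)          # avoid running out of open files
 }
}' file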
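
If the file is too big to read into memory, a line-by-line variant is possible. This is only a sketch, not tested against the original data: it assumes a separator is a line consisting solely of dashes (slightly stricter than the regex above), and `iter_blocks`, `block_filename` and `write_blocks` are names made up for this example.

```python
from typing import Iterable, Iterator, Optional


def iter_blocks(lines: Iterable[str]) -> Iterator[str]:
    """Yield stripped blocks separated by lines made up only of dashes."""
    block = []
    for line in lines:
        stripped = line.strip()
        if stripped and set(stripped) == {"-"}:
            # Separator line: emit the block collected so far, if any.
            text = "".join(block).strip()
            if text:
                yield text
            block = []
        else:
            block.append(line)
    # Emit the last block (no trailing separator needed).
    text = "".join(block).strip()
    if text:
        yield text


def block_filename(block: str) -> Optional[str]:
    """Text before the first "/" with spaces turned into dots, or None."""
    head, slash, _ = block.partition("/")
    return head.replace(" ", ".") if slash else None


def write_blocks(path: str) -> None:
    """Write each block of the input file to a file named after its header."""
    with open(path) as src:
        for block in iter_blocks(src):
            name = block_filename(block)
            if name:
                with open(name, "w") as out:
                    out.write(block)
```

Calling `write_blocks("file")` should produce the same output files as the in-memory version, while holding only one block in memory at a time.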


Last edited by ghostdog74; 05-11-2009 at 01:50 PM..
 

All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.