extract multiple sections of file


 
# 1  
Old 03-18-2008
extract multiple sections of file

I have a file from which I need to extract multiple sections.

The file contains multiple lines that start with ST (followed by a bunch of data).
The file also contains multiple lines that start with SE (followed by a bunch of data).

SE*30*0001
ST*810*0002

I need all of the lines between and including these.
They are invoices.
The invoice starts with the ST line and ends with the SE line.

I need to break out all of the invoices into separate files.

Can someone please help me? I know grep, sed, or awk can do this, but I am not sure how.
Thank you


Here is an example:
ST*810*0001
BIG*20080315*1220680417**SUPPLY***DI
N1*SF*MCLANE HIGH PLAINS*92*46120004
N1*ST*SWC 7-11 #57134*91*571315
N3*2712 E 8TH ST
N4*ODESSA*TX*79761
REF*ST*000134
ITD*05*3*****7*****NET 7
IT1**1*CA*20.09**CB*649251*PI*093*UP*099299711018*RA*NA
TXI*ZZ*1.53****2
CTP**RES*0***CSR*1
PID*F****7-11 T-SHIRT BAG 1/7 BBL
PO4*1000
IT1**1*EA*33.72**CB*834861*PI*093*UP*012253022401*RA*NA
TXI*ZZ*2.57****2
CTP**RES*0***CSR*1
PID*F****KIT CONCRETE CHAMP
PO4*1
IT1**1*EA*0.03**CB*192849*PI*093*UP*000000192842*RA*NA
CTP**RES*0***CSR*1
PID*F****SCS 711 BK 200
PO4*1
IT1**30*EA*2.59**CB*001511*PI*093*UP*025215102776*RA*NA
CTP**RES*0***CSR*1
PID*F****MAXELL T-160 PLUS VIDEO
PO4*1
TDS*18454
SAC*C*G740***5300*******06***SERVICE
CTT*4
SE*30*0001
# 2  
Old 03-18-2008
Code:
awk '/^ST/,/^SE/' file
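
For readers unfamiliar with range patterns, here is a minimal sketch of what the one-liner above does. The sample data and the file names range_demo.dat and range_demo.out are made up for illustration; they are not from the original poster's file.

```shell
# Made-up sample: two invoices with a stray line between them.
printf '%s\n' 'ST*810*0001' 'BIG*20080315' 'SE*3*0001' \
              'GE*1*1' \
              'ST*810*0002' 'BIG*20080316' 'SE*3*0002' > range_demo.dat

# A range pattern switches on at a line matching /^ST/ and switches off
# after the next line matching /^SE/. With no action given, awk prints
# every line inside the range, so the stray GE line is dropped.
awk '/^ST/,/^SE/' range_demo.dat > range_demo.out
cat range_demo.out
```

All invoices still end up in one output stream, so this alone does not split them into separate files.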

# 3  
Old 03-18-2008
Thank you for your prompt response.

It did what I wanted. However, the three sections need to be written to separate files.

So you have
ST
data
SE
This should be taken to file 1
ST
data
SE
This should be taken to file 2

And so on.

Also, I noticed that the ST and SE lines are numbered, and the number on an SE line matches the one on its ST line:

ST*810*0004
Then
SE*(Number)*0004
Thank you

Last edited by rgentis; 03-18-2008 at 09:07 PM.. Reason: Added something
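
Since the ST and SE lines carry the same number, one possible approach is to split each line on `*` and use the third field as the output file name. The following is only a sketch under that assumption: the sample data, the file name edi_demo.dat, and the invoice_NNNN.txt naming scheme are invented for illustration.

```shell
# Made-up sample data in the same shape as the invoices above.
printf '%s\n' 'ST*810*0001' 'N1*ST*SWC' 'SE*3*0001' \
              'ST*810*0002' 'N3*2712 E 8TH ST' 'SE*3*0002' > edi_demo.dat

# Split on a literal * so that $1 is the segment name and $3 the number.
awk -F'[*]' '
$1 == "ST" { id = $3; out = "invoice_" id ".txt" }   # start of an invoice
out != ""  { print > out }                            # copy lines while inside one
$1 == "SE" {
    if ($3 != id)
        print "warning: SE " $3 " does not match ST " id > "/dev/stderr"
    close(out); out = ""                              # invoice finished
}
' edi_demo.dat

ls invoice_*.txt
```

Comparing `$1` with "ST" rather than matching /ST/ anywhere in the line avoids false hits on lines such as `N1*ST*...` or an address ending in `ST`.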
# 4  
Old 03-18-2008
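nawk 'BEGIN { n = 1 }
# An ST line opens an invoice.
/^ST/ { f = 1 }
# While inside an invoice, collect the line (without a leading blank line).
f { invoice[n] = (n in invoice) ? invoice[n] "\n" $0 : $0 }
# An SE line closes the invoice; the next one gets a new number.
/^SE/ { f = 0; n = n + 1 }
END {
    # Write each invoice to a file named after its number, closing as we go.
    for (i in invoice) {
        print invoice[i] >> i
        close(i)
    }
}' filename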
# 5  
Old 03-19-2008
Hi.

An alternate awk solution:
Code:
#!/usr/bin/env sh

# @(#) s1       Demonstrate extraction of range to separate files.

#  ____
# /
# |   Infrastructure BEGIN

echo
set -o nounset

debug=":"
debug="echo"

## The shebang using "env" line is designed for portability. For
#  higher security, use:
#
#  #!/bin/sh -

## Use local command version for the commands in this demonstration.

set +o nounset
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version =o $(_eat $0 $1) awk my-nl
set -o nounset

# Use nawk or /usr/xpg4/bin/awk on Solaris.

echo

FILE=${1-data1}
echo " Input file $FILE:"
cat $FILE

# |   Infrastructure END
# \
#  ---

echo
echo " Results from processing:"
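awk '
BEGIN   { i = 0 }
# Anchor the patterns so that lines such as N1*ST*... do not start a new file.
/^ST/           { if (name != "") close(name) ; i++ ; name = "file" i }
/^ST/,/^SE/     { print > name }
' "$FILE"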

my-nl file?

exit 0

Producing:
Code:
% ./s1

(Versions displayed with local utility "version")
Linux 2.6.11-x1
GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu)
GNU Awk 3.1.4
my-nl (local) 296

 Input file data1:
ST
first invoice
SE
ST
second invoice
SE
ST
third invoice
SE

 Results from processing:

==> file1 <==

  1 ST
  2 first invoice
  3 SE

==> file2 <==

  1 ST
  2 second invoice
  3 SE

==> file3 <==

  1 ST
  2 third invoice
  3 SE

Choose the base file name you wish in variable "name" ... cheers, drl
# 6  
Old 03-19-2008
extract multiple sections of file

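#-- Use ST values as output filename.
awk -v out="/dev/null" '
# Build the file name from a copy of the line so the ST line itself is written unchanged.
/^ST/ { name = $0; gsub(/\*/, "-", name); out = name ".txt" }
{ print >> out }
# Close the file only after the SE line has been written.
/^SE/ { close(out); out = "/dev/null" }
' "$INFILE"

The output files will be named
ST-810-0001.txt
and so on.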

-Ramesh
# 7  
Old 03-25-2008
I wanted to thank all of you for your responses.

One issue: I am porting this to the awk utility on Windows, and I do not think all of the functionality is there.
For instance, when I used Ramesh's example, I received numerous errors.
Here is the code:

c:\tools\gnuwin32\bin\awk -v '/^ST/ {gsub("\\*","-",$0); out=$0".txt"}
/^SE/ { close(out) }
{ printf "%s\n",$0 >> out }
' %input%edifile.dat

Here is the result:
awk: `/ST/' argument to `-v' not in `var=value' form

Usage: awk [POSIX or GNU style options] -f progfile [--] file .
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options: GNU long options:
-f progfile --file=progfile
-F fs --field-separator=fs
-v var=val --assign=var=val
-m[fr] val
-W compat --compat
-W copyleft --copyleft
-W copyright --copyright
-W dump-variables[=file] --dump-variables[=file]
-W exec=file --exec=file
-W gen-po --gen-po
-W help --help
-W lint[=fatal] --lint[=fatal]
-W lint-old --lint-old
-W non-decimal-data --non-decimal-data
-W profile[=file] --profile[=file]
-W posix --posix
-W re-interval --re-interval
-W source=program-text --source=program-text
-W traditional --traditional
-W usage --usage
-W use-lc-numeric --use-lc-numeric
-W version --version

To report bugs, see node `Bugs' in `gawk.info', which is
section `Reporting Problems and Bugs' in the printed version.

gawk is a pattern scanning and processing language.
By default it reads standard input and writes standard output.

Examples:
gawk '{ sum += $1 }; END { print sum }' file
gawk -F: '{ print $1 }' /etc/passwd
'/SE/' is not recognized as an internal or external command,
operable program or batch file.
'{' is not recognized as an internal or external command,
operable program or batch file.
''' is not recognized as an internal or external command,
operable program or batch file.
C:\tools>edi
awk: `'/ST/' argument to `-v' not in `var=value' form

Usage: awk [POSIX or GNU style options] -f progfile [--] file .
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options: GNU long options:
-f progfile --file=progfile
-F fs --field-separator=fs
-v var=val --assign=var=val
-m[fr] val
-W compat --compat
-W copyleft --copyleft
-W copyright --copyright
-W dump-variables[=file] --dump-variables[=file]
-W exec=file --exec=file
-W gen-po --gen-po
-W help --help
-W lint[=fatal] --lint[=fatal]
-W lint-old --lint-old
-W non-decimal-data --non-decimal-data
-W profile[=file] --profile[=file]
-W posix --posix
-W re-interval --re-interval
-W source=program-text --source=program-text
-W traditional --traditional
-W usage --usage
-W use-lc-numeric --use-lc-numeric
-W version --version

To report bugs, see node `Bugs' in `gawk.info', which is
section `Reporting Problems and Bugs' in the printed version.

gawk is a pattern scanning and processing language.
By default it reads standard input and writes standard output.

Examples:
gawk '{ sum += $1 }; END { print sum }' file
gawk -F: '{ print $1 }' /etc/passwd
'/SE/' is not recognized as an internal or external command,
operable program or batch file.
'{' is not recognized as an internal or external command,
operable program or batch file.
''' is not recognized as an internal or external command,
operable program or batch file.


Thank you again for your help.
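
A note on the errors above: the `-v` option must be followed by a `var=value` assignment, and the Windows command prompt does not treat single quotes as quoting characters, so the program text is broken apart before awk sees it. The usage message lists `-f progfile`, and one way around the quoting problem is to keep the program in a file. The sketch below assumes hypothetical file names (split.awk, edifile_demo.dat) and is shown in POSIX shell for consistency with the rest of the thread; on Windows the program file would be created in an editor and only the final `awk -f` line would be typed at the prompt.

```shell
# Hypothetical program file; on Windows it would be created in an editor.
cat > split.awk <<'EOF'
# Name each output file after the ST line, with * replaced by -.
/^ST/ { name = $0; gsub(/\*/, "-", name); out = name ".txt" }
out != "" { print > out }
/^SE/ { close(out); out = "" }
EOF

# Made-up input standing in for edifile.dat.
printf '%s\n' 'ST*810*0001' 'CTT*4' 'SE*3*0001' > edifile_demo.dat

# The program needs no quoting on the command line, and -v is not used.
awk -f split.awk edifile_demo.dat
cat ST-810-0001.txt
```

Replacing `*` in the file name also matters on Windows, where `*` is not allowed in file names.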