sed filtering lines by range fails 1-line-ranges
Post 302724111 by bakunin on Wednesday 31st of October 2012 08:56:10 AM
Many thanks for your helpful suggestions.

I modified the function a bit and noticed that I don't need the last step "pStripTags" if I modify the sed script to strip the tags immediately. Here is the revised function. I have added "tee -a <tracefile>" commands to trace the individual steps of the recursion. They serve only debugging purposes and can safely be removed for production:

Code:
# ------------------------------------------------------------------------------
# pGetXML                        extract certain values from a layered XML code
# ------------------------------------------------------------------------------
# Author.....: bakunin, with help of various unix.com members
# last update: 2012 08 23    by: bakunin
# ------------------------------------------------------------------------------
# Revision Log:
#
# ------------------------------------------------------------------------------
# Usage:
#     pGetXML tag1[/option1] [tag2[/option2] ..]
#
#
#     Example:
#          cat file | pGetXML foo/opt1 bar/opt2
#          will search for a range of "<foo ...opt1..> ... </foo>" and in the
#          resulting stream search for a range of "<bar ..opt2..> ... </bar>".
#          The result will be reformatted to a single line and the enclosing
#          tags will be removed. This text:
#
#          <foo type=opt2>
#               <sometag>
#          </foo>
#          <foo type=opt1>
#               <bar>
#                    somevalue
#               </bar>
#               <bar type=opt2>searched_for</bar>
#          </foo>
#
#          will result only in "searched_for", because the option does not match
#          in the first foo-tag, and the same goes for the first bar-tag.
#
# Prerequisites:
# - none
# ------------------------------------------------------------------------------
# Documentation:
# Extracts values from an XML file of nested tags presented at <stdin>.
# The given list of tags is searched recursively. Only the tag name has to
# be given, so
#
#             pGetXML foo
#
# will return the content of "<foo> .. </foo>". It is possible to refine tags
# by using "options", which will be searched for in the tag definition (see
# the example above).
#
# Output goes to <stdout>.
#
#     Parameters: tag1[/opt1] [tag2[/opt2] ..tagN[/optN]] 
#     returns: void
# ------------------------------------------------------------------------------
# known bugs:
#
#     none
# ------------------------------------------------------------------------------
# ..........................(C) 2012 bakunin ..................................
# ------------------------------------------------------------------------------

function pGetXML
{
typeset chTag="$1"
typeset chOpt="$1"
typeset chLine=""

if [ "${chOpt#*/}" = "${chOpt}" ] ; then
     chOpt=""
else
     chOpt="${chOpt#*/}"
     chTag="${chTag%%/*}"
fi

# DEBUG start
#      print -u2 - "inside pGetXML...."
#      print -u2 - "chTag=${chTag}"
#      print -u2 - "chOpt=${chOpt}"
#      print -u2 - "Args=$*\n"
# DEBUG end

if [ -n "$chTag" ] ; then
     shift
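     # Join everything from a matching opening tag up to its closing tag into
     # one line, then strip the tags. The substitutions stay inside the
     # opening-tag block, so closing tags of non-matching ranges print nothing.
     sed -n '/<'"$chTag"'[^>]*'"$chOpt"'[^>]*>/ {
               :next
               /<\/'"$chTag"'[^>]*>/! {
                    N
                    b next
               }
               s/\n//g
               s/^.*<'"$chTag"'[^>]*'"$chOpt"'[^>]*>//
               s/<\/'"$chTag"'[^>]*>.*$//p
             }' |\
     tee -a xxx.$(date +'%H%M%N').out |\
     pGetXML "$@"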
else
     tee -a xxx.last.out |\
     while read -r chLine ; do
          print - "$chLine"
     done
fi

return 0
}
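
The tag/option splitting at the top of the function can be tried in isolation. The following is only a minimal sketch in plain POSIX sh: split_spec, spec, tag and opt are hypothetical names that do not appear in pGetXML, and the sketch assumes that the argument should be cut at its first slash.

```sh
#!/bin/sh
# Hypothetical helper (not part of pGetXML): split "tag/option" into the
# variables tag and opt, cutting at the FIRST slash of the argument.
split_spec() {
    spec=$1
    case $spec in
        */*) tag=${spec%%/*}    # drop the longest suffix that starts with a slash
             opt=${spec#*/} ;;  # drop the shortest prefix that ends with a slash
        *)   tag=$spec          # no slash at all: the option stays empty
             opt= ;;
    esac
}

split_spec foo/opt1
printf 'tag=%s opt=%s\n' "$tag" "$opt"

split_spec bar
printf 'tag=%s opt=%s\n' "$tag" "$opt"
```

Note that an option which itself contains a slash would still collide with the "/" delimiters of the sed addresses, so this only illustrates the splitting, not a complete fix for such options.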

bakunin
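
For anyone who wants to try a single extraction stage without the recursion and the trace files, here is a minimal sketch. It is not the function above: extract_tag is a hypothetical helper that takes tag and option as two separate arguments, it keeps the substitutions inside the opening-tag block, and it was written with GNU sed in mind. The sample text is the one from the usage comment.

```sh
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not pGetXML itself):
# one extraction stage for <tag ...opt...> ... </tag>.
extract_tag() {
    tag=$1
    opt=${2:-}
    sed -n '/<'"$tag"'[^>]*'"$opt"'[^>]*>/ {
        :next
        /<\/'"$tag"'[^>]*>/!{
            N
            b next
        }
        s/\n//g
        s/^.*<'"$tag"'[^>]*'"$opt"'[^>]*>//
        s/<\/'"$tag"'[^>]*>.*$//p
    }'
}

# Sample text from the usage comment of pGetXML.
sample='<foo type=opt2>
     <sometag>
</foo>
<foo type=opt1>
     <bar>
          somevalue
     </bar>
     <bar type=opt2>searched_for</bar>
</foo>'

# Two chained stages correspond to "pGetXML foo/opt1 bar/opt2".
result=$(printf '%s\n' "$sample" | extract_tag foo opt1 | extract_tag bar opt2)
printf '%s\n' "$result"
```

The first stage joins the matching foo range into one line, which is why the second stage sees both bar tags on a single line and has to cope with a range that opens and closes on the same line.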
 
