The following is part of a larger project, and sed is (for now) a given. I am working on a recursive Korn shell function to "peel off" XML tags from a larger text. For context, here is the complete function (not working yet):
Code:
function pGetXML
{
    typeset chTag="$1"
    typeset chOpt="$1"
    typeset chLine=""
    if [ "${chOpt#*/}" = "${chOpt}" ] ; then
        chOpt=""
    else
        chOpt="${chOpt#*/}"
        chTag="${chTag%/*}"
    fi
    print -u2 - "inside pGetXML...."
    print -u2 - "chTag=${chTag}"
    print -u2 - "chOpt=${chOpt}"
    print -u2 - "Args=$*\n"
    if [ -n "$chTag" ] ; then
        shift
        sed -n '/<'"$chTag"'[^>]*'"$chOpt"'[^>]*>/,/<\/'"$chTag"'[^>]*>/p' |\
        pGetXML $*
    else
        while read chLine ; do
            pStripTags "$chLine"
        done
    fi
    return 0
}
The function should first print everything from "<arg1>" to "</arg1>" (the "option" exists because there may be other tags with the same name that I am not interested in, like "<arg1 type=else>"), then in the second pass filter from that only the lines "<arg2>...</arg2>", and in the third pass only the lines "<Value>...</Value>". The function "pStripTags" simply strips the tags, leaving the text inside.
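For readers unfamiliar with the parameter expansions used at the top of the function, here is a minimal sketch of how a "tag/option" argument is split. The helper name "split_arg" is made up for this illustration and is not part of the function:

```shell
#!/bin/sh
# split_arg is a hypothetical helper, used only to illustrate the
# "${var#*/}" and "${var%/*}" expansions from pGetXML.
split_arg() {
    arg=$1
    if [ "${arg#*/}" = "$arg" ]; then
        # No slash in the argument: tag only, no option.
        tag=$arg
        opt=""
    else
        opt=${arg#*/}    # strip everything up to and including the first "/"
        tag=${arg%/*}    # strip the last "/" and everything after it
    fi
    printf 'tag=%s opt=%s\n' "$tag" "$opt"
}

split_arg "arg1/opt1"    # prints: tag=arg1 opt=opt1
split_arg "Value"        # prints: tag=Value opt=
```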
Well, this is what was intended and it kind of works, but in the last step "sed" fails to do as expected when the opening and closing tags of the range are on the same line. At this stage I am down to this portion of the text (this is verified):
Code:
<arg2 type=opt2>
<Value>blabla</Value>
</arg2>
and the sed command (verified with "set -xv") is this:
Code:
sed -n '/<Value[^>]*[^>]*>/,/<\/Value[^>]*>/p'
I would have expected it to print only line 2, but it doesn't. Instead it prints lines 2 and 3.
The objective is to create a sed script that will fit into the recursive function. Any pointers will be welcome.
Hi bakunin, you may replace your sed script with this:
Code:
sed -n '
:strt
/<'"$chTag"'[^>]*'"$chOpt"'[^>]*>/{
    /<\/'"$chTag"'[^>]*>/{
        p
        d
    }
    N
    b strt
}'
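With an address range, sed looks for a line matching the first address and does not test the second address against that same line. The second address is only tried on subsequent lines, so a range whose opening and closing tags sit on one line keeps running until the next line that matches the closing tag, or until the end of input. Hence the problem.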
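To make the difference visible, here is a small sketch that runs both approaches on the three lines from the question. The variable names are only for this illustration, and the loop variant is one possible way to write it, not necessarily the only one:

```shell
#!/bin/sh
# Sample input taken from the question.
input='<arg2 type=opt2>
<Value>blabla</Value>
</arg2>'

# Range address: the end pattern is first tested on the line AFTER the
# one that matched the start pattern, so the range runs on to the end.
range_out=$(printf '%s\n' "$input" |
    sed -n '/<Value[^>]*>/,/<\/Value[^>]*>/p')

# Explicit loop: test for the closing tag on the current pattern space
# first, and only append further lines (N) while it is missing.
loop_out=$(printf '%s\n' "$input" |
    sed -n '/<Value[^>]*>/{
        :next
        /<\/Value[^>]*>/!{
            N
            b next
        }
        p
    }')

printf 'range:\n%s\n' "$range_out"    # prints lines 2 and 3
printf 'loop:\n%s\n' "$loop_out"      # prints line 2 only
```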
I modified the function a bit and noticed that I don't need the last step "pStripTags" if I modify the sed script to strip the tags immediately. Here is the revised function. I have added "tee -a <tracefile>" commands to trace the individual steps of the recursion. For production they can safely be removed, as they only serve debugging purposes:
Code:
# ------------------------------------------------------------------------------
# pGetXML extract certain values from a layered XML code
# ------------------------------------------------------------------------------
# Author.....: bakunin, with help of various unix.com members
# last update: 2012 08 23 by: bakunin
# ------------------------------------------------------------------------------
# Revision Log:
#
# ------------------------------------------------------------------------------
# Usage:
# pGetXML tag1[/option1] [tag2[/option2] ..]
#
#
# Example:
#     cat file | pGetXML foo/opt1 bar/opt2
# will search for a range of "<foo ...opt1..> ... </foo>" and in the
# resulting stream search for a range of "<bar ..opt2..> ... </bar>".
# The result will be reformatted to a single line and the enclosing
# tags will be removed. This text:
#
#     <foo type=opt2>
#         <sometag>
#     </foo>
#     <foo type=opt1>
#         <bar>
#             somevalue
#         </bar>
#         <bar type=opt2>searched_for</bar>
#     </foo>
#
# will result only in "searched_for", because in the first foo-tag the
# option doesn't match, the same goes for the first bar-tag
#
# Prerequisites:
# - none
# ------------------------------------------------------------------------------
# Documentation:
# Extracts values from an XML file of nested tags presented at <stdin>.
# The given list of tags is searched recursively. Only the tag name has to
# be given, so
#
# pGetXML foo
#
# will return the content of "<foo> .. </foo>". It is possible to refine tags
# by using "options", which will be searched for in the tag definition (see
# below).
#
# Output goes to <stdout>.
#
# Parameters: tag1[/opt1] [tag2[/opt2] ..tagN[/optN]]
# returns: void
# ------------------------------------------------------------------------------
# known bugs:
#
# none
# ------------------------------------------------------------------------------
# ..........................(C) 2012 bakunin ..................................
# ------------------------------------------------------------------------------
function pGetXML
{
    typeset chTag="$1"
    typeset chOpt="$1"
    typeset chLine=""
    if [ "${chOpt#*/}" = "${chOpt}" ] ; then
        chOpt=""
    else
        chOpt="${chOpt#*/}"
        chTag="${chTag%/*}"
    fi
    # DEBUG start
    # print -u2 - "inside pGetXML...."
    # print -u2 - "chTag=${chTag}"
    # print -u2 - "chOpt=${chOpt}"
    # print -u2 - "Args=$*\n"
    # DEBUG end
    if [ -n "$chTag" ] ; then
        shift
        # Join all lines from the matching opening tag up to the closing
        # tag, then strip both tags. The stripping sits inside the
        # opening-tag block so that a closing tag whose opening tag did
        # not match the option produces no output.
        sed -n '/<'"$chTag"'[^>]*'"$chOpt"'[^>]*>/ {
            :next
            /<\/'"$chTag"'[^>]*>/! {
                N
                b next
            }
            s/\n//g
            s/^.*<'"$chTag"'[^>]*'"$chOpt"'[^>]*>//
            s/<\/'"$chTag"'[^>]*>.*$//p
        }' |\
        tee -a xxx.$(date +'%H%M%N').out |\
        pGetXML "$@"
    else
        tee -a xxx.last.out |\
        while read chLine ; do
            print - "$chLine"
        done
    fi
    return 0
}
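As a quick check, the example from the header comment can be run through two stages by hand. This is only a sketch in plain sh: "extract_tag" is a hypothetical, non-recursive stand-in for one pass of the function, the sample text is the header's example with the closing bar tag written out in full, and in this sketch the stripping commands sit inside the block guarded by the opening-tag address:

```shell
#!/bin/sh
# extract_tag is a hypothetical helper: one non-recursive pass.
# $1 = tag name, $2 = optional text that must appear in the opening tag.
extract_tag() {
    sed -n '/<'"$1"'[^>]*'"$2"'[^>]*>/{
        :next
        /<\/'"$1"'[^>]*>/!{
            N
            b next
        }
        s/\n//g
        s/^.*<'"$1"'[^>]*'"$2"'[^>]*>//
        s/<\/'"$1"'[^>]*>.*$//p
    }'
}

sample='<foo type=opt2>
<sometag>
</foo>
<foo type=opt1>
<bar>
somevalue
</bar>
<bar type=opt2>searched_for</bar>
</foo>'

# First pass keeps the content of the foo block with opt1,
# second pass keeps the content of the bar tag with opt2.
result=$(printf '%s\n' "$sample" | extract_tag foo opt1 | extract_tag bar opt2)
printf '%s\n' "$result"    # prints: searched_for
```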