SED extract XML value


 
# 1  
Old 08-14-2009
I have the following string:

<min-pool-size>2</min-pool-size>

When I pipe the string into the following code I expect it to return just the value "2", but it returns the whole string. Why?

Code:
sed -n '/<min-pool-size>/,/<\/min-pool-size>/p'

Outputting: <min-pool-size>2</min-pool-size>
# 2  
Old 08-14-2009
Code:
echo "<min-pool-size>2</min-pool-size>" | sed 's/\(.*\)\([0-9]\)\(.*\)/\2/'
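
A side note on the pattern above, as a small sketch rather than a correction: because the first ".*" is greedy, the middle group only ever holds the last digit on the line. The two-digit sample value and the variable names below are made up for illustration, and the second pattern assumes the tag name itself contains no digits.

```shell
# Hypothetical input with a two-digit value.
line='<min-pool-size>25</min-pool-size>'

# Greedy .* swallows everything up to the last digit, so only "5" is kept.
last_digit=$(printf '%s\n' "$line" | sed 's/\(.*\)\([0-9]\)\(.*\)/\2/')

# Skip leading non-digits, then capture the whole run of digits.
all_digits=$(printf '%s\n' "$line" | sed 's/[^0-9]*\([0-9][0-9]*\).*/\1/')

printf 'last digit only: %s\n' "$last_digit"
printf 'whole number:    %s\n' "$all_digits"
```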

# 3  
Old 08-14-2009
Panyam, that works but I plan to replace "min-pool-size" with a variable and I don't think your method would work for that.
# 4  
Old 08-14-2009
Quote:
Originally Posted by ArterialTool
Panyam, that works but I plan to replace "min-pool-size" with a variable and I don't think your method would work for that.
So do you want to:
(a) extract the number 2 (data between tags) ? or
(b) replace "min-pool-size" with a variable ?

The poster's solution was correct for the problem you posed. It's a tad difficult for us to peer into your mind especially if your requirements change.

tyler_durden
# 5  
Old 08-14-2009
I want to extract the data between the tags, but the tags will be input as a variable.
# 6  
Old 08-14-2009
Quote:
Originally Posted by ArterialTool
but it returns the whole string. Why?
Code:
sed -n '/<min-pool-size>/,/<\/min-pool-size>/p'

The reason is simple: you should use a substitution, which replaces the tags, including their angle brackets, with nothing, leaving only the text between the tags - "2".

What you have written is a so-called "range" command: you told sed to do something - the command "p" in your case - to a range of lines that starts at a line matching "<min-pool-size>" and ends at a line matching "</min-pool-size>". "p" prints whole lines, so the one line that makes up the range is printed in full.
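
To see the difference side by side, here is a minimal sketch that feeds the string from the question through both forms. The variable names are only for illustration.

```shell
# The one-line sample from the original question.
line='<min-pool-size>2</min-pool-size>'

# Range address plus "p": the whole matching line is printed.
range_out=$(printf '%s\n' "$line" | sed -n '/<min-pool-size>/,/<\/min-pool-size>/p')

# Substitution: the tags are replaced by nothing, only the value remains.
subst_out=$(printf '%s\n' "$line" | sed 's/<min-pool-size>\(.*\)<\/min-pool-size>/\1/')

printf 'range: %s\n' "$range_out"
printf 'subst: %s\n' "$subst_out"
```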

Regarding your problem: let's reformulate it in a more general way. You have text in the form

Code:
<tag>some text</tag>

where "tag" is some text supplied from outside. You want to filter out everything except "some text". First, ask yourself whether the text you want to preserve could span more than one line, because that case takes extra effort. Is the following possible?

Code:
<tag>some
more text</tag>

And if it is do you want to preserve line breaks as in the original or do you want some one-line stream to be the result?

Let us start with the simplest: only one-liners. The problem here, like in all the other cases, is to put the inherited variable content into the sed-regexp. Save the following to a file called "test1.sh", give it execute-rights and call it like "test1.sh /path/to/inputfile":

Code:
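typeset fIn="$1"
typeset tag="min-pool-size"
# replace "<tag>text</tag>" with just "text"; the file name is quoted
# so that paths containing spaces survive
sed 's/<'"${tag}"'>\(.*\)<\/'"$tag"'>/\1/' "$fIn"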

This will work in your example, but it will fail in my second variant. Notice that I carefully quoted every variable, so that the regexp is still one continuous string after the shell has expanded the variables.
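
Since the tag is meant to come in as a variable, the one-liner could also be wrapped in a small function. This is only a sketch: the name "extract_tag" is made up, and it assumes the tag contains no characters that are special in a sed regexp (no "/", ".", "*" and so on). Unlike the script above it uses "-n" with the "p" flag, so lines without the tag are dropped instead of passed through.

```shell
# extract_tag TAG [FILE...]: hypothetical helper; prints the text between
# <TAG> and </TAG> for every line that carries both tags.
extract_tag() {
    tag=$1
    shift
    # -n plus the "p" flag prints only lines where the substitution happened;
    # the leading and trailing .* also drop any text around the element.
    sed -n 's/.*<'"$tag"'>\(.*\)<\/'"$tag"'>.*/\1/p' "$@"
}

# With no file argument the function reads standard input.
printf '%s\n' '<min-pool-size>2</min-pool-size>' | extract_tag min-pool-size
# prints: 2
```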

In the next version we will take care of multiline contents but will let them remain multiline. We have three sorts of lines to deal with:

Code:
1) "some content<tag>content-to-preserve"
2) "lines between start- and end-tag"
3) "content-to-preserve</tag>some more content"

We will use the range facility, like you did, but in a somewhat more complicated fashion:

Code:
typeset fIn="$1"
typeset tag="min-pool-size"
# work only on the lines from the start-tag line through the end-tag line
sed -n '/<'"${tag}"'>/,/<\/'"$tag"'>/ {
               s/^.*<'"$tag"'>//
               s/<\/'"$tag"'>.*$//
               p
          }' "$fIn"

With "-n", sed's automatic printing is suppressed, so only what we print explicitly appears; this filters out all irrelevant text before and after our tags. The first line of the sed script declares the range: it begins at the first line where the start-tag appears, ends at the next line where the end-tag appears, and covers all lines in between. The commands in the curly braces are applied to each of these lines, one by one. The first substitution cuts out everything up to and including the start-tag (taking care of the type-1 lines above), the second cuts out the end-tag and everything after it (taking care of the type-3 lines above), and the type-2 lines are left untouched. The last command, "p", prints whatever is left of each line in the range after the cuts - voila!
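
For trying the range approach without a file at hand, here is a sketch of the same sed script wrapped in a function and fed a made-up two-line element. The name "extract_block" and the sample text are assumptions for illustration, and again the tag must not contain regexp-special characters.

```shell
# extract_block TAG [FILE...]: hypothetical wrapper around the range script.
extract_block() {
    tag=$1
    shift
    sed -n '/<'"$tag"'>/,/<\/'"$tag"'>/ {
        s/^.*<'"$tag"'>//
        s/<\/'"$tag"'>.*$//
        p
    }' "$@"
}

# Made-up sample: the element starts on line 1 and ends on line 2.
printf '%s\n' 'before<note>first' 'second</note>after' 'unrelated' |
    extract_block note
# prints:
# first
# second
```

One caveat worth knowing: sed starts looking for the closing address only on the line after the one that opened the range. If both tags sit on the same line, the range therefore runs on to the next closing tag or to the end of the input, so for pure one-liners the plain substitution is the safer choice.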

How to transform this output into a one-line stream is left as an exercise to the interested reader, who by now should be eager to try his newly found insight into the workings of sed on a problem of his own. ;-))
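
One possible answer to the exercise, offered as a sketch and not necessarily what was intended: instead of doing it inside sed, pipe the multi-line result through "paste -s", which joins all input lines with a chosen delimiter. The function name "join_lines" and the sample input are made up.

```shell
# join_lines: hypothetical helper; joins all lines on standard input
# into one line, separated by single spaces.
join_lines() {
    paste -s -d ' ' -
}

# Stand-in for the multi-line output of a range script.
printf '%s\n' 'some' 'more text' | join_lines
# prints: some more text
```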

I hope this helps.

bakunin
# 7  
Old 08-14-2009
Bakunin, thank you so much for the explanation you gave. Fortunately, in this instance everything is on one line, so your first sed function will work. Your second sed function will definitely come in handy and could be the basis for a very versatile XML parsing function. Thank you very much for your contributions.

Last edited by ArterialTool; 08-14-2009 at 04:52 PM..