How can I extract XML block around matching search string?


 
# 8  
Old 02-13-2016
Aia, sorry for the confusion. I have corrected the input in the initial post.
Ideally I want to select all lines from the XML up to </notify-list>, plus the <application> XML block that matches a search string, say myapp1-ear.
# 9  
Old 02-13-2016
Hi,
Also try
Code:
awk '/<application-name>myapp1-ear/ {print "\t<application>";c=1; print;next} c {print} /<\/application>/{ c=0}' xmlfile.xml
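
A possible variation on the awk approach, offered as a sketch rather than a drop-in replacement: buffer each <application> block and print it only when it contains the wanted name, so the original <application> line and its indentation are kept instead of being re-created with print. The awk variable name, the shell variable block, and the sample file contents are illustrative assumptions, not part of the thread's input.

```shell
# Made-up sample input, only for illustration.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<deployment-request>
  <application>
    <application-name>myapp1-ear</application-name>
    <edition></edition>
  </application>
  <application>
    <application-name>myapp2-ear</application-name>
  </application>
</deployment-request>
EOF

# Start a fresh buffer at each <application>, collect lines while inside
# the block, and print the buffer at </application> only if the name matched.
block=$(awk -v name='myapp1-ear' '
    /<application>/   { buf = ""; hit = 0; inblock = 1 }
    inblock           { buf = buf $0 "\n" }
    inblock && index($0, "<application-name>" name "</application-name>") { hit = 1 }
    /<\/application>/ { if (inblock && hit) printf "%s", buf; inblock = 0 }
' "$tmp")

printf '%s\n' "$block"
rm -f "$tmp"
```

Passing the search string with -v keeps it out of the awk program text, so the same command can be reused for another application name.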


Last edited by looney; 02-13-2016 at 02:29 AM..
# 10  
Old 02-13-2016
Quote:
Originally Posted by kchinnam
Aia, sorry for the confusion. I have corrected the input in the initial post.
Ideally I want to select all lines from the XML up to </notify-list>, plus the <application> XML block that matches a search string, say myapp1-ear.

I see you have corrected the input XML in post #1.
This is what your XML looks like with tabs made visible:

Code:
cat -T kchinnam.xml

Code:
<?xml version="1.0" encoding="UTF-8"?>
<deployment-request>
        <requestor>
                <first-name>kchinnam</first-name>
                <last-name>Group</last-name>
                <email-address>kchinnam@some.com</email-address>
        </requestor>
        <notify-list>
                <email-address>kchinnam@some.com</email-address>
        </notify-list>
^I^I<application>
^I^I^I^I<application-name>myapp1-ear</application-name>
^I^I^I^I<ear-file-name>myapp1-ear.ear</ear-file-name>
^I^I^I^I<edition></edition>
^I^I^I^I<shared-library-name></shared-library-name>
^I^I</application>
^I^I<application>
^I^I^I^I<application-name>myapp2-ear</application-name>
^I^I^I^I<ear-file-name>myapp2-ear.ear</ear-file-name>
^I^I^I^I<edition></edition>
^I^I^I^I<shared-library-name></shared-library-name>
^I^I^I^I^I^I<CookieSettings>
^I^I^I^I^I^I^I^I   <path>/</path>
^I^I^I^I^I^I  </CookieSettings>
^I^I^I^I</options>
^I^I</application>

Each ^I is a tab. The indentation is a mix of spaces and tabs.
Your latest request appears to be to select every line of the XML up to </notify-list> and <application>, and then stop. That does not make much sense, since it would yield:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<deployment-request>
        <requestor>
                <first-name>kchinnam</first-name>
                <last-name>Group</last-name>
                <email-address>kchinnam@some.com</email-address>
        </requestor>
        <notify-list>
                <email-address>kchinnam@some.com</email-address>
        </notify-list>
^I^I<application>

I am going to guess you want this:
Code:
perl -ne 'BEGIN{$/="</application>\n"} @block = m|(<application>.*myapp1-ear.*)^\s+?($/)|ms; if(@block){$block[0] =~ s/^\s+/\t/gms; print @block}' kchinnam.xml

Code:
<application>
        <application-name>myapp1-ear</application-name>
        <ear-file-name>myapp1-ear.ear</ear-file-name>
        <edition></edition>
        <shared-library-name></shared-library-name>
</application>

or maybe this:

Code:
perl -ne 'BEGIN{$/="</application>\n"} @block = m|(<application>.*myapp1-ear.*)^\s+?($/)|ms; if(@block){$block[0] =~ s/^\s+/" "x4/egms; print @block}' kchinnam.xml

Code:
<application>
    <application-name>myapp1-ear</application-name>
    <ear-file-name>myapp1-ear.ear</ear-file-name>
    <edition></edition>
    <shared-library-name></shared-library-name>
</application>

# 11  
Old 02-13-2016
Quote:
Originally Posted by kchinnam
Don's ed solution worked great. I have never used ed, so I need to understand how it works. The syntax looks very close to sed. I wish I could use a single line like sed for this.
Hi,
sed was based on ed; ed came first. ed can do forwards and backwards searches; sed can't do backwards searches. The syntax for the ed g command is:
Code:
g/BRE/command

It tells ed to identify every line in the file that matches the basic regular expression BRE and for each line found, execute command on that line. And command in this case is:
Code:
?BRE1?,/BRE2/p

where p is the print command which takes zero, one, or two addresses to specify a range of lines to be printed. (No addresses prints the current line; one address prints the addressed line, and two addresses (separated by a comma) prints the lines from the 1st address up to and including the 2nd address.) The address specified by ?BRE1? searches for the line matching the basic regular expression BRE1 backwards from the current line and (as with sed) /BRE2/ searches forwards from the current line for a line matching the basic regular expression BRE2.
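
To see the backward and forward searches in action, here is a small self-contained sketch. It assumes ed is installed, and the sample file contents and the variable names tmp and out are made up for illustration.

```shell
# Made-up sample file, only for illustration.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<root>
  <application>
    <application-name>myapp1-ear</application-name>
  </application>
  <application>
    <application-name>myapp2-ear</application-name>
  </application>
</root>
EOF

# g/.../ selects the line holding myapp2-ear. From that line,
# ?<application>? searches backwards for the start of the block and
# /<\/application>/ searches forwards for its end; p prints the range.
if command -v ed >/dev/null 2>&1; then
    out=$(ed -s "$tmp" <<'EOF'
g/<application-name>myapp2-ear</?<application>?,/<\/application>/p
EOF
)
    printf '%s\n' "$out"
fi
rm -f "$tmp"
```

With this input only the second block (three lines) should be printed, because both searches start from the line that g matched.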

With your new sample input, the ed script I suggested should still print the lines you want. And, if you prefer less portable one-liners to code that will work with any POSIX-conforming shell, you can translate this to:
Code:
strear='myapp1-ear';ed -s xmlfile3.xml <<< "g/<application-name>$strear<\/application-name>/?<application>?,/<\/application>/p"

The above works with bash and with 1993 or later versions of ksh, but is a syntax error in many other POSIX-conforming shells.

And, if you want to strip two <tab> characters from the front of each of those lines, you could use:
Code:
#!/bin/ksh
strear='myapp1-ear'

ed -s xmlfile.xml <<EOF
g/<application-name>$strear<\/application-name>/?<application>?,/<\/application>/s/^..//
?<application>?,.p
Q
EOF

You could turn that into a 2-liner, but I much prefer readable and maintainable code to the minimal line approach.
This User Gave Thanks to Don Cragun For This Post:
# 12  
Old 02-13-2016
Don, I want to keep \t tab characters.
I want my output to have initial generic XML block + search node + last closing element.

Code:
<?xml version="1.0" encoding="UTF-8"?>
<deployment-request>
  <requestor>
    <first-name>kchinnam</first-name>
    <last-name>Group</last-name>
    <email-address>kchinnam@some.com</email-address>
  </requestor>
  <notify-list>
    <email-address>kchinnam@some.com</email-address>
  </notify-list>
  <application>
    <application-name>myapp1-ear</application-name>
    <ear-file-name>myapp1-ear.ear</ear-file-name>
    <edition/>
    <shared-library-name/>
  </application>
</deployment-request>

So I started doing something like this, but it is not working.
Code:
# This is to get initial generic XML block of text.
ed -s xmlfile.xml <<EOF
g/<notify-list>/?<deployment-request>?,<\/deployment-request>/p
q
EOF

# once above one works, I would like to append that with matched block of XML.
ed -s xmlfile.xml <<EOF
g/<notify-list>/?<deployment-request>?,<\/deployment-request>/p
g/<application-name>$strear<\/application-name>/?<application>?,/<\/application>/p
q
EOF


Last edited by kchinnam; 02-13-2016 at 11:45 PM.. Reason: xml formatting
# 13  
Old 02-14-2016
You're making it much more difficult than it needs to be. The ed commands needed to print the header and the trailer are identical to the sed commands you need to do the same thing. And, there is no need for three invocations of ed to get the output you want. Try:
Code:
#!/bin/ksh
strear='myapp1-ear'

ed -s xmlfile.xml <<EOF
1,/<\/notify-list>/p
g/<application-name>$strear<\/application-name>/?<application>?,/<\/application>/p
$ p
q
EOF

Note that the <space> before the p on the next to the last line in the ed script is not an accident and must not be removed. (If you don't understand why, ask.)
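
For readers who would rather not ask, the likely reason is this: the here-document delimiter (EOF) is unquoted, so the shell performs parameter expansion on the ed script before ed ever sees it. That is what makes $strear work, but it also means $p would be replaced by the value of a shell variable named p (normally empty), whereas a $ followed by a space is left alone and reaches ed as its "last line" address. A minimal sketch, using cat in place of ed to show the text that would be delivered (the variable names are illustrative):

```shell
unset p   # make sure no shell variable named p is set

# Without the space: the shell expands $p to nothing, so ed would get an empty line.
without_space=$(cat <<EOF
$p
EOF
)

# With the space: a lone $ is not an expansion, so the line still reads
# "$ p", i.e. the address $ (last line) followed by the p command.
with_space=$(cat <<EOF
$ p
EOF
)

printf 'without space: [%s]\n' "$without_space"
printf 'with space:    [%s]\n' "$with_space"
```

Writing the line as \$p in the here-document should have the same effect as the space, since an escaped dollar sign is also passed through literally.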

But, the output you say you want in post #12 does not match the spacing in your latest update to your input now shown in post #1. The code above preserves the blanks (spaces in the first few lines and tabs in the last few lines) found in your input file in the output it produces:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<deployment-request>
        <requestor>
                <first-name>kchinnam</first-name>
                <last-name>Group</last-name>
                <email-address>kchinnam@some.com</email-address>
        </requestor>
        <notify-list>
                <email-address>kchinnam@some.com</email-address>
        </notify-list>
		<application>
				<application-name>myapp1-ear</application-name>
				<ear-file-name>myapp1-ear.ear</ear-file-name>
				<edition></edition>
				<shared-library-name></shared-library-name>
		</application>
</deployment-request>

# 14  
Old 02-14-2016
Don, I need to assign the XML output to a variable. Assigning a function's output to a variable causes the output to lose its newlines.

Code:
#!/bin/bash
_getXMLblock()
{
strear='myapp1-ear'

ed -s xmlfile.xml <<EOF
1,/<\/notify-list>/p
g/<application-name>$strear<\/application-name>/?<application>?,/<\/application>/p
$ p
q
EOF
}

strXML=$(_getXMLblock)
echo $strXML

# how can I retain line breaks in this method?
Code:
<?xml version="1.0" encoding="UTF-8"?> <deployment-request> <requestor> <first-name>kchinnam</first-name> <last-name>Group</last-name> <email-address>kchinnam@some.com</email-address> </requestor> <notify-list> <email-address>kchinnam@some.com</email-address> </notify-list> <application> <application-name>myapp1-ear</application-name> <ear-file-name>myapp1-ear.ear</ear-file-name> <edition/> <shared-library-name/> </application> </deployment-request>

This method keeps line breaks. But how can I use multiple statements?

Code:
#!/bin/bash
strear='myapp1-ear';

foundXML=$(ed -s xmlfile.xml <<< "1,/<\/notify-list>/p; g/<application-name>$strear<\/application-name>/?<application>?,/<\/application>/p; $ p")

echo $foundXML

output

Code:
?
foundXML[]
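
A likely reason for the ? above: ed expects one command per line and does not accept ; as a separator after p, so the whole here-string is rejected. The sketch below shows two ways to put the commands on separate lines while still capturing the output in a variable. It assumes ed is installed; the sample file and the variable names found and found2 are invented for illustration.

```shell
# Made-up four-line sample file, only for illustration.
tmp=$(mktemp)
printf '%s\n' '<a>' '  <b>keep</b>' '  <c>skip</c>' '</a>' > "$tmp"

if command -v ed >/dev/null 2>&1; then
    # Option 1: printf writes each argument on its own line and pipes them to ed.
    # Single quotes keep the shell from expanding $p.
    found=$(printf '%s\n' '1,/<b>/p' '$p' | ed -s "$tmp")

    # Option 2: a here-document inside the command substitution.
    # The quoted delimiter disables expansion; use an unquoted one
    # (and "$ p") if shell variables such as $strear are needed.
    found2=$(ed -s "$tmp" <<'EOF'
1,/<b>/p
$p
EOF
)
    printf '%s\n' "$found"
fi
rm -f "$tmp"
```

Both forms avoid the <<< here-string, so they should also work in shells that do not support it.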

---------- Post updated at 04:23 PM ---------- Previous update was at 04:01 PM ----------

I could not edit my previous post for some reason.. can someone fix that?
I am able to preserve new lines with this:
Code:
echo "$strXML"
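
Why the quotes matter: the command substitution itself keeps the inner newlines (only trailing ones are stripped). It is the unquoted expansion in echo $strXML that lets the shell split the value on whitespace and hand echo separate words, which echo then joins with single spaces. A small sketch with a made-up three-line value, assuming the default IFS:

```shell
# Made-up value standing in for the XML captured from ed.
strXML=$(printf '<a>\n\t<b/>\n</a>\n')

# Unquoted: word splitting turns newlines and tabs into argument
# boundaries, and echo joins the arguments with single spaces.
unquoted=$(echo $strXML)

# Quoted: the value is passed as one argument, line breaks intact.
quoted=$(printf '%s\n' "$strXML")

printf '%s\n' "$unquoted"
printf '%s\n' "$quoted"
```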


Last edited by kchinnam; 02-14-2016 at 05:34 PM.. Reason: corrected text