extract xml tag based on condition


 
# 8  
Old 01-16-2011
The Perl solution by durden_tyler is excellent, provided that the search term is not present in an unrelated element, e.g.
Code:
<INVOICES>
<INVOICE>
<NAME>Customer A 2345</NAME>
<INVOICE_NO>1234</INVOICE_NO>
</INVOICE>
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>2345</INVOICE_NO>
</INVOICE>
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>3456</INVOICE_NO>
</INVOICE>
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>5678</INVOICE_NO>
</INVOICE>
</INVOICES>

The Perl example will incorrectly output:
Code:
<INVOICE>
<NAME>Customer A 2345</NAME>
<INVOICE_NO>1234</INVOICE_NO>
</INVOICE>
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>2345</INVOICE_NO>
</INVOICE>
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>5678</INVOICE_NO>
</INVOICE>

A more precise solution is to use XSLT. If xsltproc is available to you (it is packaged for most GNU/Linux distributions), the following XSL stylesheet compares against the INVOICE_NO element only, so text in other elements cannot cause a false match:
Code:
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- xsltproc: pass each value with the param option, e.g. invno1 "'value'" and invno2 "'value'" -->
   <xsl:param name="invno1">XXXX</xsl:param>
   <xsl:param name="invno2">XXXX</xsl:param>

   <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>

   <xsl:template match="/">
      <xsl:apply-templates select="INVOICES"/>
   </xsl:template>

   <xsl:template match="INVOICES">
      <xsl:apply-templates select="INVOICE"/>
   </xsl:template>

   <xsl:template match="INVOICE">
      <xsl:if test="./INVOICE_NO = $invno1 or ./INVOICE_NO = $invno2">
         <xsl:copy-of select="." />
      </xsl:if>
   </xsl:template>

</xsl:stylesheet>

For example:
Code:
$ xsltproc --param invno1 "'1234'" --param invno2 "'3456'" example.xsl example.xml
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>1234</INVOICE_NO>
</INVOICE><INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>3456</INVOICE_NO>
</INVOICE>

This User Gave Thanks to fpmurphy For This Post:
# 9  
Old 01-16-2011
Hi Kamaraj,

I am using HP-UX S29BF226 B.11.23 U ia64 4081221980 unlimited-user license

Thanks
Angshuman

---------- Post updated at 09:55 PM ---------- Previous update was at 09:22 PM ----------

Hi fpmurphy,

First, I would like to thank all of you for taking the time to reply to my question.

xsltproc is not available. I tried the solution provided by durden_tyler and it works fine except in the scenario that you have highlighted. Although the chance of an invoice number appearing in any other tag is remote, I should still take care of that.

Is there any other way I can achieve this? I would also like to raise another concern. In my question, I mentioned that it is required to print <INVOICE>.....</INVOICE> provided <INVOICE_NO>2345</INVOICE_NO>. In case the value is passed through a variable, the following code does not return anything. I modified the solution of durden_tyler as below

Quote:
perl -lne 'BEGIN{undef $/} while(/(<INVOICE>(.*?)<\/INVOICE>)/sg) {$x=$1; print $x if $2 =~ /$INVOICENO/}' f1.xml
# 10  
Old 01-16-2011
Quote:
Originally Posted by angshuman
...In my question, I mentioned that it is required to print <INVOICE>.....</INVOICE> provided <INVOICE_NO>2345</INVOICE_NO>.
...
You could make your regex more precise to come up with accurate results -

Code:
$
$ perl -lne 'BEGIN{undef $/} while(/(<INVOICE>(.*?)<\/INVOICE>)/sg) {$x=$1; print $x if $2 =~ /<INVOICE_NO>(2345|5678)<\/INVOICE_NO>/}' f1.xml
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>2345</INVOICE_NO>
</INVOICE>
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>5678</INVOICE_NO>
</INVOICE>
$
$

Quote:
...
In case the value is passed through a variable, the following code does not return anything. I modified the solution of durden_tyler as below...
The one-liner will have to change if you want to pass a shell variable to it -

Code:
$
$
$ export MY_INVOICE_NO="2345"
$
$
$ perl -lne "BEGIN{undef $/}
             while(/(<INVOICE>(.*?)<\/INVOICE>)/sg) {\$x=\$1; print \$x if \$2=~/<INVOICE_NO>$MY_INVOICE_NO<\/INVOICE_NO>/}" f1.xml
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>2345</INVOICE_NO>
</INVOICE>
$
$


You could be more creative and pass multiple Invoice Numbers thusly -

Code:
$
$
$ export MY_INVOICE_NOS="2345|5678"
$
$
$ perl -lne "BEGIN{undef $/}
             while(/(<INVOICE>(.*?)<\/INVOICE>)/sg) {\$x=\$1; print \$x if \$2=~/<INVOICE_NO>($MY_INVOICE_NOS)<\/INVOICE_NO>/}" f1.xml
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>2345</INVOICE_NO>
</INVOICE>
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>5678</INVOICE_NO>
</INVOICE>
$
$

However, if you want to do serious XML work then XSLT is the way to go, as suggested by fpmurphy.

tyler_durden