extract xml tag based on condition


 
# 1  
01-15-2011

Hi All,

I have a large XML file of invoices. The file looks like this:
Code:
<INVOICES>
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>1234</INVOICE_NO>
</INVOICE>
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>2345</INVOICE_NO>
</INVOICE>
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>3456</INVOICE_NO>
</INVOICE>
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>5678</INVOICE_NO>
</INVOICE>
</INVOICES>

I need to extract every <INVOICE>...</INVOICE> block whose INVOICE_NO is 2345 or 5678.

I searched the forum and found how to extract the value between XML tags, but this is a different scenario: I need the whole <INVOICE> block, selected by the value of one of its child tags.

Your help is highly appreciated.

Thanks
Angshuman

Last edited by Scott; 01-15-2011 at 10:01 AM.. Reason: Code tags
# 2  
01-15-2011
Code:
ruby -ne 'BEGIN{$/="</INVOICE>"}; print "#{$_}\n" if /<INVOICE_NO>(2345|5678)<\/INVOICE_NO>/' file


Last edited by Scott; 01-15-2011 at 10:01 AM.. Reason: Code tags
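For comparison, the same record-separator idea can be sketched in awk. This is a minimal sketch, assuming an awk that accepts a multi-character RS (gawk and mawk do; POSIX awk only uses the first character of RS). The function name `extract_invoices_rs` is hypothetical, and the invoice numbers are hard-coded as in the question.

```shell
# Hypothetical helper: print <INVOICE> blocks whose INVOICE_NO is 2345 or 5678.
# Assumes gawk or mawk: a multi-character RS is not POSIX.
extract_invoices_rs() {
  awk 'BEGIN { RS = "</INVOICE>" }            # each record ends at a closing tag
       /<INVOICE_NO>(2345|5678)<\/INVOICE_NO>/ {
         sub(/^.*<INVOICE>/, "<INVOICE>")     # drop leading newline and any <INVOICES> header
         print $0 RS                          # put the closing tag back
       }' "$@"
}
```

Usage: `extract_invoices_rs file`. Anchoring the pattern on the whole `<INVOICE_NO>` element avoids matching 2345 or 5678 elsewhere in a block, for example inside a customer name.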
# 3  
01-15-2011
Hi Kurumi,

Thank you for your reply. Is there an awk or sed command that can do this?

Thanks
Angshuman
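A portable awk answer could buffer each <INVOICE> block and print the buffer only when the block's INVOICE_NO is one of the wanted values. This is a minimal sketch, assuming (as in the sample) that every tag sits on its own line; the function name `extract_invoices` is hypothetical and the numbers are hard-coded. On systems whose default awk is the old one (Solaris /usr/bin/awk, for example), nawk or /usr/xpg4/bin/awk may be needed for gsub.

```shell
# Hypothetical helper: POSIX awk, one tag per line assumed.
extract_invoices() {
  awk '
    /<INVOICE>/    { buf = ""; inblock = 1; keep = 0 }   # a new block starts
    inblock        { buf = buf $0 "\n" }                 # collect every line of the block
    /<INVOICE_NO>/ { no = $0
                     gsub(/.*<INVOICE_NO>|<\/INVOICE_NO>.*/, "", no)   # keep only the number
                     keep = (no == "2345" || no == "5678") }
    /<\/INVOICE>/  { if (inblock && keep) printf "%s", buf   # print the whole block
                     inblock = 0; keep = 0 }
  ' "$@"
}
```

Usage: `extract_invoices file`. Note that `/<INVOICE>/` matches neither `<INVOICES>` nor `</INVOICE>`, so the outer wrapper is never printed.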
# 4  
01-15-2011
Code:
kamaraj@kamaraj-laptop:~/Desktop$ while read -r i; do grep -B2 "$i" test | sed '$d'; grep -A1 "$i" test; done < xml_input
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>2345</INVOICE_NO>
</INVOICE>
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>5678</INVOICE_NO>
</INVOICE>

kamaraj@kamaraj-laptop:~/Desktop$ cat xml_input 
2345 
5678
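The loop above runs grep twice per number and depends on the -A/-B options. A single awk pass can read the same kind of list file (one number per line, like xml_input) and print the whole matching blocks. This is a minimal sketch, assuming one tag per line and a non-empty list file (the NR == FNR idiom only distinguishes the first file while it has lines); `extract_listed_invoices` is a hypothetical name.

```shell
# Hypothetical helper: $1 = file of wanted numbers, $2 = XML file.
extract_listed_invoices() {
  awk '
    NR == FNR      { if (NF) want[$1] = 1; next }        # first file: load the wanted numbers
    /<INVOICE>/    { buf = ""; inblock = 1; keep = 0 }
    inblock        { buf = buf $0 "\n" }
    /<INVOICE_NO>/ { no = $0
                     gsub(/.*<INVOICE_NO>|<\/INVOICE_NO>.*/, "", no)
                     keep = (no in want) }               # is this number in the list?
    /<\/INVOICE>/  { if (inblock && keep) printf "%s", buf
                     inblock = 0 }
  ' "$1" "$2"
}
```

Usage: `extract_listed_invoices xml_input test`. Field splitting on `$1` also discards the trailing space after 2345 in the list file.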

# 5  
01-15-2011
Hi Kamaraj,

Thank you for your reply. I tried your command but got the following:

grep: illegal option -- B
grep: illegal option -- 2

grep: illegal option -- A
grep: illegal option -- 1

Are these valid grep options? Please let me know.

Thanks
Angshuman
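-A and -B are GNU grep extensions rather than POSIX options, which would explain "illegal option" errors from a non-GNU grep. If only POSIX tools are available, sed's hold space can collect each block instead. This is a minimal sketch, assuming one tag per line; `extract_invoices_sed` is a hypothetical name. POSIX basic regular expressions have no alternation, so each wanted number gets its own `p` command (a block holds only one INVOICE_NO, so at most one fires).

```shell
# Hypothetical helper using only POSIX sed features.
#   /<INVOICE>/h   start a new block: overwrite the hold space
#   /<INVOICE>/!H  append every other line to the hold space
#   x              at </INVOICE>, swap the finished block into the pattern space
extract_invoices_sed() {
  sed -n '
    /<INVOICE>/h
    /<INVOICE>/!H
    /<\/INVOICE>/{
      x
      /<INVOICE_NO>2345<\/INVOICE_NO>/p
      /<INVOICE_NO>5678<\/INVOICE_NO>/p
    }
  ' "$@"
}
```

Usage: `extract_invoices_sed file`.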
# 6  
01-16-2011
Quote:
Originally Posted by angshuman
...
I need to extract all the <INVOICE>...........</INVOICE> provided the value of INVOICE_NO = 2345 and 5678.
...
Maybe something like this?

Code:
$
$ # display the contents of the xml file
$ cat f1.xml
<INVOICES>
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>1234</INVOICE_NO>
</INVOICE>
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>2345</INVOICE_NO>
</INVOICE>
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>3456</INVOICE_NO>
</INVOICE>
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>5678</INVOICE_NO>
</INVOICE>
</INVOICES>
$
$ # Perl one-liner to extract the information
$ perl -lne 'BEGIN{undef $/} while(/(<INVOICE>(.*?)<\/INVOICE>)/sg) {$x=$1; print $x if $2 =~ /2345|5678/}' f1.xml
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>2345</INVOICE_NO>
</INVOICE>
<INVOICE>
<NAME>Customer A</NAME>
<INVOICE_NO>5678</INVOICE_NO>
</INVOICE>
$
$

tyler_durden
# 7  
01-16-2011
Which grep version are you using?

Which operating system is that?

I am using the following version:

Code:
kamaraj@kamaraj-laptop:~$ grep -V
GNU grep 2.5.4

Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
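Whether a given grep accepts -A/-B can also be checked from its exit status: POSIX grep exits with 0 when a line is selected, 1 when none is, and greater than 1 on an error such as an illegal option. A small sketch; `grep_has_context_opts` is a hypothetical helper name, and it tests whichever grep comes first in PATH.

```shell
# Hypothetical helper: succeeds if grep accepts the (GNU) context options.
grep_has_context_opts() {
  grep -A1 -B1 x /dev/null >/dev/null 2>&1
  [ "$?" -le 1 ]    # 0 or 1 means the options parsed; >1 means an error
}

if grep_has_context_opts; then
  echo "grep supports -A/-B"
else
  echo "grep lacks -A/-B; try awk, sed or perl instead"
fi
```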


10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Replacing tag based on condition

Hi All, I have a file like the one below. It holds information about records: a header followed by data. For example, it has 1 men tag, and the tag id comes after the headers. The change I want is to convert all pets tags from P to X. I did a sed like below... (5 Replies)
Discussion started by: arunkumar_mca
5 Replies

2. Shell Programming and Scripting

Help with tag value extraction from xml file based on a matching condition

Hi, I have a situation where I need to search an xml file for the presence of a tag <FollowOnFrom> and also the presence of part of the following tag <ContractRequest _LoadId, and if these 2 exist, then extract the value from the following tag <_LocalId>, which is "CW2094139". There... (2 Replies)
Discussion started by: paul1234
2 Replies

3. Shell Programming and Scripting

Help with XML tag value extraction based on condition

sample xml file part <?xml version="1.0" encoding="UTF-8"?><ContractWorkspace xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" _LoadId="export_AJ6iAFmh+pQHq1" xsi:noNamespaceSchemaLocation="ContractWorkspace.xsd"> <_LocalId>CW2218471</_LocalId> <Active>true</Active> ... (3 Replies)
Discussion started by: paul1234
3 Replies

4. Shell Programming and Scripting

Help with XML tag value extraction based on matching condition

sample xml file part <DocumentMinorVersion>0</DocumentMinorVersion> <DocumentVersion>1</DocumentVersion> <EffectiveDate>2017-05-30T00:00:00Z</EffectiveDate> <FollowOnFrom> <ContractRequest _LoadId="export_AJ6iAFoh6g0rE9"> <_LocalId>CRW2218451</_LocalId> ... (4 Replies)
Discussion started by: paul1234
4 Replies

5. Shell Programming and Scripting

Extract XML tag value from file

Hello, Hope you are doing fine. I have a log file which looks as follows: Some junk text1 Date: Thu Mar 15 13:38:46 CDT 2012 DATA SENT SUCCESSFULL: Some jun text 2 Date: Thu Mar 15 13:38:46 CDT 2012 DATA SENT SUCCESSFULL: ... (3 Replies)
Discussion started by: srattani
3 Replies

6. Shell Programming and Scripting

Extract TAG name and XPATH from XML file via shellscript

Hi, Here is a sample xml file and expected output. I need to extract the element/tag name (not value) and xpath (sample output.txt). But the main problem is that I put here one simple xml file where I can clearly see the number of elements, but in real time I have an xml file which has over 500... (18 Replies)
Discussion started by: BithunC
18 Replies

7. Shell Programming and Scripting

Extract multiple xml tag value into CSV format

Hi All, Need your assistance on another xml tag related issue. I have a xml file as below: <INVOICES> <INVOICE> <BILL> <BILL_NO>1234</BILL_NO> <BILL_DATE>01 JAN 2011</BILL_DATE> </BILL> <NAMEINFO> <NAME>ABC</NAME> </NAMEINFO> </INVOICE> <INVOICE> <BILL> <BILL_NO>5678</BILL_NO>... (12 Replies)
Discussion started by: angshuman
12 Replies

8. Shell Programming and Scripting

how to extract the info in the tag from a xml file

Hi All, Does anyone have an idea how to extract each <info> tag to a separate file? I have 1000 raw files, which come in every 15 mins (I am using bash). I have tried my script as below, but it took hours to finish, which is inefficient. perl -n -e '/^<info>/ and open FH,">file".$n++;... (2 Replies)
Discussion started by: natalie23
2 Replies

9. UNIX for Dummies Questions & Answers

Unable to extract a tag from a very long XML message

Hi I have a log file which contain XML message. I want to extract the value between the tag : <businessEventId>13201330</businessEventId> i.e., 13201330. I tried the following commands but as the message is very long, unable to do it. Attached is the log file. Please provide inputs. --... (3 Replies)
Discussion started by: Sapna_Sai
3 Replies

10. Shell Programming and Scripting

Extract value inside <text> tag for a particular condition.

Hi All! I have obtained following output from a tool "pdftohtml" :: So, my input is as under: <text top="246" left="160" width="84" height="16" font="3">Business purpose</text> <text top="260" left="506" width="220" height="16" font="3">giving the right information and new insights... (3 Replies)
Discussion started by: parshant_bvcoe
3 Replies