Extracting XML Tag Contents


 
# 1  
Old 07-25-2008

Hi Jean

I need your help writing a shell script. I have no experience with Unix programming. I have a large file, about 400 MB, which contains about 50000 XML messages separated by a tab, I think. I need to extract only 4 values from each XML message and write them to a new file. Please help me with this.

Input File:

Code:
<logRequest xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:str="http://exslt.org/strings" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:secext="http://schemas.xmlsoap.org/ws/2002/04/secext" xmlns:rrbfunc="urn:fanta:bus:schemas:functions:1.0" xmlns:bus="urn:fanta:bus:schemas:context:1.0" xmlns:regexp="http://exslt.org/regular-expressions" xmlns:metrics20="urn:fanta:bus:schemas:metrics:2.0" xmlns:metrics10="urn:fanta:bus:schemas:metrics:1.0" xmlns:exsl="http://exslt.org/common" xmlns:endpt="urn:schemas.fantacom/bus/1.0/spInfo" xmlns:date="http://exslt.org/dates-and-times" xmlns:common="urn:fanta:bus:xslt:common.xsl" xmlns:cam="urn:fanta:comsec:authn:1.0"><logHeader><timestamp>2008-07-24T07:15:48.457000-04:00</timestamp><direction>response</direction><logType>SERVICE</logType></logHeader><logPayload><SOAP-ENV:Header xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><Security xmlns="http://schemas.xmlsoap.org/ws/2002/04/secext"><wsse:BinarySecurityToken EncodingType="sentry:Base64Binary" ValueType="sentry:CSK1" cam:Username="262979139" cam:OpaqueId="262979139" xmlns:sentry="urn:fanta:sentry:schemas:security:1.2">pa1044iG3KjTWDx2DLRcQQliPVT8ryGTbVDOSP32NU4JSTP0k@</wsse:BinarySecurityToken></Security><context xmlns="urn:fanta:bus:schemas:context:1.0"><PilotRollout xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><Region xmlns="">RZ_SAMS</Region></PilotRollout><channel xmlns:i="http://www.w3.org/2001/XMLSchema-instance">IO</channel><properties xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><property name="WebPartId">EmailAddressWebPart</property><property name="WebPartAction">Default</property><property name="CorrelatorId">54aee7c2-dbb7-49fe-b853-cbb2ea87ec7b</property><property name="AsyncCall"/></properties></context><currentCorrelId 
xmlns="urn:fanta:bus:schemas:metrics:1.0">e5befe1c-9b73-4586-830a-eff4cb485492</currentCorrelId><metrics10:point id="e710b65e-f483-4e21-bfbe-aac13a892fea" parent="e5befe1c-9b73-4586-830a-eff4cb485492" node="XX.XXX.XXX.XX" type="fantabus.intermediary"><metrics10:start>2008-07-24 11:15:47.919000 UTC</metrics10:start><metrics10:block>2008-07-24 11:15:48.418000 UTC</metrics10:block></metrics10:point><bus:point type="fantabus.provider" parent="e5befe1c-9b73-4586-830a-eff4cb485492" node="C2VTR6Z4" id="003425872549152C2VTR6Z45564001" xmlns:bus="urn:fanta:bus:schemas:metrics:1.0"><bus:start>2008-07-24 11:15:49.118778 UTC</bus:start><bus:block>2008-07-24 11:15:49.128023 UTC</bus:block><bus:unblock>2008-07-24 11:15:49.149758 UTC</bus:unblock><bus:stop>2008-07-24 11:15:49.152776 UTC</bus:stop></bus:point><endpt:spInfo><endpt:tranId>RRFQ</endpt:tranId><endpt:operation>GetPaperless</endpt:operation><endpt:TORName>C2VTR6Z4</endpt:TORName><endpt:AORName>C2VAR2Z7</endpt:AORName><endpt:taskNum>0090987</endpt:taskNum><endpt:UOWID>C2BD376A4B3D8E05</endpt:UOWID></endpt:spInfo></SOAP-ENV:Header></logPayload></logRequest>
<logRequest xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:str="http://exslt.org/strings" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:secext="http://schemas.xmlsoap.org/ws/2002/04/secext" xmlns:rrbfunc="urn:fanta:bus:schemas:functions:1.0" xmlns:bus="urn:fanta:bus:schemas:context:1.0" xmlns:regexp="http://exslt.org/regular-expressions" xmlns:metrics20="urn:fanta:bus:schemas:metrics:2.0" xmlns:metrics10="urn:fanta:bus:schemas:metrics:1.0" xmlns:exsl="http://exslt.org/common" xmlns:endpt="urn:schemas.fantacom/bus/1.0/spInfo" xmlns:date="http://exslt.org/dates-and-times" xmlns:common="urn:fanta:bus:xslt:common.xsl" xmlns:cam="urn:fanta:comsec:authn:1.0"><logHeader><timestamp>2008-07-24T07:15:48.457000-04:00</timestamp><direction>response</direction><logType>SERVICE</logType></logHeader><logPayload><SOAP-ENV:Header xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><Security xmlns="http://schemas.xmlsoap.org/ws/2002/04/secext"><wsse:BinarySecurityToken EncodingType="sentry:Base64Binary" ValueType="sentry:CSK1" cam:Username="262979139" cam:OpaqueId="262979139" xmlns:sentry="urn:fanta:sentry:schemas:security:1.2">pa1044iG3KjTWDx2DLRcQQliPVT8ryGTbVDOSP32NU4JSTP0k@</wsse:BinarySecurityToken></Security><context xmlns="urn:fanta:bus:schemas:context:1.0"><PilotRollout xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><Region xmlns="">RZ_SAMS</Region></PilotRollout><channel xmlns:i="http://www.w3.org/2001/XMLSchema-instance">IO</channel><properties xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><property name="WebPartId">EmailAddressWebPart</property><property name="WebPartAction">Default</property><property name="CorrelatorId">54aee7c2-dbb7-49fe-b853-cbb2ea87ec7b</property><property name="AsyncCall"/></properties></context><currentCorrelId 
xmlns="urn:fanta:bus:schemas:metrics:1.0">e5befe1c-9b73-4586-830a-eff4cb485492</currentCorrelId><metrics10:point id="e710b65e-f483-4e21-bfbe-aac13a892fea" parent="e5befe1c-9b73-4586-830a-eff4cb485492" node="XX.XXX.XXX.XXX" type="fantabus.intermediary"><metrics10:start>2008-07-24 11:15:47.919000 UTC</metrics10:start><metrics10:block>2008-07-24 11:15:48.418000 UTC</metrics10:block></metrics10:point><bus:point type="fantabus.provider" parent="e5befe1c-9b73-4586-830a-eff4cb485492" node="C2VTR6Z4" id="003425872549152C2VTR6Z45564001" xmlns:bus="urn:fanta:bus:schemas:metrics:1.0"><bus:start>2008-07-24 11:15:49.118778 UTC</bus:start><bus:block>2008-07-24 11:15:49.128023 UTC</bus:block><bus:unblock>2008-07-24 11:15:49.149758 UTC</bus:unblock><bus:stop>2008-07-24 11:15:49.152776 UTC</bus:stop></bus:point><endpt:spInfo><endpt:tranId>RRFQ</endpt:tranId><endpt:operation>GetPaperless</endpt:operation><endpt:TORName>C2VTR6Z4</endpt:TORName><endpt:AORName>C2VAR2Z7</endpt:AORName><endpt:taskNum>0090987</endpt:taskNum><endpt:UOWID>C2BD376A4B3D8E05</endpt:UOWID></endpt:spInfo></SOAP-ENV:Header></logPayload></logRequest>

Expected Output: output.txt containing the following

Code:
timestamp webpartID bus:block bus:unblock endpt:operation
2008-07-24T07:15:48.457000-04:00 EmailAddressWebPart 2008-07-24 11:15:49.128023 UTC 2008-07-24 11:15:49.149758 UTC GetPaperless
2008-07-24T07:15:48.457000-04:00 EmailAddressWebPart 2008-07-24 11:15:49.128023 UTC 2008-07-24 11:15:49.149758 UTC GetPaperless


Last edited by radoulov; 07-25-2008 at 06:11 PM.. Reason: added code tags
# 2  
Old 07-25-2008
Code:
perl -nle 'BEGIN {
  $, = " ";   # output field separator
  print "timestamp webpartID bus:block bus:unblock endpt:operation";
  }
  print
    /timestamp>(.*?)<.*?
     "WebPartId">(.*?)<.*?
     bus:block>(.*?)<.*?
     bus:unblock>(.*?)<.*?
     endpt:operation>(.*?)<
    /x
' filename > output.txt
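
One thing to watch with the one-liner above: with -n, Perl reads one physical line at a time, so the pattern only matches when a whole logRequest record sits on a single line. In the sample as posted, each record appears to break after "<currentCorrelId " (this may only be a posting artifact). If the break really is in the file, a rough shell sketch like the one below could join the pieces first. The function name extract_fields is made up for illustration, and the sketch assumes every record ends with </logRequest> at the end of a physical line and that each of the five elements occurs once per record.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the five fields
# for every <logRequest> record found in the file given as $1.
# Assumption: each record ends with </logRequest> at the end of a line.
extract_fields() {
  printf '%s\n' 'timestamp webpartID bus:block bus:unblock endpt:operation'
  # Step 1: glue physical lines together until the closing tag is seen,
  # so that every record ends up on exactly one line.
  awk '{ printf "%s", $0 } /<\/logRequest>/ { print "" }' "$1" |
  # Step 2: capture the text of each element; [^<]* stops at the next tag.
  sed -n 's/.*<timestamp>\([^<]*\)<.*"WebPartId">\([^<]*\)<.*<bus:block>\([^<]*\)<.*<bus:unblock>\([^<]*\)<.*<endpt:operation>\([^<]*\)<.*/\1 \2 \3 \4 \5/p'
}
```

It would be called as: extract_fields filename > output.txt. Records that lack any of the five elements are skipped silently because of sed -n with the p flag.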

# 3  
Old 07-30-2008
Need your Help....

I want to extract the value of WebPartId. Please let me know where I am going wrong.

Code:
egrep "<property name=\"WebPartId" /tmp/datapower/GetCustBankAccountsService.log | sed -e "s/^.*WebPartId"\" | cut -f2 -d">"| cut -f1 -d"<" > /tmp/temp1.xls
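
The likely culprit in the pipeline above is the sed step: after the shell strips the quotes, the script sed receives is s/^.*WebPartId" which has no replacement part and no closing delimiter, so sed should reject it as an unterminated substitute command. A minimal sketch of one way around it is below; it lets a single sed call do the filtering and the cutting. The function name webpart_ids is invented for illustration, and the sketch assumes the attribute is written exactly as name="WebPartId" as in the sample data.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the WebPartId value
# found on each line of the file given as $1.
# Assumption: the element looks like <property name="WebPartId">value</property>.
webpart_ids() {
  # -n plus the p flag prints only lines where the substitution succeeded;
  # \([^<]*\) captures everything up to the next tag.
  sed -n 's/.*<property name="WebPartId">\([^<]*\)<.*/\1/p' "$1"
}
```

It would be called as: webpart_ids /tmp/datapower/GetCustBankAccountsService.log > /tmp/temp1.xls. Because the leading .* is greedy, a line holding several WebPartId properties yields only the last one.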