Extracting a part of XML File


 
# 1  
Old 11-10-2008
Extracting a part of XML File

Hi Guys,

I have a very large XML feed (2.7 MB) which crashes the server when it is parsed. To reduce the load on the server, I have a cron job running every 5 minutes. This job fetches the file from the feed host and keeps it on the local machine.

This does not solve the problem, as the whole file still gets loaded on the server. The file looks something like this:

<?xml version="1.0" standalone="no"?>
<IRXML CorpMasterID="">
<NewsReleases PubDate="20081104" PubTime="16:48:03">
<NewsCategory Category="">
<NewsRelease ReleaseID="" DLU="20081104 16:47:00" ArchiveStatus="Current"
RNSSource="">
<Title></Title>
<ExternalURL/>
<Date Date="20081104" Time="16:33:00">11/4/2008 4:33:00 PM</Date>
<ContentNetworkingLinks/>
<Categories>
<Category></Category>
</Categories>
</NewsRelease>
<NewsRelease ReleaseID="" DLU="20081104 09:19:00" ArchiveStatus="Current"
RNSSource="">
<Title></Title>
<ExternalURL/>
<Date Date="20081104" Time="09:01:00">11/4/2008 9:01:00 AM</Date>
<ContentNetworkingLinks/>
<Categories>
<Category></Category>
</Categories>
</NewsRelease>

I want to write a shell script which will extract only the part from
<NewsRelease> to </NewsRelease>,
something like:

<NewsRelease ReleaseID="" DLU="20081104 09:19:00" ArchiveStatus="Current"
RNSSource="">
<Title></Title>
<ExternalURL/>
<Date Date="20081104" Time="09:01:00">11/4/2008 9:01:00 AM</Date>
<ContentNetworkingLinks/>
<Categories>
<Category></Category>
</Categories>
</NewsRelease>

There is one more problem: when the file is downloaded on Unix there are no line breaks, so the complete file appears to be on one line.

Any help would be appreciated. Thanks,
Shridhar
# 2  
Old 11-10-2008
Code:
sed -n '/<NewsRelease R/,/<\/NewsRelease>/p' xmldump >outputfile
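
One caveat with the command above: sed works line by line, so if the whole feed sits on a single line (as the first post describes), the range matches that one line and prints everything. A minimal sketch that first puts a line break between adjacent tags and then applies the same range. It assumes that no `><` sequence occurs inside text content, and the file name xmldump_oneline and its contents are made up for illustration:

```shell
# Hypothetical one-line sample standing in for the real feed.
printf '%s\n' '<IRXML><NewsReleases><NewsRelease ReleaseID="1"><Title>a</Title></NewsRelease><NewsRelease ReleaseID="2"><Title>b</Title></NewsRelease></NewsReleases></IRXML>' > xmldump_oneline

# A line count of 1 means the whole feed is a single line.
wc -l < xmldump_oneline

# Insert a newline between every "><" pair, then run the range on the result.
sed 's/></>\
</g' xmldump_oneline | sed -n '/<NewsRelease R/,/<\/NewsRelease>/p' > outputfile
```

This still prints every NewsRelease block in the file, not only the first one; limiting the count needs an extra step.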

# 3  
Old 11-10-2008
Regarding the end-of-line problem: what format is the file currently in, i.e. does it have LF, CR/LF or CR as its end-of-line marker?
Which tool to use depends on the format.
To go from DOS to Unix, use dos2unix, or open the file in vim and run :set fileformat=unix
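
If dos2unix is not available, the check and the conversion can also be sketched with standard tools. The file names sample_dos.txt and sample_unix.txt are placeholders, and the sample content is made up:

```shell
# Hypothetical two-line sample with CR/LF line endings.
printf 'a\r\nb\r\n' > sample_dos.txt

# Show the raw bytes; carriage returns appear as \r in the dump.
od -c sample_dos.txt

# Count the lines that contain a carriage return (0 means plain LF endings).
cr=$(printf '\r')
grep -c "$cr" sample_dos.txt

# CR/LF to LF: delete every carriage return.
tr -d '\r' < sample_dos.txt > sample_unix.txt
```

For a file that uses CR alone as the line ending, `tr '\r' '\n'` would be the equivalent conversion.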
# 4  
Old 11-11-2008
copying the complete file

Thanks for the reply.

There seems to be some problem with the command. It executes, but when I look at outputfile, it is a complete copy of the XML feed.
I don't think there is a problem with the file format, because I do not see ^M in the file.
I think the problem could be the multiple occurrences of "NewsRelease" in the file.

Also my requirement is that, I need the first 5 occurrences of <NewsRelease> ... </NewsRelease> from the XMLFeed to another file, as I need to Parse the first 5 news releases to HTML using XSL.

Please let me know if this is possible.

Thanks again.
Shridhar
# 5  
Old 11-12-2008
Hope this helps.

It prints only the first five blocks enclosed by <NewsRelease ...> and </NewsRelease>.


Code:
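awk '/<NewsRelease[ >]/,/<\/NewsRelease>/{
# print lines only until five releases have been closed
if(n<5)
	print
# a closing tag marks the end of one release
if(index($0,"</NewsRelease>")!=0)
	n++
}' filename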
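
The range pattern above relies on each tag being on its own line. For a feed that arrives as one long line, a rough alternative is to let awk treat the closing tag as the record separator. This is only a sketch: a multi-character RS is not guaranteed by POSIX (gawk and mawk accept it), and the file names, the sample data and the `max` variable are invented for illustration:

```shell
# Hypothetical one-line sample standing in for the real feed.
printf '%s\n' '<IRXML><NewsReleases><NewsCategory><NewsRelease ReleaseID="1"><Title>a</Title></NewsRelease><NewsRelease ReleaseID="2"><Title>b</Title></NewsRelease><NewsRelease ReleaseID="3"><Title>c</Title></NewsRelease></NewsCategory></NewsReleases></IRXML>' > feed_oneline.xml

# Each record ends at a closing tag; keep the text from the opening tag onward.
awk -v max=2 'BEGIN { RS = "</NewsRelease>" }
{
    start = index($0, "<NewsRelease ")
    if (start == 0) next             # no opening tag: leftover text after the last release
    print substr($0, start) RS       # put the closing tag back
    if (++n >= max) exit             # stop once max releases are written
}' feed_oneline.xml > first_releases.xml
```

Set `max=5` for the first five releases. The output holds only the NewsRelease elements, so it needs a wrapping root element before an XSL processor will accept it.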

# 6  
Old 11-12-2008
Thanks got it almost working

Thanks for the reply, it worked. I have to add a few more things to make it work completely.

Warm Regards,
Shridhar
# 7  
Old 11-12-2008
Quote:
Also my requirement is that, I need the first 5 occurrences of <NewsRelease> ... </NewsRelease> from the XMLFeed to another file, as I need to Parse the first 5 news releases to HTML using XSL.
Why not extract the first 5 releases using XSLT? For example:
Code:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="xml"/>

  <xsl:template match="/">
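    <!-- Select NewsReleases directly: XSLT 1.0 built-in templates do not pass parameters on -->
    <xsl:apply-templates select="IRXML/NewsReleases">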
      <xsl:with-param name="mycount" select="5"/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="NewsReleases">
    <xsl:param name="mycount"/>
      <xsl:element name="NewsReleases">
      <xsl:attribute name="PubDate">
         <xsl:value-of select="@PubDate"/>
      </xsl:attribute>
      <xsl:attribute name="PubTime">
         <xsl:value-of select="@PubTime"/>
      </xsl:attribute>
      <xsl:text>&#xA;</xsl:text>
      <!-- Parentheses make position() count across the whole document, not per parent element -->
      <xsl:for-each select="(//NewsRelease)[position() &lt;= $mycount]">
        <xsl:copy-of select="."/>
      </xsl:for-each>
      <xsl:text>&#xA;</xsl:text>
      </xsl:element>
  </xsl:template>

</xsl:stylesheet>

This assumes that your irXML document is well-formed XML, which is not the case for the sample document you supplied.