Shell Programming and Scripting: Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and other shell scripts and shell scripting languages here.
Hi Guys,
I have a very large XML feed (2.7 MB) that crashes the server when it is parsed. To reduce the load on the server, I have a cron job running every 5 minutes that fetches the file from the feed host and stores it on the local machine. This does not solve the problem, because the whole file still gets loaded on the server. The file looks something like this:

<?xml version="1.0" standalone="no"?>
<IRXML CorpMasterID="">
  <NewsReleases PubDate="20081104" PubTime="16:48:03">
    <NewsCategory Category="">
      <NewsRelease ReleaseID="" DLU="20081104 16:47:00" ArchiveStatus="Current" RNSSource="">
        <Title></Title>
        <ExternalURL/>
        <Date Date="20081104" Time="16:33:00">11/4/2008 4:33:00 PM</Date>
        <ContentNetworkingLinks/>
        <Categories>
          <Category></Category>
        </Categories>
      </NewsRelease>
      <NewsRelease ReleaseID="" DLU="20081104 09:19:00" ArchiveStatus="Current" RNSSource="">
        <Title></Title>
        <ExternalURL/>
        <Date Date="20081104" Time="09:01:00">11/4/2008 9:01:00 AM</Date>
        <ContentNetworkingLinks/>
        <Categories>
          <Category></Category>
        </Categories>
      </NewsRelease>

I want to write a shell script that extracts only the part from <NewsRelease> to </NewsRelease>, something like:

<NewsRelease ReleaseID="" DLU="20081104 09:19:00" ArchiveStatus="Current" RNSSource="">
  <Title></Title>
  <ExternalURL/>
  <Date Date="20081104" Time="09:01:00">11/4/2008 9:01:00 AM</Date>
  <ContentNetworkingLinks/>
  <Categories>
    <Category></Category>
  </Categories>
</NewsRelease>

There is one more problem: when the file is downloaded on Unix there are no line breaks, so the complete file appears as a single line. Any help would be appreciated.

Thanks,
Shridhar
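For the "everything on one line" problem, one option is to insert a line break between adjacent tags before doing any line-based processing. This is a minimal sketch, assuming tags are separated only by optional spaces or tabs; `split_tags`, `feed.xml` and `feed_lines.xml` are hypothetical names used for illustration:

```shell
# Hypothetical helper: put every tag that directly follows another tag on
# its own line. Assumes tags are separated only by optional spaces/tabs,
# so element text such as the <Date> value is left untouched.
split_tags() {
  awk '{ gsub(/>[ \t]*</, ">\n<"); print }' "$@"
}

# Example (placeholder file names):
#   split_tags feed.xml > feed_lines.xml
```

Because it only rewrites the boundaries between tags, the output is still the same XML, just spread over many lines, which makes line-oriented tools such as awk, sed and grep usable on it.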
copying the complete file
Thanks for the reply.
There seems to be a problem with the command. It runs, but the output file is a complete copy of the XML feed. I don't think the file format is the problem, because I do not see any ^M characters in the file. I think the problem could be the multiple occurrences of "NewsRelease" in the file. Also, my requirement is to copy the first 5 occurrences of <NewsRelease> ... </NewsRelease> from the XML feed to another file, because I need to transform those first 5 news releases to HTML using XSL. Please let me know if this is possible. Thanks again. Shridhar
Hope this helps. It prints only the first five sections enclosed by <NewsRelease and </NewsRelease>. Note that awk works line by line, so this assumes each tag is on its own line. Code:
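awk '/<NewsRelease[ >]/ { inside = 1 }   # a release starts (but not <NewsReleases>)
inside { print }                         # print every line inside a release
/<\/NewsRelease>/ && inside {            # a release ends (but not </NewsReleases>)
  inside = 0
  if (++n == 5) exit                     # stop reading after the fifth release
}' filename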
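Line-based awk matching assumes one tag per line, while the question says the downloaded feed is a single line. A minimal sketch of a line-independent alternative follows; it assumes every release start tag carries attributes (as in the sample), that releases are not nested, and `first_releases` is a hypothetical helper name:

```shell
# Hypothetical helper: first_releases N FILE prints the first N
# <NewsRelease ...>...</NewsRelease> blocks of FILE, one block per line.
# All input lines are joined first, so it does not matter whether the
# feed arrives as one long line or as many lines.
first_releases() {
  awk -v limit="$1" '
    { buf = buf $0 " " }                   # join lines, newline -> space
    END {
      closing = "</NewsRelease>"
      n = 0
      while (n < limit) {
        # trailing space: matches "<NewsRelease ReleaseID=..." but not "<NewsReleases"
        start = index(buf, "<NewsRelease ")
        if (start == 0) break              # no more releases
        rest = substr(buf, start)
        stop = index(rest, closing)
        if (stop == 0) break               # unterminated release: stop here
        len = stop + length(closing) - 1   # block length up to end of closing tag
        print substr(rest, 1, len)
        buf = substr(rest, len + 1)        # continue searching after this block
        n++
      }
    }' "$2"
}
```

Usage would be along the lines of `first_releases 5 feed.xml > first5.xml` (placeholder file names). The result is a sequence of fragments rather than a single XML document, so it would still need a wrapping root element before an XSLT processor will accept it.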
Code:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml"/>
<!-- Select NewsReleases directly: in XSLT 1.0 the built-in template
     rules do not pass parameters on, so $mycount would arrive empty. -->
<xsl:template match="/">
<xsl:apply-templates select="IRXML/NewsReleases">
<xsl:with-param name="mycount" select="5"/>
</xsl:apply-templates>
</xsl:template>
<xsl:template match="NewsReleases">
<xsl:param name="mycount"/>
<xsl:element name="NewsReleases">
<xsl:attribute name="PubDate">
<xsl:value-of select="@PubDate"/>
</xsl:attribute>
<xsl:attribute name="PubTime">
<xsl:value-of select="@PubTime"/>
</xsl:attribute>
<xsl:text>
</xsl:text>
<!-- descendant:: counts releases across all categories (//NewsRelease[...]
     would take the first N per NewsCategory); "<" is written as &lt;. -->
<xsl:for-each select="descendant::NewsRelease[position() &lt;= $mycount]">
<xsl:copy-of select="."/>
</xsl:for-each>
<xsl:text>
</xsl:text>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
This assumes that your IRXML document is well-formed XML, which is not the case for the sample document you supplied: its NewsCategory, NewsReleases and IRXML elements are never closed.