Random XML Parsing - using Perl


 
# 1  
Old 09-08-2010

Given the XML:
Code:
 
<?xml version="1.0" encoding="UTF-8"?>
<reference>
<refbody>
<section>
<p>
<ul>
<li><xref href="file1.dita#anchor" /></li>
<li><xref href="file2.dita#anchor" /></li>
</ul>
</p>
</section>
<section>
<p>
<xref href="file3.dita#anchor" />
</p>
<p>
<xref href="file4.dita#anchor" />
</p>
</section>
</refbody>
</reference>

I would like to get a list of xref href values:
Code:
href="file1.dita#anchor"
href="file2.dita#anchor"
href="file3.dita#anchor"
href="file4.dita#anchor"


I've used Perl's XML::Simple, but it requires that I know the structure of the document in advance. Since these xrefs can occur anywhere in the document, I'm not sure how to handle this. I have several files to process, so any assistance you can provide would be helpful.
# 2  
Old 09-08-2010
my_perl_script.pl

Code:
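#!/usr/bin/perl
use strict;
use warnings;

# Read the XML from STDIN or from files named on the command line.
while (my $line = <>) {
    # Capture the href attribute of a self-closing <xref ... /> element.
    if ($line =~ m{<xref\s+(href="[^"]*")\s*/>}) {
        print "$1\n";
    }
}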

Then run it like this:
Code:
perl my_perl_script.pl /my/xml_file.xml

# 3  
Old 09-09-2010
If the regex pattern of interest lies in a single line, then:

Code:
$ cat sample.xml
<?xml version="1.0" encoding="UTF-8"?>
<reference>
<refbody>
<section>
<p>
<ul>
<li><xref href="file1.dita#anchor" /></li>
<li><xref href="file2.dita#anchor" /></li>
</ul>
</p>
</section>
<section>
<p>
<xref href="file3.dita#anchor" />
</p>
<p>
<xref href="file4.dita#anchor" />
</p>
</section>
</refbody>
</reference>
$ perl -lne '/^.*(href=".*").*$/ && print $1' sample.xml
href="file1.dita#anchor"
href="file2.dita#anchor"
href="file3.dita#anchor"
href="file4.dita#anchor"

tyler_durden

Last edited by durden_tyler; 09-09-2010 at 12:59 AM..
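
One caveat with line-based regexes: a greedy pattern such as `href=".*"` reports at most one href per line and can over-capture when another quoted attribute follows on the same line. The sketch below is one possible way around that, assuming each `<xref ...>` tag still fits on a single line. The helper name `xref_hrefs` and the two input lines are made up for illustration and are not from the sample file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every href="..." attribute found on <xref> tags in one line of text.
# [^"]* stops at the first closing quote, unlike a greedy .* pattern.
sub xref_hrefs {
    my ($line) = @_;
    my @found;
    # /g resumes after the previous match, so several xrefs per line are found.
    while ($line =~ /<xref\b[^>]*?\b(href="[^"]*")/g) {
        push @found, $1;
    }
    return @found;
}

# Hypothetical input lines (assumptions for illustration only).
my @lines = (
    '<li><xref href="a.dita#x" scope="local" /> and <xref href="b.dita#y"/></li>',
    '<p>no links here</p>',
);

for my $line (@lines) {
    print "$_\n" for xref_hrefs($line);
}
```

This is still text matching rather than XML parsing, so a tag split across lines or an href inside a comment would be handled incorrectly; a real parser avoids those cases.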
# 4  
Old 09-09-2010
Thank you all for your replies. I found an XPath module that does the trick.
Code:
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;
use XML::XPath::XMLParser;

# Process every .dita file in the current directory.
my @files = glob('*.dita');

foreach my $file (@files) {
  my $xp = XML::XPath->new(filename => $file);
  my $xrefnodes = $xp->find('//xref/@href'); # find all xref href attributes
  print "Processing file: $file\n";
  foreach my $node ($xrefnodes->get_nodelist) {
    my $href = XML::XPath::XMLParser::as_string($node);
    print " xref: $href\n";
  }
}
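
If only the attribute values are wanted (without the `href="..."` wrapper), a possible variation is to call `getNodeValue` on each attribute node. This is a minimal sketch, assuming the XML::XPath module is installed; the helper name `xref_targets` and the inline XML string are assumptions for illustration, and a real script would pass `filename => $file` as in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Return the bare href values of every <xref> element in an XML string.
sub xref_targets {
    my ($xml) = @_;
    my $xp = XML::XPath->new(xml => $xml);
    # In list context, findnodes returns the matching attribute nodes.
    return map { $_->getNodeValue } $xp->findnodes('//xref/@href');
}

# Inline sample (an assumption for illustration), with xrefs at different depths.
my $xml = <<'XML';
<reference>
<refbody>
<section><p><xref href="file1.dita#anchor" /></p></section>
<section><p><ul><li><xref href="file2.dita#anchor" /></li></ul></p></section>
</refbody>
</reference>
XML

print "$_\n" for xref_targets($xml);
```

Because `//xref/@href` matches at any depth, the document structure does not need to be known in advance.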

# 5  
Old 09-09-2010
Here is an XSLT stylesheet that will output what you are looking for:
Code:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:output method="text" />

   <xsl:template match="/" >
      <xsl:apply-templates select="//xref/@href"/>
   </xsl:template>

   <xsl:template match="//xref/@href" >
 xref: <xsl:value-of select="." />
   </xsl:template>

</xsl:stylesheet>
