Needing help parsing XML/RDF using Python


 
Posted 05-18-2010

Hello,

I am trying to write a script that parses the install.rdf files found in Firefox XPI extensions and isolates the extension ID, so I can name a directory after it and automate the installation of system-wide extensions.

I am very comfortable on the command line, but not with programming languages (especially object-oriented ones). I have been working on learning Python.

I would like to use minidom from xml.dom.

Here are two examples of different formats of install.rdf files:

Code:
<?xml version="1.0"?> 
 
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
     xmlns:em="http://www.mozilla.org/2004/em-rdf#"> 
 
    <Description about="urn:mozilla:install-manifest"> 
        <em:type>2</em:type> 
        <em:id>{4BBDD651-70CF-4821-84F8-2B918CF89CA3}</em:id> 
        <em:name>FEBE</em:name> 
        <em:version>6.3.2</em:version> 
        <em:description>Backup your Firefox data</em:description> 
        <em:creator>Chuck Baker</em:creator> 
        <em:translator>www.babelzilla.org</em:translator> 
        <em:optionsURL>chrome://febe/content/settings/febeOptions.xul</em:optionsURL> 
        <em:iconURL>chrome://febe/skin/febe32x32.png</em:iconURL> 
        <em:homepageURL>http://customsoftwareconsult.com/extensions</em:homepageURL> 
        <em:contributor>Leszek(teo)Życzkowski (pl-PL translation, XUL help, beta tester, options icons)</em:contributor> 
        <em:contributor>menet (fr-FR translation, beta tester)</em:contributor> 
        <em:contributor>Piotr Chyliński (Toolbar icons and display logos)</em:contributor> 
        <!-- Firefox --> 
        <em:targetApplication> 
            <Description> 
                <em:id>{ec8030f7-c20a-464f-9b0e-13a3a9e97384}</em:id> 
                <em:minVersion>3.0</em:minVersion> 
                <em:maxVersion>3.7a1pre</em:maxVersion> 
            </Description> 
        </em:targetApplication> 
    </Description> 
</RDF>

Code:
<?xml version="1.0"?> 
<RDF:RDF xmlns:em="http://www.mozilla.org/2004/em-rdf#" 
         xmlns:NC="http://home.netscape.com/NC-rdf#" 
         xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> 
  <RDF:Description RDF:about="rdf:#$VB9S51" 
                   em:id="{ec8030f7-c20a-464f-9b0e-13a3a9e97384}" 
                   em:minVersion="3.0" 
                   em:maxVersion="3.6.*" /> 
  <Description RDF:about="rdf:#$wfnx83" 
                   em:id="{a463f10c-3994-11da-9945-000d60ca027b}" 
                   em:minVersion="0.4" 
                   em:maxVersion="1.0" /> 
  <RDF:Description RDF:about="urn:mozilla:extension:file:cooliris.jar" 
                   em:package="content/cooliris/" 
                   em:skin="skin/classic/cooliris/" 
                   em:locale="locale/cooliris/en-US/" /> 
  <RDF:Description RDF:about="rdf:#$.B9S51" 
                   em:id="{86c18b42-e466-45a9-ae7a-9b95ba6f5640}" 
                   em:minVersion="1.7" 
                   em:maxVersion="1.8" /> 
  <RDF:Description RDF:about="urn:mozilla:install-manifest" 
                   em:id="{CE6E6E3B-84DD-4cac-9F63-8D2AE4F30A4B}" 
                   em:name="CoolPreviews"                    
                   em:version="3.0.1" 
                   em:description="Browse Faster. Preview and share links and media without leaving your current page." 
                   em:creator="Cooliris Inc, www.cooliris.com" 
                   em:optionsURL="chrome://cooliris/content/options3.xul" 
                   em:iconURL="chrome://cooliris/skin/new/previews-installer-icon.png"> 
    <em:file RDF:resource="urn:mozilla:extension:file:cooliris.jar"/> 
    <em:targetApplication RDF:resource="rdf:#$VB9S51"/> 
    <em:targetApplication RDF:resource="rdf:#$.B9S51"/> 
    <em:targetApplication RDF:resource="rdf:#$wfnx83"/> 
  </RDF:Description> 
</RDF:RDF>

In the first example above, em:id is a tag of its own.

I have discovered that if one parses it:

Code:
from xml.dom import minidom

# Parse the manifest and print the text of the first em:id element in document order
doc = minidom.parse('install.rdf')
print(doc.getElementsByTagName("em:id")[0].childNodes[0].data)

It will spit out the ID. However, I need to automate this and get the em:id of the element that also contains the em:name element.

I need help figuring out how to iterate through the document and test whether a given node (if that is the right terminology) also contains the em:name tag, since that node's em:id is the one I need (you may notice there is more than one em:id tag).
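To make the question concrete, here is a minimal sketch of the kind of thing I imagine, not a tested solution for every manifest. It assumes the namespace-aware minidom calls (getElementsByTagNameNS, namespaceURI, localName) behave as documented, and the helper names child_text and extension_id_from_elements are made up for illustration. The sample is a trimmed copy of the first install.rdf above.

```python
from xml.dom import minidom

# Namespace URI that the em: prefix is bound to in install.rdf
EM_NS = "http://www.mozilla.org/2004/em-rdf#"

# Trimmed copy of the first example (element form)
SAMPLE = """<?xml version="1.0"?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:em="http://www.mozilla.org/2004/em-rdf#">
    <Description about="urn:mozilla:install-manifest">
        <em:id>{4BBDD651-70CF-4821-84F8-2B918CF89CA3}</em:id>
        <em:name>FEBE</em:name>
        <em:targetApplication>
            <Description>
                <em:id>{ec8030f7-c20a-464f-9b0e-13a3a9e97384}</em:id>
            </Description>
        </em:targetApplication>
    </Description>
</RDF>"""


def child_text(parent, local_name):
    """Return the text of the first direct em:<local_name> child, or None."""
    for node in parent.childNodes:
        if (node.nodeType == node.ELEMENT_NODE
                and node.namespaceURI == EM_NS
                and node.localName == local_name):
            return "".join(c.data for c in node.childNodes
                           if c.nodeType == c.TEXT_NODE).strip()
    return None


def extension_id_from_elements(doc):
    """Find an em:name element and return the em:id that shares its parent."""
    for name_el in doc.getElementsByTagNameNS(EM_NS, "name"):
        ext_id = child_text(name_el.parentNode, "id")
        if ext_id:
            return ext_id
    return None


doc = minidom.parseString(SAMPLE)
print(extension_id_from_elements(doc))
```

Looking only at direct children of the em:name parent is what keeps the nested targetApplication em:id from being picked up by mistake.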

In the second example above, it is stranger: em:id is not a tag of its own but appears inside the Description tag like an attribute, so getElementsByTagName doesn't find it, let alone childNodes[0].data.

How does one access that value (if "element" is even the right word for it in this form) in the second example? The script also needs to iterate and test whether that section has em:name.

I know this is a bit complex, but I am stuck. I've not found any tutorials at all for the second format.
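My best guess so far, as a rough sketch only: in this form em:id and em:name look like attributes on the Description tag, so the attribute calls on minidom elements (hasAttributeNS, getAttributeNS) may be what is needed. The function name extension_id_from_attributes is hypothetical, and the sample is a trimmed copy of the second install.rdf above.

```python
from xml.dom import minidom

# Namespace URI that the em: prefix is bound to in install.rdf
EM_NS = "http://www.mozilla.org/2004/em-rdf#"

# Trimmed copy of the second example (attribute form)
SAMPLE = """<?xml version="1.0"?>
<RDF:RDF xmlns:em="http://www.mozilla.org/2004/em-rdf#"
         xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <RDF:Description RDF:about="rdf:#$VB9S51"
                   em:id="{ec8030f7-c20a-464f-9b0e-13a3a9e97384}"
                   em:minVersion="3.0" />
  <Description RDF:about="rdf:#$wfnx83"
               em:id="{a463f10c-3994-11da-9945-000d60ca027b}" />
  <RDF:Description RDF:about="urn:mozilla:install-manifest"
                   em:id="{CE6E6E3B-84DD-4cac-9F63-8D2AE4F30A4B}"
                   em:name="CoolPreviews">
    <em:targetApplication RDF:resource="rdf:#$VB9S51"/>
  </RDF:Description>
</RDF:RDF>"""


def extension_id_from_attributes(doc):
    """Return em:id from the first element that carries both em:id and em:name."""
    # "*" matches every element, whatever its prefix or namespace
    for el in doc.getElementsByTagName("*"):
        if el.hasAttributeNS(EM_NS, "name") and el.hasAttributeNS(EM_NS, "id"):
            return el.getAttributeNS(EM_NS, "id")
    return None


doc = minidom.parseString(SAMPLE)
print(extension_id_from_attributes(doc))
```

Walking every element with "*" is deliberate here, because the second file mixes RDF:Description with a plain Description tag.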

Adding to the problem, I need to know how to tell the two formats apart to start the parsing. Ideas for that, too?
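One idea I have, and I may be wrong about it, is that the two formats might not need to be told apart up front: for each element, look for em:name either as an attribute or as a direct child tag, and read em:id the same way. The sketch below tries that. The names em_value and extension_id are made up for illustration, and both samples are trimmed copies of the files above.

```python
from xml.dom import minidom

# Namespace URI that the em: prefix is bound to in install.rdf
EM_NS = "http://www.mozilla.org/2004/em-rdf#"

ELEMENT_FORM = """<?xml version="1.0"?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:em="http://www.mozilla.org/2004/em-rdf#">
    <Description about="urn:mozilla:install-manifest">
        <em:id>{4BBDD651-70CF-4821-84F8-2B918CF89CA3}</em:id>
        <em:name>FEBE</em:name>
        <em:targetApplication>
            <Description>
                <em:id>{ec8030f7-c20a-464f-9b0e-13a3a9e97384}</em:id>
            </Description>
        </em:targetApplication>
    </Description>
</RDF>"""

ATTRIBUTE_FORM = """<?xml version="1.0"?>
<RDF:RDF xmlns:em="http://www.mozilla.org/2004/em-rdf#"
         xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <RDF:Description RDF:about="rdf:#$VB9S51"
                   em:id="{ec8030f7-c20a-464f-9b0e-13a3a9e97384}" />
  <RDF:Description RDF:about="urn:mozilla:install-manifest"
                   em:id="{CE6E6E3B-84DD-4cac-9F63-8D2AE4F30A4B}"
                   em:name="CoolPreviews" />
</RDF:RDF>"""


def em_value(el, local_name):
    """Return em:<local_name> from an attribute or a direct child tag, else None."""
    if el.hasAttributeNS(EM_NS, local_name):
        return el.getAttributeNS(EM_NS, local_name)
    for node in el.childNodes:
        if (node.nodeType == node.ELEMENT_NODE
                and node.namespaceURI == EM_NS
                and node.localName == local_name):
            return "".join(c.data for c in node.childNodes
                           if c.nodeType == c.TEXT_NODE).strip()
    return None


def extension_id(doc):
    """Return em:id of the first element that also has em:name, in either form."""
    for el in doc.getElementsByTagName("*"):
        if em_value(el, "name") is not None:
            ext_id = em_value(el, "id")
            if ext_id:
                return ext_id
    return None


for sample in (ELEMENT_FORM, ATTRIBUTE_FORM):
    print(extension_id(minidom.parseString(sample)))
```

For a real file, minidom.parse('install.rdf') would replace the parseString calls. Whether real-world manifests ever mix the two forms within one Description is something I have not checked.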

Your help is greatly appreciated (as is being pointed to where I can also learn more about this).

Yours,
Narnie