Help parsing an XML file


 
# 1  
Old 02-25-2010

I have read several threads on the subject, but as a newbie I find them hard to understand.

What I need is the following:

Input data:

------- snip ---------

<FavouriteLocations> <FavouriteLocations class="FavouriteList"><Item
class="Favourite"><DisplayName>Hem</DisplayName><Behaviour>0</Behaviour><LastUsed>2</LastUsed><StickyOrder>0</StickyOrder><Object
class="Location"><name>41837</name><name2></name2><lat>5772743</lat><long>1188701</long><entryName>A Street</entryName><entryLat>5772743</entryLat><entryLong>1188701</entryLong><entryHeading>-1</entryHeading><entrySideOfLine>0</entrySideOfLine><entryHouseNo>5</entryHouseNo><entryPostcode>41837</entryPostcode><entryResolved>1</entryResolved><entryValid>1</entryValid><entryMaxRoad>0</entryMaxRoad><entryHalfSize>0</entryHalfSize><source>5</source><resultType>6</resultType><photo></photo><typeName></typeName><phoneNum></phoneNum><description></description><type>road</type></Object></Item>

------ snip-------

The file contains several of these items.

What I need to extract is the <lat>, <long> and <entryName> data, on one row per item.

If possible I would prefer this in awk, since I have just started to understand the basics of it.

Hope somebody can help me out!

Thanks in advance!

/misak
# 2  
Old 02-25-2010
Hi,
This isn't elegant, but I hope it helps.
I copied the fragment to a file named test_file. (Since I don't know a thing about XML, I guessed that lat is the text between the <lat> and </lat> tags, and so on.)
Then I ran the following:

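# Read test_file line by line; IFS= and -r keep each line exactly as written.
while IFS= read -r line
do
    # For each tag: keep what follows the opening tag, then what precedes the closing tag.
    lat=$(printf '%s\n' "$line" | nawk -F"<lat>" '{print $2}' | nawk -F"</lat>" '{print $1}')
    long=$(printf '%s\n' "$line" | nawk -F"<long>" '{print $2}' | nawk -F"</long>" '{print $1}')
    entryName=$(printf '%s\n' "$line" | nawk -F"<entryName>" '{print $2}' | nawk -F"</entryName>" '{print $1}')
    echo "$lat $long $entryName"
done < test_file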

The result is the following:





5772743 1188701 A Street



As you can see, there are blank lines, but I guess this isn't a real problem.
In any case, I'm sure it is possible to solve this in a more elegant way.
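
The blank lines appear because the loop prints a row for every input line, including lines that contain none of the three tags. A minimal sketch of a tidier variant follows, assuming the tags carry no attributes and that plain awk (or nawk) is available. The function name first_location_per_line is made up for this example. It still only picks up the first <lat>, <long> and <entryName> on each line, exactly like the loop.

```shell
#!/bin/sh
# Sketch: same extraction as the loop above, done by a single awk process.
# Lines without a <lat> tag are skipped, so no blank rows are printed.
# Limitation: only the first <lat>/<long>/<entryName> on each line is used.
# first_location_per_line is a hypothetical name chosen for this example.
first_location_per_line() {
    awk '
    /<lat>/ {
        # Splitting on the opening or closing tag leaves the value in slot 2.
        split($0, la, /<\/?lat>/)
        split($0, lo, /<\/?long>/)
        split($0, en, /<\/?entryName>/)
        print la[2], lo[2], en[2]
    }' "$@"
}
```

It can be called as first_location_per_line test_file, or fed from a pipe.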
Regards.
# 3  
Old 02-25-2010
Almost worked

Thanks for the reply, Fran.

The problem is that if there are more items in the file, it only grabs the first one, maybe because the items are not on separate lines.

I will post a larger snippet of the file:

---- snip ----

<FavouriteLocations> <FavouriteLocations class="FavouriteList"><Item
class="Favourite"><DisplayName>Hem</DisplayName><Behaviour>0</Behaviour><LastUsed>2</LastUsed><StickyOrder>0</StickyOrder><Object
class="Location"><name>41837</name><name2></name2><lat>5772743</lat><long>1188701</long><entryName>Dimvädersgatan</entryName><entryLat>5772743</entryLat><entryLong>1188701</entryLong><entryHeading>-1</entryHeading><entrySideOfLine>0</entrySideOfLine><entryHouseNo>5</entryHouseNo><entryPostcode>41837</entryPostcode><entryResolved>1</entryResolved><entryValid>1</entryValid><entryMaxRoad>0</entryMaxRoad><entryHalfSize>0</entryHalfSize><source>5</source><resultType>6</resultType><photo></photo><typeName></typeName><phoneNum></phoneNum><description></description><type>road</type></Object></Item><Item
class="Favourite"><DisplayName>Färdvägsplanerare</DisplayName><Behaviour>0</Behaviour><LastUsed>3</LastUsed><StickyOrder>1</StickyOrder></Item><Item
class="Favourite"><DisplayName>Dimvädersgatan 5</DisplayName><Behaviour>1</Behaviour><LastUsed>1</LastUsed><StickyOrder>-1</StickyOrder><Object
class="Location"><name>41837</name><name2></name2><lat>5772743</lat><long>1188701</long><entryName>Dimvädersgatan</entryName><entryLat>5772743</entryLat><entryLong>1188701</entryLong><entryHeading>-1</entryHeading><entrySideOfLine>0</entrySideOfLine><entryHouseNo>5</entryHouseNo><entryPostcode>41837</entryPostcode><entryResolved>1</entryResolved><entryValid>1</entryValid><entryMaxRoad>0</entryMaxRoad><entryHalfSize>0</entryHalfSize><source>5</source><resultType>6</resultType><photo></photo><typeName></typeName><phoneNum></phoneNum><description></description><type>road</type></Object></Item><Item
class="Favourite"><DisplayName>Uttagsautomat</DisplayName><Behaviour>1</Behaviour><LastUsed>0</LastUsed><StickyOrder>-1</StickyOrder><Object
class="Location"><name>Uttagsautomat</name><name2></name2><lat>5772164</lat><long>1193476</long><entryName>Wieselgrensplatsen</entryName><entryLat>5772163</entryLat><entryLong>1193475</entryLong><entryHeading>67</entryHeading><entrySideOfLine>-1</entrySideOfLine><entryHouseNo>0</entryHouseNo><entryPostcode></entryPostcode><entryResolved>1</entryResolved><entryValid>1</entryValid><entryMaxRoad>0</entryMaxRoad><entryHalfSize>0</entryHalfSize><source>5</source><resultType>11</resultType><photo></photo><typeName>Bankomat</typeName><phoneNum></phoneNum><description></description><type>poi</type></Object></Item></FavouriteLocations></FavouriteLocations>

------- snip -------


Hope it helps a bit, and thanks for all the help here!

/misak
# 4  
Old 02-25-2010
So, are you saying that the text is a single line with several occurrences of <lat>?
# 5  
Old 02-25-2010
Yeah, it seems that way.

/misak
# 6  
Old 02-25-2010
While you can use awk and other command line utilities to parse your file and extract the data you require, a better way is to use tools which are designed to work with XML documents.

For the purposes of this example, I have simplified the structure of your document down to the following elements (demo.xml):
Code:
<FavouriteLocations>
   <FavouriteLocations class="FavouriteList">
       <Item class="Favourite">
          <Object class="Location">
              <lat>LAT1</lat>
              <long>LONG1</long>
              <entryName>NAME1</entryName>
          </Object>
       </Item>
       <Item class="Favourite">
          <Object class="Location">
              <lat>LAT2</lat>
              <long>LONG2</long>
              <entryName>NAME2</entryName>
          </Object>
       </Item>
       <Item class="Favourite">
          <Object class="Location">
              <lat>LAT3</lat>
              <long>LONG3</long>
              <entryName>NAME3</entryName>
          </Object>
       </Item>
    </FavouriteLocations>
</FavouriteLocations>

A good tool for extracting the data you want from this XML document is an XSLT (stylesheet transformation) processor such as xsltproc or Saxon. Here is a simple stylesheet (demo.xsl) which extracts the values of the lat, long and entryName elements.
Code:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text"/>

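<!-- Start at the document root and visit every Object element, at any depth. -->
<xsl:template match="/">
   <xsl:apply-templates select="//Object" />
</xsl:template>

<!-- For each Object, print lat, long and entryName on one line. -->
<xsl:template match="Object">
    <xsl:value-of select="lat"/>
    <xsl:text>   </xsl:text>
    <xsl:value-of select="long"/>
    <xsl:text>   </xsl:text>
    <xsl:value-of select="entryName"/>
    <!-- Character reference for a newline; it survives re-indenting the stylesheet. -->
    <xsl:text>&#10;</xsl:text>
</xsl:template>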

</xsl:stylesheet>

Here is the output from transforming the document using xsltproc.
Code:
$ xsltproc demo.xsl demo.xml
LAT1   LONG1   NAME1
LAT2   LONG2   NAME2
LAT3   LONG3   NAME3
$
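
For comparison, the awk route mentioned at the start of this post can be sketched roughly as follows. This is plain string matching, not XML parsing: it assumes that <lat>, <long> and <entryName> carry no attributes, that every location is closed by </Object>, and that the text contains no escaped or nested markup. The function name extract_locations is made up for this example. All input lines are glued together first, so it does not matter whether the document is one long line or is broken in the middle of a tag.

```shell
#!/bin/sh
# Sketch: print "lat long entryName" for every <Object> element, even when
# the whole document sits on a single physical line.
# Assumption: simple, attribute-free tags; this is not a real XML parser.
# extract_locations is a hypothetical name chosen for this example.
extract_locations() {
    awk '
    # Return the text between <tag> and </tag> in s, or "" if either is missing.
    function field(s, tag,    otag, ctag, i, j) {
        otag = "<" tag ">"
        ctag = "</" tag ">"
        i = index(s, otag)
        if (i == 0) return ""
        s = substr(s, i + length(otag))
        j = index(s, ctag)
        if (j == 0) return ""
        return substr(s, 1, j - 1)
    }
    # Glue all input lines together so line breaks inside tags are harmless.
    { doc = doc $0 " " }
    END {
        # Cut the document at each </Object> and report the chunk before it.
        while ((stop = index(doc, "</Object>")) > 0) {
            chunk = substr(doc, 1, stop - 1)
            doc = substr(doc, stop + length("</Object>"))
            print field(chunk, "lat"), field(chunk, "long"), field(chunk, "entryName")
        }
    }' "$@"
}
```

Running extract_locations demo.xml on the simplified document above should give the same three rows as xsltproc, with single spaces between the fields. Items that have no Object element are simply skipped.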

# 7  
Old 02-26-2010
Problem solved.
Thanks a million for the responses, everyone.
I went with the xsltproc solution; it works fine.

Take care !

/misak
 