Parsing XML file


 
Top Forums UNIX for Dummies Questions & Answers Parsing XML file
# 8  
06-03-2014
There are XML files similar to the sample code. I am trying to parse these files one by one to extract certain content in a tabular format. If you look at the XML sample, all of the information below is in it, along with some other data. I hope I have clarified my requirement.

The output should be as follows:

Code:
 This is important,tablename1,col1,colname,
This is important,tablename1,col2,colname1,
This is important,tablename2,col1,colname2,
This is important,tablename3,col1,colname3,
This is also important,tablename4,col1,colname,
This is also important,tablename4,col2,colname1,
This is also important,tablename5,col1,colname2,
This is also important,tablename5,col1,colname3

---------- Post updated at 09:44 AM ---------- Previous update was at 09:39 AM ----------

This piece of code is almost working, except that there are lots of spaces in the output, and there are also rows where only the 1st column has a value and the remaining columns are blank.

Anyway, I am trying to remove them.

Thank you very much.

Last edited by Don Cragun; 06-03-2014 at 04:26 PM. Reason: Add CODE tags.
# 9  
06-03-2014
Quote:
Originally Posted by ms2001
... ... ...

This piece of code is almost working, except that there are lots of spaces in the output, and there are also rows where only the 1st column has a value and the remaining columns are blank.

Anyway, I am trying to remove them.

Thank you very much.

If you would explicitly state the rules that define the format of the fields you're trying to parse (instead of just giving us a single sample XML file with no description of what you're trying to do), we might be able to help debug your code.

Why is there a space at the start of the 1st line in your output? (It doesn't seem to be present in your XML file.)

Why is there a comma at the end of the output line:
Code:
This is important,tablename3,col1,colname3,

when there is no comma there in your XML file?

In addition to lines with the formats:
Code:
 tablename2.col1 as colname2,
coalesce(tablename3.col1,0) as colname3
        and
 (tablename5.col1*10) as colname3

What other formats might appear from which you want to extract tablenamedigits, coldigits, and colnamedigitsOptionalPunctuation?
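A minimal sketch of how the three formats above could be handled by one awk program, assuming one select item per line and an "as alias" clause on every item. The function name select_items is hypothetical (it does not come from this thread). It takes the first "table.column" reference on the line, wherever it sits (bare, inside coalesce(...), or inside an arithmetic expression), plus the identifier after " as ", so commas, parentheses, and indentation never reach the output.

```shell
#!/bin/sh
# select_items (hypothetical name): read SELECT-list lines on stdin and print
# "table,column,alias" for each line that has a table.column reference and
# an "as alias" clause. Assumes one select item per line.
select_items() {
    awk '
    {
        line = $0
        sub(/\r$/, "", line)        # drop a DOS carriage return, if present

        # first table.column reference on the line, e.g. inside coalesce(...)
        if (!match(line, /[A-Za-z_][A-Za-z0-9_]*\.[A-Za-z_][A-Za-z0-9_]*/))
            next
        ref = substr(line, RSTART, RLENGTH)

        # alias: the identifier that follows " as " (any letter case)
        if (!match(line, /[ \t][Aa][Ss][ \t]+[A-Za-z_][A-Za-z0-9_]*/))
            next
        alias = substr(line, RSTART, RLENGTH)
        sub(/^[ \t][Aa][Ss][ \t]+/, "", alias)

        # split "table.column" at the dot (a one-character separator is literal)
        split(ref, part, ".")
        print part[1] "," part[2] "," alias
    }'
}
```

For example, piping the line "tablename2.col1 as colname2," into select_items prints tablename2,col1,colname2. Lines with no table.column reference, such as " from ", are skipped.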
# 10  
06-04-2014
I have tried to explain the whole scenario again, including all the possible variations in the code.
This is my code sample:
Code:
<name locale="en">my_name<>/name><lastChanged>somedate</lastChanged><some more code here>
<name locale="en">tablename1<>/name><lastChanged>somedate</lastChanged>
<definition><dbquery><sources><sql type="cognos">select * from tablename1</sql><lastChanged>somedate</lastChanged><somemorecode here><name locale="en">col1<>/name><lastChanged>somedate</lastChanged><abc>bbbbssx</<abc><name locale="en">col2<>/name><lastChanged>somedate</lastChanged><name locale="en">col3<>/name><lastChanged>somedate</lastChanged><abc>bbbgbssx</<abc>
<name locale="en">tablename2<>/name><lastChanged>somedate</lastChanged><definition><dbquery><sources><sql type="cognos">select * from tablename2</sql><lastChanged>somedate</lastChanged><somemorecode here>
<name locale="en">col1<>/name><lastChanged>somedate</lastChanged><abc>bbbbssx</<abc><name locale="en">col2<>/name><lastChanged>somedate</lastChanged><name locale="en">col3<>/name><abc>bbbgbssx</<abc>
<name locale="en">tablename3<>/name><lastChanged>somedate</lastChanged><definition><dbquery><sources><sql type="cognos">select * from tablename1</sql><somemorecode here><name locale="en">col1<>/name><lastChanged>somedate</lastChanged><abc>bbbbssx</<abc><name locale="en">col2<>/name><name locale="en">col3<>/name><abc>bbbgbssx</<abc><usage>attribute</usage><datatype>char</datatype><collectionSequenceName>en</collectionSequenceName><collectionSequenceLevel>1</collectionSequenceLevel><querySubject status="sometext"><name locale="en">This is important</name><lastChanged>somedate</lastChanged><definition><modelQuery><sql type=cognos">select 
  tablename1.col1 as colname,
  tablename1.col2 as colname1,
  tablename2.col1 as colname2,
  coalesce(tablename3.col1,0) as colname3
 from 
tablename1 join
tablename2
join
tablename3</sql></modelQuery></definition><lastChanged>somedate</lastChanged><name locale="en">tablename4<>/name><lastChanged>somedate</lastChanged><definition><dbquery><sources><sql type="cognos">select * from tablename1</sql><lastChanged>somedate</lastChanged><somemorecode here><name locale="en">col1<>/name><lastChanged>somedate</lastChanged><abc>bbbbssx</<abc><name locale="en">col2<>/name><lastChanged>somedate</lastChanged><name locale="en">col3<>/name><lastChanged>somedate</lastChanged><abc>bbbgbssx</<abc><name locale="en">tablename5<>/name><lastChanged>somedate</lastChanged><definition><dbquery><sources><sql type="cognos">select * from tablename2</sql><lastChanged>somedate</lastChanged><somemorecode here>
 <name locale="en">col1<>/name><lastChanged>somedate</lastChanged><abc>bbbbssx</<abc><name locale="en">col2<>/name><lastChanged>somedate</lastChanged><name locale="en">col3<>/name><abc>bbbgbssx</<abc>
 <name locale="en">tablename6<>/name><lastChanged>somedate</lastChanged>
 <definition><dbquery><sources><sql type="cognos">select * from tablename4</sql><somemorecode here>
 <name locale="en">col1<>/name><lastChanged>somedate</lastChanged><abc>bbbbssx</<abc><name locale="en">col2<>/name><name locale="en">col3<>/name><abc>bbbgbssx</<abc><usage>attribute</usage><datatype>char</datatype><collectionSequenceName>en</collectionSequenceName><collectionSequenceLevel>1</collectionSequenceLevel><querySubject status="sometext"><name locale="en">This is also important</name><lastChanged>somedate</lastChanged><definition><modelQuery><sql type=cognos">select 
 tablename4.col1 as colname,
 tablename4.col2 as colname1,
 tablename5.col1 as colname2,
 (tablename5.col1*10) as colname3
 from 
tablename4 join
tablename4
</sql></modelQuery></definition><lastChanged>somedate</lastChanged>
<name locale="en">my_name<>/name><lastChanged>somedate</lastChanged><some more code here>
<name locale="en">tablename1<>/name><lastChanged>somedate</lastChanged>
<definition><dbquery><sources><sql type="cognos">select * from tablename1</sql><lastChanged>somedate</lastChanged><somemorecode here><name locale="en">col1<>/name><lastChanged>somedate</lastChanged><abc>bbbbssx</<abc><name locale="en">col2<>/name><lastChanged>somedate</lastChanged><name locale="en">col3<>/name><lastChanged>somedate</lastChanged><abc>bbbgbssx</<abc>
<name locale="en">tablename2<>/name><lastChanged>somedate</lastChanged><definition><dbquery><sources><sql type="cognos">select * from tablename2</sql><lastChanged>somedate</lastChanged><somemorecode here>
<name locale="en">col1<>/name><lastChanged>somedate</lastChanged><abc>bbbbssx</<abc><name locale="en">col2<>/name><lastChanged>somedate</lastChanged><name locale="en">col3<>/name><abc>bbbgbssx</<abc>
<name locale="en">tablename3<>/name><lastChanged>somedate</lastChanged><definition><dbquery><sources><sql type="cognos">select * from tablename1</sql><somemorecode here><name locale="en">col1<>/name><lastChanged>somedate</lastChanged><abc>bbbbssx</<abc><name locale="en">col2<>/name><name locale="en">col3<>/name><abc>bbbgbssx</<abc><usage>attribute</usage><datatype>char</datatype><collectionSequenceName>en</collectionSequenceName><collectionSequenceLevel>1</collectionSequenceLevel><querySubject status="sometext"><name locale="en">This is important</name><lastChanged>somedate</lastChanged><definition><modelQuery><sql type=cognos">select 
  tablename1.col1 as colname,
  tablename1.col2 as colname1,
  tablename2.col1 as colname2,
  coalesce(tablename3.col1,0) as colname3
 from 
tablename1 join
tablename2
join
tablename3</sql></modelQuery></definition><lastChanged>somedate</lastChanged><name locale="en">tablename4<>/name><lastChanged>somedate</lastChanged><definition><dbquery><sources><sql type="cognos">select * from tablename1</sql><lastChanged>somedate</lastChanged><somemorecode here><name locale="en">col1<>/name><lastChanged>somedate</lastChanged><abc>bbbbssx</<abc><name locale="en">col2<>/name><lastChanged>somedate</lastChanged><name locale="en">col3<>/name><lastChanged>somedate</lastChanged><abc>bbbgbssx</<abc><name locale="en">tablename5<>/name><lastChanged>somedate</lastChanged><definition><dbquery><sources><sql type="cognos">select * from tablename2</sql><lastChanged>somedate</lastChanged><somemorecode here>
 <name locale="en">col1<>/name><lastChanged>somedate</lastChanged><abc>bbbbssx</<abc><name locale="en">col2<>/name><lastChanged>somedate</lastChanged><name locale="en">col3<>/name><abc>bbbgbssx</<abc>
 <name locale="en">tablename6<>/name><lastChanged>somedate</lastChanged>
 <definition><dbquery><sources><sql type="cognos">select * from tablename4</sql><somemorecode here>
 <name locale="en">col1<>/name><lastChanged>somedate</lastChanged><abc>bbbbssx</<abc><name locale="en">col2<>/name><name locale="en">col3<>/name><abc>bbbgbssx</<abc><usage>attribute</usage><datatype>char</datatype><collectionSequenceName>en</collectionSequenceName><collectionSequenceLevel>1</collectionSequenceLevel><querySubject status="sometext"><name locale="en">This is also important</name><lastChanged>somedate</lastChanged><definition><modelQuery><sql type=cognos">select 
 tablename4.col1 as colname,
 tablename4.col2 as colname1,
 tablename5.col1 as colname2,
 (tablename5.col1*10) as colname3
 from 
tablename4 join
tablename4
</sql></modelQuery></definition><lastChanged>somedate</lastChanged>
<some more here similar to this>
<some more here similar to this>
<some more here similar to this>
<some more here similar to this>
<some more here similar to this>

There are N similar lines following the same pattern in a single XML file, along with some other unnecessary text.
So first I have to extract these blocks into a single file, and then finally
I need the output below, redirected into a file:
Code:
This is important,tablename1,col1,colname
This is important,tablename1,col2,colname1
This is important,tablename2,col1,colname2
This is important,tablename3,col1,colname3
This is also important,tablename4,col1,colname
This is also important,tablename4,col2,colname1
This is also important,tablename5,col1,colname2
This is also important,tablename5,col1,colname3


This code posted by pilnet101 is almost working
Code:
awk '/>select[[:space:]]*$/{f++};/^from/{f && f--}f' xmlfile|awk -F"[<>]" '{a=$0};{if (a ~ "select") {c=$(NF-12)","}};{gsub(/[\.]/,",");gsub(/ as /,",");gsub(/,$/,"");{gsub(/.*\(/,"");gsub(/[[:punct:]][0-9]*\)/,"")};{if ($0~",") print c$0}}'

except that I am getting lots of spaces and special characters in the output.
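Not pilnet101's code, but a single-awk sketch of the same two steps (locate each select block, then split its items), written under these assumptions: the querySubject name (e.g. "This is important") is the last well-formed <name ...>...</name> element on the same physical line as the trailing ">select"; each select item sits on its own line with an "as alias" clause; and the block ends at a line beginning with "from", leading blanks allowed (in the sample, " from " is indented, which a plain /^from/ does not match). Because only matched identifiers are printed and trailing carriage returns are stripped, indentation and DOS line endings cannot leak into the output. The name query_columns is hypothetical.

```shell
#!/bin/sh
# query_columns (hypothetical name): one awk pass over the XML file(s).
#  - a line ending in "<sql ...>select" opens a select block; the querySubject
#    name is taken from the last <name ...>text</name> element on that line
#  - each following line with "table.column ... as alias" prints
#    name,table,column,alias
#  - a line starting with "from" (leading blanks allowed) closes the block
query_columns() {
    awk '
    { sub(/\r$/, "") }                        # strip a DOS carriage return

    inselect && (/^[ \t]*from[ \t]/ || /^[ \t]*from$/) { inselect = 0 }

    inselect {
        if (match($0, /[A-Za-z_][A-Za-z0-9_]*\.[A-Za-z_][A-Za-z0-9_]*/)) {
            ref = substr($0, RSTART, RLENGTH)
            if (match($0, /[ \t][Aa][Ss][ \t]+[A-Za-z_][A-Za-z0-9_]*/)) {
                alias = substr($0, RSTART, RLENGTH)
                sub(/^[ \t][Aa][Ss][ \t]+/, "", alias)
                split(ref, part, ".")
                # only identifiers are printed, so indentation never leaks out
                print qname "," part[1] "," part[2] "," alias
            }
        }
        next
    }

    /<sql[^>]*>select[ \t]*$/ {
        # keep the last well-formed <name ...>text</name> on this line;
        # malformed closers such as "<>/name>" are not matched
        rest = $0
        qname = ""
        while (match(rest, /<name[^>]*>[^<]*<\/name>/)) {
            qname = substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
        }
        sub(/^<name[^>]*>/, "", qname)
        sub(/<\/name>$/, "", qname)
        inselect = 1
    }
    ' "$@"
}
```

Run it as: query_columns xmlfile > output.csv. On the sample above it should produce the eight desired lines once for each repetition of the blocks; if the real files put the querySubject name on a different line from ">select", the name-capturing rule would need adjusting.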
# 11  
06-04-2014
Quote:
Originally Posted by ms2001
I have tried to explain the whole scenario again, including all the possible variations in the code.
This is my code sample:
Code:
sample XML code deleted for brevity

There are N similar lines following the same pattern in a single XML file, along with some other unnecessary text.
So first I have to extract these blocks into a single file, and then finally
I need the output below, redirected into a file:
Code:
This is important,tablename1,col1,colname
This is important,tablename1,col2,colname1
This is important,tablename2,col1,colname2
This is important,tablename3,col1,colname3
This is also important,tablename4,col1,colname
This is also important,tablename4,col2,colname1
This is also important,tablename5,col1,colname2
This is also important,tablename5,col1,colname3


This code posted by pilnet101 is almost working
Code:
awk '/>select[[:space:]]*$/{f++};/^from/{f && f--}f' xmlfile|awk -F"[<>]" '{a=$0};{if (a ~ "select") {c=$(NF-12)","}};{gsub(/[\.]/,",");gsub(/ as /,",");gsub(/,$/,"");{gsub(/.*\(/,"");gsub(/[[:punct:]][0-9]*\)/,"")};{if ($0~",") print c$0}}'

except that I am getting lots of spaces and special characters in the output.

When I run pilnet101's code with your sample input, I get exactly the output you showed us above.
There are no spaces in that output except the ones between the words This, is, also (in some lines), and important, as shown in your desired output.
And the only "special characters" in the output are the <newline> characters that terminate the output lines.

What operating system are you using?

Show us the output you're getting. And, show us the output from piping that output through the command:
Code:
od -c

so we can see the extra spaces and special characters.
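Don's od -c request would reveal a carriage return as \r. Assuming the extra characters come from DOS (CR-LF) line endings, which is only a guess and is not confirmed in this thread, here is a sketch of two hypothetical helpers: has_crlf searches the od -c dump for \r, and strip_cr writes a copy of the file with every carriage return deleted using tr -d '\r'.

```shell
#!/bin/sh
# has_crlf (hypothetical name): succeed if file "$1" contains a carriage
# return. od -c prints a CR as the two characters \r, so grep for them.
has_crlf() {
    od -c "$1" | grep -q '\\r'
}

# strip_cr (hypothetical name): write file "$1" to stdout with every
# carriage return removed, turning CR-LF line endings into plain LF.
strip_cr() {
    tr -d '\r' < "$1"
}
```

For example: has_crlf xmlfile && strip_cr xmlfile > xmlfile.unix, then run the awk commands on xmlfile.unix instead of the original file.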
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Help with parsing xml file

Hi, Need help with parsing xml data in unix and place it in a csv file. My xml file looks like this: <?xml version="1.0" encoding="UTF-8" standalone="yes" ?> <iwgroups> <nextid value="128"> </nextid> <iwgroup name="RXapproval" id="124" display-name="RXapproval"... (11 Replies)
Discussion started by: ajayakunuri
11 Replies

2. Shell Programming and Scripting

XML: parsing of the Google contacts XML file

I am trying to parse the XML Google contact file using tools like xmllint and I even dived into the XSL Style Sheets using xsltproc but I get nowhere. I can not supply any sample file as it contains private data but you can download your own contacts using this script: #!/bin/sh # imports... (9 Replies)
Discussion started by: ripat
9 Replies

3. Shell Programming and Scripting

Help in parsing XML output file in perl.

Hi I have an XML output like : <?xml version="1.0" encoding="ISO-8859-1" ?> - <envelope> - <body> - <outputGetUsageSummary> - <usgSumm rerateDone="5"> - <usageAccum accumId="269" accumCaptn="VD_DP_AR" inclUnits="9999999.00" inclUnitsUsed="0.00" shared="false" pooled="false"... (7 Replies)
Discussion started by: rkrish
7 Replies

4. Shell Programming and Scripting

Parsing an XML file

Hello, I have the following xml file as an input. <?xml version="1.0" encoding="UTF-8"?> <RECORDS PS3_VERSION="1104_01"><RECORD> <POI_ID>931</POI_ID> <SUPPLIER_ID>2</SUPPLIER_ID> <POI_PVID>997920846</POI_PVID> <DB_ID>1366650925</DB_ID> <REGION>H1</REGION> <POI_NAME NAME_TYPE="Official"... (4 Replies)
Discussion started by: ramky79
4 Replies

5. Shell Programming and Scripting

parsing xml file

Hello! We need to parse weblogic config.xml file and display rows in format: machine:listen-port:name:application_name In our enviroment the output should be (one line for every instance): Crm-Test-Web:8001:PIA:peoplesoft Crm-Test-Web:8011:PIA:peoplesoft... (9 Replies)
Discussion started by: annar
9 Replies

6. Shell Programming and Scripting

Help in parsing xml file (sed/nawk)

I have a large xml file as shown below: <input> <blah> <blah> <atr="blah blah value = ""> <blah> <blah> </input> ..2nd chunk... ..3rd chunk... ...4th chunk... All lines between <input> and </input> is one 'order' and this 'order' is repeated... (14 Replies)
Discussion started by: shekhar2010us
14 Replies

7. Shell Programming and Scripting

Parsing xml file

hi guys, great help to the original question, can i expand please? i have large files filled with blocks like this <Placemark> network type: hot line1 line2 line3 <styleUrl>red.png</styleUrl> </Placemark> <Placemark> network type: cold line1 line2 line3... (3 Replies)
Discussion started by: garvald
3 Replies

8. UNIX for Dummies Questions & Answers

Help parsing a XML file ....

Well I have read several threads on the subject ... but being a newbie like me makes it hard to understand ... What I need is the following: Input data: ------- snip --------- <FavouriteLocations> <FavouriteLocations class="FavouriteList"><Item... (6 Replies)
Discussion started by: misak
6 Replies

9. Shell Programming and Scripting

XML file parsing using script

Hi I need some help with XML file parsing. I have an XML file with the below tag, I need a script to identify the value of srvcName which is this case is "AAA srvc name". I need to put contents of this value which is AAA srvc and name into different variables using an array and then reformat it... (6 Replies)
Discussion started by: zmfcat1
6 Replies

10. UNIX for Advanced & Expert Users

Parsing xml file using Sed

Hi All, I have this(.xml) file as: <!-- define your instance here --> <instance name='ins_C2Londondev' user='' group='' fullname='B2%20-%20London%20(dev)' > <property> </property> </instance> I want output as: <!-- define your instance here --> <instance... (3 Replies)
Discussion started by: kapilkinha
3 Replies