Visit Our UNIX and Linux User Community


Parse XML file into CSV with shell?


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Parse XML file into CSV with shell?
# 1  
Old 11-25-2008
Parse XML file into CSV with shell?

Hi,

It's been a few years since college when I did stuff like this all the time. Can someone help me figure out how to best tackle this problem? I need to parse a file full of entries that look like this:

<eq action="A" sectyType="0" symbol="PGR" exch="CA" curr="VEF" sess="NORM" dfltInd="1" issuerName="PROAGROI-7 B" issuShortDesc="VEB100" sectySubType="" sedol="2705132" isin="VEV000901000" cusip="" localCode="VEV000901000" localId="5" Csymbol="PGR" Cexch="CA" Ccurr="VEF" Csess="NORM" Psymbol="PGR" Pexch="CA" Pcurr="VEF" Psess="NORM" Ssymbol="PGR" Sexch="CA" Scurr="VEF" Ssess="NORM" exclPFInd="0" ranking="" longIssuerName="PROAGRO, C.A." issuLongDesc="VEB100" sicCode="" exchSym="" streetSym="" mostLiquid="0" />

And I want the data in a csv file with the following columns:

issuerName (symbol-exch) | symbol | exch | curr | Csymbol | Cexch | Ccurr

I only want the data that's in each of these fields, so I want PGR, not symbol="PGR"

I can use sed to strip away everything but the data I need -- which I've done -- but the data remains in its original order, not the one I'm looking for: (Note, the issuerName field is in Brackets for visual purposes).

PGR CA VEF [PROAGROI-7 B] PGR CA VEF

What's the best way to re-order the above line according to my CSV needs? Or is there a different approach I should be taking entirely?
# 2  
Old 11-25-2008
Instead of doing it in shell (sed/awk), better use any XML parser. May be you can write a simple script in Perl or any scripting languages which support XML parsing.
# 3  
Old 11-26-2008
Here is an example of how to do it using xsltproc. Suppose your XML document (file.xml) contains 2 records i.e.
Code:
<?xml version = "1.0"?>
<root>
<eq action="A" sectyType="0" symbol="PGR" exch="CA" curr="VEF" sess="NORM" dfltInd="1" issuerName="PROAGROI-7 B" issuSho
rtDesc="VEB100" sectySubType="" sedol="2705132" isin="VEV000901000" cusip="" localCode="VEV000901000" localId="5" Csymbo
l="PGR" Cexch="CA" Ccurr="VEF" Csess="NORM" Psymbol="PGR" Pexch="CA" Pcurr="VEF" Psess="NORM" Ssymbol="PGR" Sexch="CA" S
curr="VEF" Ssess="NORM" exclPFInd="0" ranking="" longIssuerName="PROAGRO, C.A." issuLongDesc="VEB100" sicCode="" exchSym
="" streetSym="" mostLiquid="0" />
<eq action="A" sectyType="0" symbol="PGR" exch="BB" curr="VEF" sess="NORM" dfltInd="1" issuerName="PROAGROI-8 B" issuSho
rtDesc="VEB100" sectySubType="" sedol="2705132" isin="VEV000901000" cusip="" localCode="VEV000901000" localId="5" Csymbo
l="PGR" Cexch="CA" Ccurr="VEF" Csess="NORM" Psymbol="PGR" Pexch="CA" Pcurr="VEF" Psess="NORM" Ssymbol="PGR" Sexch="CA" S
curr="VEF" Ssess="NORM" exclPFInd="0" ranking="" longIssuerName="PROAGRO, C.A." issuLongDesc="VEB100" sicCode="" exchSym
="" streetSym="" mostLiquid="0" />
</root>

and you have an XSL stylesheet called file.xsl (deliberately simplified) which contains
Code:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<xsl:template match="/">
  <xsl:apply-templates select="/root/eq"/>
</xsl:template>

<!-- write out comma separated file -->
<xsl:template match="/root/eq">
   <xsl:value-of select="@issuerName"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@symbol"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@exch"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@curr"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@Csymbol"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@Cexch"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@Ccurr"/>
   <xsl:text>
</xsl:text>
</xsl:template>

</xsl:stylesheet>

Using xsltproc to transform the document produces the required output
Code:
$ xsltproc file.xsl file.xml
PROAGROI-7 B,PGR,CA,VEF,PGR,CA,VEF
PROAGROI-8 B,PGR,BB,VEF,PGR,CA,VEF

# 4  
Old 12-02-2008
Thanks! I'll need a bit of time to work with this, but I prefer using the right tool for the job and this looks like it will help me with a few next steps I was planning anyways.
# 5  
Old 12-02-2008
hi,

basically use xml parse tools in perl is easy.

Anyway, below boring code can also address the requirement.

You amy try it

Code:
open FH,"<a.txt";
my @arr=<FH>;
close FH;
foreach(@arr){
	while(m/ (.*?=".*?")/){
		my $str=$1;
		$_=$';
		$hash{$1}=$2 if ($str=~m/(.*)="(.*)"/);
	}
	print $hash{issuerName},"|",$hash{symbol},"|",$hash{exch},"|",$hash{curr},"|",$hash{Csymbol},"|",$hash{Cexch},"|",$hash{Ccurr},"\n";
}

# 6  
Old 04-14-2009
Hi,

I have tried the above code for the following xml
<Account id='xxxxxxxxxxxxxx' name='xxxx' creator='abcd' createDate='110908'
lastModifier='abcd' resource='DataMart' accountId='F100206'
userid='F100206' situation='active' discoveredSituation='CONFIRMED' accountExists='true'>
<MemberObjectGroups>
<ObjectRef type='ObjectGroup' id='#ID#Top' name='Top'/>
</MemberObjectGroups>
</Account>


open FH,"<a.txt";
my @arr=<FH>;
close FH;
foreach(@arr){
while(m/ (.*?=".*?")/){
my $str=$1;
$_=$';
$hash{$1}=$2 if ($str=~m/(.*)="(.*)"/);
}
print $hash{accountId},"|",$hash{createDate},"|",$hash{userid},"|",$hash{creator},"|",$hash{accountExists} ,"|",$hash{resource},"|",$hash{lastModifier},"\n";
}


I got empty output
||||||
||||||
||||||
||||||
||||||
||||||
||||||
||||||
||||||
||||||
||||||

Can you please explain what this peice of code does and where i am going wrong?

while(m/ (.*?=".*?")/){
my $str=$1;
$_=$';
$hash{$1}=$2 if ($str=~m/(.*)="(.*)"/);
}


Thanks in advance for your help
# 7  
Old 04-14-2009
you can use Perl and XML::Simple.
Code:
use XML::Simple;
use Data::Dumper;
my $config = XMLin("file");
print Dumper($config);
my $issuername = $config->{issuerName};
my $symbol =  $config->{issuerName};
my $exch =  $config->{exch};
my $curr = $config->{curr};
my $csymbol = $config->{Csymbol};
my $cexch = $config->{Cexch};
my $ccurr = $config->{Ccurr};
@line = ($symbol,$exch,$curr,$csymbol,$cexch,$ccurr );
print join(",",@line);

output
Code:
# ./test.pl
PROAGROI-7 B,CA,VEF,PGR,CA,VEF

or the "hard way"
Code:
while (<>){
 if ( /<eq/ .. /\/>/ ){
     @list = split /\"\s/ ,$_;
     foreach my $k (@list){
       print "$k\n";
       # get your values;
     }
 }
}


Previous Thread | Next Thread
Test Your Knowledge in Computers #616
Difficulty: Medium
There is no separate char data type in Python.
True or False?

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Pass some data from csv to xml file using shell/python

Hello gurus, I have a csv file with bunch of datas in each column. (see attached) Now I have an .xml file in the structure of below: ?xml version="1.0" ?> <component id="root" name="root"> <component id="system" name="system"> <param name="number_of_A" value="8"/> ... (5 Replies)
Discussion started by: Zam_1234
5 Replies

2. Shell Programming and Scripting

Using shell command need to parse multiple nested tag value of a XML file

I have this XML file - <gp> <mms>1110012</mms> <tg>988</tg> <mm>LongTime</mm> <lv> <lkid>StartEle=ONE, Desti = Motion</lkid> <kk>12</kk> </lv> <lv> <lkid>StartEle=ONE, Source = Velocity</lkid> <kk>2</kk> </lv> <lv> ... (3 Replies)
Discussion started by: NeedASolution
3 Replies

3. Shell Programming and Scripting

BASH script to parse XML and generate CSV

Hi All, Hope all you are doing good! Need your help. I have an XML file which needs to be converted CSV file. I am not an expert of awk/sed so your help is highly appreciated!! XML file looks like this: <l:event dateTime="2013-03-13 07:15:54.713" layerName="OSB" processName="ABC"... (2 Replies)
Discussion started by: bhaskar_m
2 Replies

4. UNIX for Dummies Questions & Answers

Help to parse csv file with shell script

Hello ! I am very aware that this is not the first time this question is asked here, because I have already read a lot of previous answers, but none of them worked, so... As said in the title, I want to read a csv file with a bash script. Here is a sample of the file: ... (4 Replies)
Discussion started by: Grhyll
4 Replies

5. Shell Programming and Scripting

Extract and parse XML data (statistic value) to csv

Hi All, I need to parse some statistic data from the "measInfo" -eg. 25250000 (as highlighted) and return the result into line by line, and erasing all other unnecessary info/tag. Thought of starting with grep "measInfoID="25250000" but this only returns 1 line. How do I get all the output... (8 Replies)
Discussion started by: jackma
8 Replies

6. Shell Programming and Scripting

Korn shell program to parse CSV text file and insert values into Oracle database

Enclosed is comma separated text file. I need to write a korn shell program that will parse the text file and insert the values into Oracle database. I need to write the korn shell program on Red Hat Enterprise Linux server. Oracle database is 10g. (15 Replies)
Discussion started by: shellguy
15 Replies

7. Shell Programming and Scripting

Parse XML file in shell script

Hi Everybody, I have an XML file containing some data and i want to extract it, but the specific issue in my file is that the data is repeated some times like the following example : <section1> <subsection1> X=... Y=... Z=... <\subsection1> <subsection2> X=... Y=... Z=...... (2 Replies)
Discussion started by: yassine
2 Replies

8. Shell Programming and Scripting

regex/shell script to Parse through XML Records

Hi All, I have been working on something that doesn't seem to have a clear regex solution and I just wanted to run it by everyone to see if I could get some insight into the method of solving this problem. I have a flat text file that contains billing records for users, however the records... (5 Replies)
Discussion started by: Jerrad
5 Replies

9. Shell Programming and Scripting

Parse a string in XML file using shell script

Hi! I'm just new here and don't know much about shell scripting. I just want to ask for help in creating a shell script that will parse a string or value of the status in the xml file. Please sample xml file below. Can you please help me create a simple script to get the value of status? Also it... (46 Replies)
Discussion started by: ayhanne
46 Replies

10. Shell Programming and Scripting

How to parse a XML file using PERL and XML::DOm

I need to know the way. I have got parsing down some nodes. But I was unable to get the child node perfectly. If you have code please send it. It will be very useful for me. (0 Replies)
Discussion started by: girigopal
0 Replies

Featured Tech Videos