Parse XML file into CSV with shell?


 
# 1  
Old 11-25-2008

Hi,

It's been a few years since college when I did stuff like this all the time. Can someone help me figure out how to best tackle this problem? I need to parse a file full of entries that look like this:

<eq action="A" sectyType="0" symbol="PGR" exch="CA" curr="VEF" sess="NORM" dfltInd="1" issuerName="PROAGROI-7 B" issuShortDesc="VEB100" sectySubType="" sedol="2705132" isin="VEV000901000" cusip="" localCode="VEV000901000" localId="5" Csymbol="PGR" Cexch="CA" Ccurr="VEF" Csess="NORM" Psymbol="PGR" Pexch="CA" Pcurr="VEF" Psess="NORM" Ssymbol="PGR" Sexch="CA" Scurr="VEF" Ssess="NORM" exclPFInd="0" ranking="" longIssuerName="PROAGRO, C.A." issuLongDesc="VEB100" sicCode="" exchSym="" streetSym="" mostLiquid="0" />

And I want the data in a csv file with the following columns:

issuerName (symbol-exch) | symbol | exch | curr | Csymbol | Cexch | Ccurr

I only want the data that's in each of these fields, so I want PGR, not symbol="PGR".

I can use sed to strip away everything but the data I need, which I've done, but the data remains in its original order, not the order I'm looking for (note: the issuerName field is in brackets for readability):

PGR CA VEF [PROAGROI-7 B] PGR CA VEF

What's the best way to re-order the above line according to my CSV needs? Or is there a different approach I should be taking entirely?
# 2  
Old 11-25-2008
Rather than doing it in shell (sed/awk), you are better off using an XML parser. You could write a simple script in Perl or any other scripting language that supports XML parsing.
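To make the parser suggestion concrete, here is a minimal sketch in Python using only the standard library (xml.etree.ElementTree and csv). It assumes the <eq> records are wrapped in a single root element; the <root> wrapper, the abbreviated SAMPLE record, and the names COLUMNS and eq_to_csv are illustrative and not part of the original data. Because attributes are looked up by name, the output order is whatever COLUMNS says, regardless of the order in the XML.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Columns in the order requested in post #1.
COLUMNS = ["issuerName", "symbol", "exch", "curr", "Csymbol", "Cexch", "Ccurr"]

# Abbreviated sample: one <eq> record inside a hypothetical <root> wrapper.
SAMPLE = """<root>
<eq action="A" symbol="PGR" exch="CA" curr="VEF" issuerName="PROAGROI-7 B"
    Csymbol="PGR" Cexch="CA" Ccurr="VEF" longIssuerName="PROAGRO, C.A." />
</root>"""


def eq_to_csv(xml_text, columns=COLUMNS):
    """Return CSV text with one row per <eq> element, fields in `columns` order."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for eq in root.iter("eq"):
        # A missing attribute becomes an empty field instead of raising.
        writer.writerow([eq.get(name, "") for name in columns])
    return out.getvalue()


if __name__ == "__main__":
    print(eq_to_csv(SAMPLE), end="")
```

The csv module quotes a field only when it contains a comma, quote, or newline, so a value such as longIssuerName="PROAGRO, C.A." would still come out as one valid CSV field if it were added to COLUMNS.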
# 3  
Old 11-26-2008
Here is an example of how to do it using xsltproc. Suppose your XML document (file.xml) contains two records:
Code:
<?xml version = "1.0"?>
<root>
<eq action="A" sectyType="0" symbol="PGR" exch="CA" curr="VEF" sess="NORM" dfltInd="1" issuerName="PROAGROI-7 B" issuShortDesc="VEB100" sectySubType="" sedol="2705132" isin="VEV000901000" cusip="" localCode="VEV000901000" localId="5" Csymbol="PGR" Cexch="CA" Ccurr="VEF" Csess="NORM" Psymbol="PGR" Pexch="CA" Pcurr="VEF" Psess="NORM" Ssymbol="PGR" Sexch="CA" Scurr="VEF" Ssess="NORM" exclPFInd="0" ranking="" longIssuerName="PROAGRO, C.A." issuLongDesc="VEB100" sicCode="" exchSym="" streetSym="" mostLiquid="0" />
<eq action="A" sectyType="0" symbol="PGR" exch="BB" curr="VEF" sess="NORM" dfltInd="1" issuerName="PROAGROI-8 B" issuShortDesc="VEB100" sectySubType="" sedol="2705132" isin="VEV000901000" cusip="" localCode="VEV000901000" localId="5" Csymbol="PGR" Cexch="CA" Ccurr="VEF" Csess="NORM" Psymbol="PGR" Pexch="CA" Pcurr="VEF" Psess="NORM" Ssymbol="PGR" Sexch="CA" Scurr="VEF" Ssess="NORM" exclPFInd="0" ranking="" longIssuerName="PROAGRO, C.A." issuLongDesc="VEB100" sicCode="" exchSym="" streetSym="" mostLiquid="0" />
</root>

and you have an XSL stylesheet called file.xsl (deliberately simplified) which contains
Code:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<xsl:template match="/">
  <xsl:apply-templates select="/root/eq"/>
</xsl:template>

<!-- write out comma separated file -->
<xsl:template match="/root/eq">
   <xsl:value-of select="@issuerName"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@symbol"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@exch"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@curr"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@Csymbol"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@Cexch"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@Ccurr"/>
   <xsl:text>
</xsl:text>
</xsl:template>

</xsl:stylesheet>

Using xsltproc to transform the document produces the required output
Code:
$ xsltproc file.xsl file.xml
PROAGROI-7 B,PGR,CA,VEF,PGR,CA,VEF
PROAGROI-8 B,PGR,BB,VEF,PGR,CA,VEF

# 4  
Old 12-02-2008
Thanks! I'll need a bit of time to work with this, but I prefer using the right tool for the job and this looks like it will help me with a few next steps I was planning anyways.
# 5  
Old 12-02-2008
Hi,

Using the XML parsing tools in Perl is easy.

Anyway, the plain code below can also meet the requirement.

You may try it:

Code:
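# Read every line of the input file into @arr.
open FH,"<a.txt" or die "Cannot open a.txt: $!";
my @arr=<FH>;
close FH;
foreach(@arr){
	my %hash;	# attribute name => value for the current line only
	# Peel one ' name="value"' pair at a time off the front of the line.
	while(m/ (.*?=".*?")/){
		my $str=$1;
		$_=$';	# carry on with the text that follows the match
		$hash{$1}=$2 if ($str=~m/(.*)="(.*)"/);
	}
	print $hash{issuerName},"|",$hash{symbol},"|",$hash{exch},"|",$hash{curr},"|",$hash{Csymbol},"|",$hash{Cexch},"|",$hash{Ccurr},"\n";
}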

# 6  
Old 04-14-2009
Hi,

I have tried the above code on the following XML:
<Account id='xxxxxxxxxxxxxx' name='xxxx' creator='abcd' createDate='110908'
lastModifier='abcd' resource='DataMart' accountId='F100206'
userid='F100206' situation='active' discoveredSituation='CONFIRMED' accountExists='true'>
<MemberObjectGroups>
<ObjectRef type='ObjectGroup' id='#ID#Top' name='Top'/>
</MemberObjectGroups>
</Account>


open FH,"<a.txt";
my @arr=<FH>;
close FH;
foreach(@arr){
while(m/ (.*?=".*?")/){
my $str=$1;
$_=$';
$hash{$1}=$2 if ($str=~m/(.*)="(.*)"/);
}
print $hash{accountId},"|",$hash{createDate},"|",$hash{userid},"|",$hash{creator},"|",$hash{accountExists} ,"|",$hash{resource},"|",$hash{lastModifier},"\n";
}


I got empty output
||||||
||||||
||||||
||||||
||||||
||||||
||||||
||||||
||||||
||||||
||||||

Can you please explain what this piece of code does and where I am going wrong?

while(m/ (.*?=".*?")/){
my $str=$1;
$_=$';
$hash{$1}=$2 if ($str=~m/(.*)="(.*)"/);
}


Thanks in advance for your help
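
A note on the question above, offered as a sketch rather than a definitive answer. The while loop looks for a space followed by name="value", saves that pair in $str, replaces $_ with whatever follows the match ($' is Perl's post-match variable), and then splits $str into a key and a value for %hash. The pattern only recognizes double quotes, while the Account XML uses single quotes, so nothing ever matches and each print emits only the separators. The attributes also span three lines, whereas the code prints once per input line. The Python sketch below shows the same idea with a pattern that accepts either quote style and runs over the whole opening tag at once; ATTR_RE, parse_attributes and SAMPLE are illustrative names, and a real XML parser remains the more robust choice.

```python
import re

# Matches name='value' or name="value". The backreference \2 requires the
# closing quote to be the same character as the opening one.
ATTR_RE = re.compile(r"""([\w:.-]+)\s*=\s*(["'])(.*?)\2""", re.DOTALL)

# The opening <Account ...> tag from the post, spanning three lines.
SAMPLE = """<Account id='xxxxxxxxxxxxxx' name='xxxx' creator='abcd' createDate='110908'
lastModifier='abcd' resource='DataMart' accountId='F100206'
userid='F100206' situation='active' discoveredSituation='CONFIRMED' accountExists='true'>"""


def parse_attributes(tag_text):
    """Return a dict of attribute name -> value for one opening tag."""
    return {name: value for name, _quote, value in ATTR_RE.findall(tag_text)}


if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    wanted = ["accountId", "createDate", "userid", "creator",
              "accountExists", "resource", "lastModifier"]
    print("|".join(attrs.get(name, "") for name in wanted))
```

Reading the whole tag as one string, rather than line by line, is what lets the attributes on the second and third lines end up in the same record as those on the first.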
# 7  
Old 04-14-2009
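You can use Perl and XML::Simple. This works when "file" holds a single <eq ... /> element, so that XMLin returns its attributes as a hash reference.
Code:
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $config = XMLin("file");
# print Dumper($config);   # uncomment to inspect the parsed structure
my $issuername = $config->{issuerName};
my $symbol =  $config->{symbol};
my $exch =  $config->{exch};
my $curr = $config->{curr};
my $csymbol = $config->{Csymbol};
my $cexch = $config->{Cexch};
my $ccurr = $config->{Ccurr};
# Fields in the column order requested in post #1.
my @line = ($issuername,$symbol,$exch,$curr,$csymbol,$cexch,$ccurr );
print join(",",@line),"\n";

output
Code:
# ./test.pl
PROAGROI-7 B,PGR,CA,VEF,PGR,CA,VEF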

Or the "hard way", splitting each <eq ... /> line into its name="value pieces yourself:
Code:
while (<>){
 if ( /<eq/ .. /\/>/ ){
     @list = split /\"\s/ ,$_;
     foreach my $k (@list){
       print "$k\n";
       # get your values;
     }
 }
}
