Parse XML file into CSV with shell?
Post 302262136 by fpmurphy on Wednesday 26th of November 2008 12:06:20 PM
Here is an example of how to do it using xsltproc. Suppose your XML document (file.xml) contains two records, i.e.
Code:
<?xml version = "1.0"?>
<root>
<eq action="A" sectyType="0" symbol="PGR" exch="CA" curr="VEF" sess="NORM" dfltInd="1"
    issuerName="PROAGROI-7 B" issuShortDesc="VEB100" sectySubType="" sedol="2705132"
    isin="VEV000901000" cusip="" localCode="VEV000901000" localId="5"
    Csymbol="PGR" Cexch="CA" Ccurr="VEF" Csess="NORM"
    Psymbol="PGR" Pexch="CA" Pcurr="VEF" Psess="NORM"
    Ssymbol="PGR" Sexch="CA" Scurr="VEF" Ssess="NORM"
    exclPFInd="0" ranking="" longIssuerName="PROAGRO, C.A." issuLongDesc="VEB100"
    sicCode="" exchSym="" streetSym="" mostLiquid="0" />
<eq action="A" sectyType="0" symbol="PGR" exch="BB" curr="VEF" sess="NORM" dfltInd="1"
    issuerName="PROAGROI-8 B" issuShortDesc="VEB100" sectySubType="" sedol="2705132"
    isin="VEV000901000" cusip="" localCode="VEV000901000" localId="5"
    Csymbol="PGR" Cexch="CA" Ccurr="VEF" Csess="NORM"
    Psymbol="PGR" Pexch="CA" Pcurr="VEF" Psess="NORM"
    Ssymbol="PGR" Sexch="CA" Scurr="VEF" Ssess="NORM"
    exclPFInd="0" ranking="" longIssuerName="PROAGRO, C.A." issuLongDesc="VEB100"
    sicCode="" exchSym="" streetSym="" mostLiquid="0" />
</root>

and you have an XSL stylesheet called file.xsl (deliberately simplified) which contains
Code:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<xsl:template match="/">
  <xsl:apply-templates select="/root/eq"/>
</xsl:template>

<!-- write out comma separated file -->
<xsl:template match="/root/eq">
   <xsl:value-of select="@issuerName"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@symbol"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@exch"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@curr"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@Csymbol"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@Cexch"/>
   <xsl:value-of select="','"/>
   <xsl:value-of select="@Ccurr"/>
   <!-- end each record with a newline -->
   <xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>

Using xsltproc to transform the document produces the required output
Code:
$ xsltproc file.xsl file.xml
PROAGROI-7 B,PGR,CA,VEF,PGR,CA,VEF
PROAGROI-8 B,PGR,BB,VEF,PGR,CA,VEF
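
If you want the result in a file rather than on the terminal, the command above can be wrapped in a small shell function. This is only a sketch: the function name xml_to_csv and the output file name file.csv are made up for illustration and are not part of the original example. It assumes xsltproc is on your PATH and only uses its basic "xsltproc stylesheet document" form.

```shell
#!/bin/sh
# Hypothetical helper (name and arguments are assumptions, not from the post):
# run the stylesheet over the XML document and write the result to a CSV file.
# The output is written to a temporary file first, so a failed transformation
# does not leave a half-written CSV behind.
xml_to_csv() {
    xsl=$1
    xml=$2
    out=$3
    tmp="$out.tmp.$$"

    if xsltproc "$xsl" "$xml" > "$tmp"; then
        # Transformation succeeded: move the result into place.
        mv "$tmp" "$out"
    else
        # In the else branch, $? still holds xsltproc's exit status.
        status=$?
        rm -f "$tmp"
        echo "xsltproc failed on $xml (exit $status)" >&2
        return "$status"
    fi
}

# Example call, using the file names from this post:
#   xml_to_csv file.xsl file.xml file.csv && cat file.csv
```

Note that the stylesheet writes attribute values as they are. A value that itself contains a comma, such as longIssuerName="PROAGRO, C.A." in the sample data, would need quoting if you add it to the output.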

 
