Xml to csv

 
# 1  
Old 03-02-2017

Hello,
Does anyone know of a way to convert an .xml file (ONIX) to something more workable, like a .csv (or even .xls) file? A command-line tool would be ideal, but is not absolutely necessary. The .xml files I would be dealing with are 125 MB or larger.

I am using XQuartz in El Capitan.

Thanks very much!
# 2  
Old 03-03-2017
Hi.

You didn't supply sample input and desired output, so I couldn't attempt a relevant demonstration.

Possible utilities:
Code:
XML2(1)                     General Commands Manual                    XML2(1)

NAME
       xml2 - convert xml documents in a flat format

       2xml - convert flat format into xml

       html2 - convert html documents in a flat format

       2html - convert flat format into html

       csv2 - convert csv files in a flat format

       2csv - convert flat format into csv

On a system:
Code:
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.7 (jessie)

Some details for xml2 package:
Code:
xml2    convert xml documents in a flat format (man)
Path    : /usr/bin/xml2
Version : - ( /usr/bin/xml2, 2012-04-16 )
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Repo    : Debian 8.7 (jessie)

Versions appear to be available via brew, fink, port for a system like:
Code:
OS, ker|rel, machine: Apple/BSD, Darwin 9.8.0, Power Macintosh
Distribution        : Mac OS X 10.5.8 (leopard, workstation)

Best wishes ... cheers, drl
# 3  
Old 03-03-2017
Show the input you have and show the output you want. "Generic" conversion isn't really possible given XML is a tree structure, not a flat structure, but your particular data file may have regular data representable as such.
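
One rough way to check whether a given record is regular enough to flatten is to look for fields that occur more than once in it, since each of those needs a rule (first, last, or one column per occurrence) before it fits into a single row. The sketch below assumes the record has already been passed through xml2, that xml2 writes one `path=value` line per leaf element, and that the input holds a single record; the function name `repeated_tags` is made up for illustration.

```shell
#!/bin/sh
# repeated_tags: read xml2-style "path=value" lines on stdin and list every
# path that carries a value more than once, prefixed by its count.
# Assumption: the input describes ONE record; with many records nearly
# every path would repeat.
repeated_tags() {
  awk '
    {
      i = index($0, "=")
      if (i == 0) next                  # lines without "=" carry no value
      count[substr($0, 1, i - 1)]++     # count occurrences per path
    }
    END {
      for (p in count) if (count[p] > 1) print count[p] "\t" p
    }
  ' | LC_ALL=C sort -k2
}
```

Paths that do not show up in the listing occur at most once per record and can be mapped to a column directly.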
# 4  
Old 03-11-2017
Thanks for the replies. I have copied .xml code for a single item below. I am trying to extract three items (field indices a001, b203, and j151), so the desired output would be:

Code:
9781328740472  Peepers  7.99

Thanks again!

Code:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE ONIXmessage SYSTEM "http://www.editeur.org/onix/2.1/short/onix-international.dtd" >
<ONIXmessage release="2.1">
<header><m174>Houghton Mifflin</m174><m175>Catherine Toolan 978-465-7755</m175><m283>eloquence@firebrandtech.com</m283><m182>20170201</m182><m183>Title information from Houghton Mifflin</m183><m184>eng</m184><m185>01</m185><m186>USD</m186><m187>in</m187><m193>General Trade</m193></header>
  <product>
    <a001>9781328740472</a001>
    <a002>02</a002>
    <a197>HMH</a197>
    <productidentifier>
      <b221>02</b221>
      <b244>1328740471</b244>
    </productidentifier>
    <productidentifier>
      <b221>03</b221>
      <b244>9781328740472</b244>
    </productidentifier>
    <productidentifier>
      <b221>15</b221>
      <b244>9781328740472</b244>
    </productidentifier>
    <b246>11</b246>
    <b012>BC</b012>
    <b333>B102</b333>
    <b014>Trade Paperback</b014>
    <n338/>
    <title>
      <b202>01</b202>
      <b203>Peepers</b203>
    </title>
    <workidentifier>
      <b201>15</b201>
      <b244>9780152602970</b244>
    </workidentifier>
    <contributor>
      <b034>1</b034>
      <b035>A01</b035>
      <b036>Eve Bunting</b036>
      <b037>Bunting, Eve</b037>
      <b039>Eve</b039>
      <b040>Bunting</b040>
      <b044><![CDATA[<DIV><P>EVE BUNTING has written*over two hundred*books for children, including the Caldecott Medal-winning <I>Smoky Night,</I> illustrated by David Diaz, <I>The Wall</I>,<I> Fly Away Home</I>, and <I>Train to Somewhere</I>. She lives in Southern California.</P></DIV>]]></b044>
    </contributor>
    <contributor>
      <b034>2</b034>
      <b035>A12</b035>
      <b036>James  E. Ransome</b036>
      <b037>Ransome, James  E.</b037>
      <b039>James  E.</b039>
      <b040>Ransome</b040>
      <b044><![CDATA[<DIV><P>James Ransome has illustrated more than 35 books for children, including many award winners. He lives in Rhinebeck, New York, with his wife, children's book author*Lesa Cline Ransome, and their four children. Visit his website at <A href="http://www.jamesransome.com/">www.jamesransome.com</A>.</DIV>]]></b044>
    </contributor>
    <b049>Eve Bunting, illustrated by James Ransome</b049>
    <n386/>
    <language>
      <b253>01</b253>
      <b252>eng</b252>
    </language>
    <b061>32</b061>
    <b062><![CDATA[full-color illustrations]]></b062>
    <b064>JUV029000</b064>
    <subject>
      <b067>10</b067>
      <b069>JUV013000</b069>
    </subject>
    <subject>
      <b067>20</b067>
      <b070>fall;autumn;New England;brothers;leaves;color tour;leaf peepers;graveyard;trees;pumpkins;halloween;tour;bus;river;picture book</b070>
    </subject>
    <subject>
      <b067>22</b067>
      <b069>EV065</b069>
    </subject>
    <subject>
      <b067>22</b067>
      <b069>HL070</b069>
    </subject>
    <audience>
      <b204>01</b204>
      <b206>02</b206>
    </audience>
    <audiencerange>
      <b074>11</b074>
      <b075>03</b075>
      <b076>P</b076>
      <b075>04</b075>
      <b076>3</b076>
    </audiencerange>
    <audiencerange>
      <b074>17</b074>
      <b075>03</b075>
      <b076>4</b076>
      <b075>04</b075>
      <b076>7</b076>
    </audiencerange>
    <othertext>
      <d102>01</d102>
      <d103>02</d103>
      <d104><![CDATA[<div>It's fall again, and time for Jim and Andy to help their dad run Fred's Fall Color Tours. The tourists they shuttle around are "Leaf Peepers"--and, boy, do those Peepers love to ooh and aah about the dumbest things. Leaves, trees, pumpkins. <i> Bo-o-ring.</i><br><i>	</i>But this yerar, even as they poke fun at the Peepers, Jim and Andy can't help but notice how the leaves floating in the river look like a brilliantly colored island, and how the spiky tree branches seem to sweep the clouds across the night sky.<br>	Maybe the Peepers aren't so silly after all.<br></div>]]></d104>
    </othertext>
    <othertext>
      <d102>02</d102>
        <d103>02</d103>
        <d104><![CDATA[<DIV>It's fall again, and time for Jim and Andy to help their dad run Fred's Fall Color Tours. The tourists they shuttle around are &quot;Leaf Peepers&quot;--and, boy, do those Peepers love to ooh and aah about the dumbest things. Leaves, trees, pumpkins. <I> Bo-o-ring.</I><BR /> But this yerar, even as they poke fun at the Peepers, Jim and Andy can't help but notice how the leaves floating in the river look like a brilliantly colored island, and how the spiky tree branches seem to sweep the clouds across the night sky.<BR /> Maybe the Peepers aren't so silly after all.</DIV>]]></d104>
    </othertext>
    <othertext>
      <d102>13</d102>
      <d103>02</d103>
      <d104><![CDATA[<div><b>EVE BUNTING</b> is the author of many acclaimed books for young readers, including the Caldecott Medal–winning <i>Smoky Night. </i>Her numerous honors include the prestigious Kerlan Award for her body of work. Ms. Bunting lives in Southern California.<br><br><b>JAMES RANSOME</b> has illustrated many books for children. He received the Coretta Scott King Illustrator Award for <i>The Creation</i> and a Coretta Scott King Illustrator Honor for <i>Uncle Jed’s Barbershop. </i>He lives in Poughkeepsie, New York. <br></div>]]></d104>
    </othertext>
    <mediafile>
      <f114>04</f114>
      <f115>03</f115>
      <f116>01</f116>
      <f117>http://cloud.firebrandtech.com/api/v2/hostedcover/eb4f776c-004b-4ac5-97bd-a6de017b03a9</f117>
    </mediafile>
    <imprint>
      <b241>01</b241>
      <b242>HMH Books for Young Readers</b242>
      <b243>66201921</b243>
      <b079>HMH Books for Young Readers</b079>
    </imprint>
    <publisher>
      <b291>01</b291>
      <b241>01</b241>
      <b242>HMH Books for Young Readers</b242>
      <b243>66201921</b243>
      <b081>Houghton Mifflin Harcourt</b081>
    </publisher>
    <b394>02</b394>
    <b003>20170905</b003>
    <b087>2001</b087>
    <salesrights>
      <b089>01</b089>
      <b090>AD AE AF AG AI AL AM AO AQ AR AS AT AU AW AZ BA BB BD BE BF BG BH BI BJ BL BM BN BO BR BS BT BV BW BY BZ CA CC CD CF CG CH CI CK CL CM CN CO CR CU CV CX CY CZ DE DJ DK DM DO DZ EC EE EG EH ER ES ET FI FJ FK FM FO FR GA GB GD GE GF GG GH GI GL GM GN GP GQ GR GS GT GU GW GY HK HM HN HR HT HU ID IE IL IM IN IO IQ IR IS IT JE JM JO JP KE KG KH KI KM KN KP KR KW KY KZ LA LB LC LI LK LR LS LT LU LV LY MA MC MD ME MF MG MH MK ML MM MN MO MP MQ MR MS MT MU MV MW MX MY MZ NA NC NE NF NG NI NL NO NP NR NU NZ OM PA PE PF PG PH PK PL PM PN PR PT PW PY QA RE RO RS RU RW SA SB SC SD SE SG SH SI SJ SK SL SM SN SO SR SS ST SV SY SZ TC TD TF TG TH TJ TK TM TN TO TR TT TV TW TZ UA UG UM US UY UZ VA VC VE VG VI VN VU WF WS YE YT ZA ZM ZW</b090>
    </salesrights>
    <measure>
      <c093>01</c093>
      <c094>11</c094>
      <c095>in</c095>
    </measure>
    <measure>
      <c093>01</c093>
      <c094>279.4</c094>
      <c095>mm</c095>
    </measure>
    <measure>
      <c093>02</c093>
      <c094>8.5</c094>
      <c095>in</c095>
    </measure>
    <measure>
      <c093>02</c093>
      <c094>215.9</c094>
      <c095>mm</c095>
    </measure>
    <measure>
      <c093>08</c093>
      <c094>1</c094>
      <c095>lb</c095>
    </measure>
    <measure>
      <c093>08</c093>
      <c094>16</c094>
      <c095>oz</c095>
    </measure>
    <measure>
      <c093>08</c093>
      <c094>453.59</c094>
      <c095>gr</c095>
    </measure>
    <relatedproduct>
      <h208>23</h208>
      <productidentifier>
        <b221>15</b221>
        <b244>9780062086303</b244>
      </productidentifier>
    </relatedproduct>
    <relatedproduct>
      <h208>23</h208>
      <productidentifier>
        <b221>15</b221>
        <b244>9780544339200</b244>
      </productidentifier>
    </relatedproduct>
    <relatedproduct>
      <h208>22</h208>
      <productidentifier>
        <b221>15</b221>
        <b244>9780544808997</b244>
      </productidentifier>
    </relatedproduct>
    <relatedproduct>
      <h208>22</h208>
      <productidentifier>
        <b221>15</b221>
        <b244>9780544555471</b244>
      </productidentifier>
    </relatedproduct>
    <relatedproduct>
      <h208>23</h208>
      <productidentifier>
        <b221>15</b221>
        <b244>9781442476561</b244>
      </productidentifier>
    </relatedproduct>
    <relatedproduct>
      <h208>22</h208>
      <productidentifier>
        <b221>15</b221>
        <b244>9780544227330</b244>
      </productidentifier>
    </relatedproduct>
    <relatedproduct>
      <h208>22</h208>
      <productidentifier>
        <b221>15</b221>
        <b244>9780152602970</b244>
      </productidentifier>
    </relatedproduct>
    <relatedproduct>
      <h208>22</h208>
      <productidentifier>
        <b221>15</b221>
        <b244>9780395742129</b244>
      </productidentifier>
    </relatedproduct>
    <relatedproduct>
      <h208>22</h208>
      <productidentifier>
        <b221>15</b221>
        <b244>9780395764787</b244>
      </productidentifier>
    </relatedproduct>
    <supplydetail>
      <j137>Houghton Mifflin Company</j137>
      <j141>NP</j141>
      <j396>10</j396>
      <j142>20170809</j142>
      <j143>20170905</j143>
      <j145>50</j145>
      <price>
        <j148>01</j148>
        <discountcoded>
          <j363>02</j363>
          <j364>88 - Trade &amp; Ref Child PA</j364>
        </discountcoded>
        <j151>7.99</j151>
        <j152>USD</j152>
        <j161>20160726</j161>
      </price>
      <price>
        <j148>01</j148>
        <discountcoded>
          <j363>02</j363>
          <j364>88 - Trade &amp; Ref Child PA</j364>
        </discountcoded>
        <j151>10.99</j151>
        <j152>CAD</j152>
        <j161>20161216</j161>
      </price>
    </supplydetail>
    <k167>15000</k167>
  </product>

# 5  
Old 03-11-2017
Hi.

Here is a quickly-put-together solution using xml2 as the fundamental operation, followed by several small steps that could be combined into a single step:
Code:
#!/usr/bin/env bash

# @(#) s1       Demonstrate string extraction from XML file, xml2.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C specimen xml2 grep awk tr dixf

FILE=${1-data1}
E=expected-output.txt

pl " Sampled lines from data file $FILE:"
specimen -n $FILE

pl " Expected output (augmented):"
cat $E

# Look for a001, b203, and j151
pl " Results, warning message expected:"
xml2 < $FILE |
tee f1 |
grep -E '(a001|b203|j151)=' |
tee f2 |
awk -F/ '{print $NF}'|
tee f3 |
awk -F= '{print $2}'|
tee f4 |
( tr '\n' '\t' ; echo "" ) | 
tee f5

pl " Verify results if possible:"
C=$HOME/bin/pass-fail
[ -f $C ] && $C f5 || ( pe; pe " Results cannot be verified." ) >&2

pl " Details for xml2:"
dixf xml2

rm -f f?
exit 0

producing:
Code:
$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.7 (jessie) 
bash GNU bash 4.3.30
specimen (local) 1.17
xml2 - ( /usr/bin/xml2, 2012-04-16 )
grep (GNU grep) 2.20
awk GNU Awk 4.1.1, API: 1.1 (GNU MPFR 3.1.2-p3, GNU MP 6.0.0)
tr (GNU coreutils) 8.23
dixf (local) 1.42

-----
 Sampled lines from data file data1:
Edges: 5:0:5 of 263 lines in file "data1"
     1  <?xml version="1.0" encoding="utf-8"?>
     2  <!DOCTYPE ONIXmessage SYSTEM "http://www.editeur.org/onix/2.1/short/onix-international.dtd" >
     3  <ONIXmessage release="2.1">
     4  <header><m174>Houghton Mifflin</m174><m175>Catherine Toolan 978-465-7755</m175><m283>eloquence@firebrandtech.com</m283><m182>20170201</m182><m183>Title information from Houghton Mifflin</m183><m184>eng</m184><m185>01</m185><m186>USD</m186><m187>in</m187><m193>General Trade</m193></header>
     5    <product>
   ---
   259          <j161>20161216</j161>
   260        </price>
   261      </supplydetail>
   262      <k167>15000</k167>
   263    </product>

-----
 Expected output (augmented):
9781328740472  Peepers  7.99  10.99

-----
 Results, warning message expected:
error: Extra content at the end of the document
9781328740472   Peepers 7.99    10.99

-----
 Verify results if possible:

-----
 Comparison of 1 created lines with 1 lines of desired results:
f5 expected-output.txt differ: char 14, line 1
 Failed -- files f5 and expected-output.txt not identical -- detailed comparison follows.
 Succeeded by ignoring whitespace differences.

-----
 Details for xml2:
xml2    convert xml documents in a flat format (man)
Path    : /usr/bin/xml2
Version : - ( /usr/bin/xml2, 2012-04-16 )
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Repo    : Debian 8.7 (jessie)
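
The pipeline above joins every match onto one line and reports both prices (7.99 and 10.99), whereas the requested output has only 7.99. A possible variation, offered as a sketch rather than a tested solution for real ONIX files: a single awk step that writes one CSV row per product and keeps only the price whose currency is USD. It assumes xml2's `path=value` lines as input, that a001 is the first field of every product, and that j152 follows its j151 inside each price block, as in the sample. The function name `flat2csv` is made up.

```shell
#!/bin/sh
# flat2csv: read xml2-style "path=value" lines on stdin and write one CSV
# row per product: a001 (record id), b203 (title), j151 (USD price).
flat2csv() {
  awk '
    # Quote a field for CSV, doubling any embedded double quotes.
    function q(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
    # Print the pending record, if there is one, then reset the state.
    function emit_row() {
      if (id != "") print q(id) "," q(title) "," q(usd)
      id = title = usd = last = ""
    }
    {
      i = index($0, "=")
      if (i == 0) next                      # separator lines carry no value
      path = substr($0, 1, i - 1)
      val  = substr($0, i + 1)              # keeps any "=" inside the value
      n = split(path, part, "/")
      tag = part[n]                         # last path component
      if (tag == "a001") { emit_row(); id = val }
      else if (tag == "b203" && title == "") title = val
      else if (tag == "j151") last = val    # remember until currency is known
      else if (tag == "j152" && val == "USD" && usd == "") usd = last
    }
    END { emit_row() }
  '
}
```

It would be used as `xml2 < onix.xml | flat2csv > onix.csv`, where both file names are placeholders. Fields are quoted because titles can contain commas.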

Best wishes ... cheers, drl
# 6  
Old 03-13-2017
Hello,
I want to thank you so much for taking the time to do this. After replacing 1-data1 with the xml filename, I receive the following:

Code:
-----
 Sampled lines from data file :
./z: line 19: specimen: command not found

-----
 Expected output (augmented):
cat: expected-output.txt: No such file or directory

-----
 Results, warning message expected:
./z: line 26: $FILE: ambiguous redirect


-----
 Verify results if possible:

 Results cannot be verified.

-----
 Details for xml2:
./z: line 42: dixf: command not found

I am using XQuartz 2.7.9 on a Macbook Pro running El Capitan. Thanks again!
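
A note on the messages above. `specimen` and `dixf` are listed as "(local)" utilities in the output of post #5, so "command not found" is expected for them on another machine and is harmless. In the script, `${1-data1}` is a shell parameter expansion meaning "the first command-line argument, or data1 if none is given", so the file name is meant to be passed as an argument rather than edited into the script. The empty name after "data file" and the "ambiguous redirect" message suggest that FILE ended up empty after the edit. A minimal guard, as a sketch (the function name `pick_input` is made up):

```shell
#!/bin/sh
# pick_input: print the file to process, taken from the first argument,
# or "data1" when no argument is given. Fails if the name is empty or
# the file cannot be read.
pick_input() {
  file=${1-data1}
  if [ -z "$file" ] || [ ! -r "$file" ]; then
    printf 'cannot read input file: "%s"\n' "$file" >&2
    return 1
  fi
  printf '%s\n' "$file"
}
```

In the script this could replace the FILE assignment with `FILE=$(pick_input "$@") || exit 1`, and the later uses would then be quoted, as in `xml2 < "$FILE"`.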
# 7  
Old 03-13-2017
Hi,

I'm going to play devil's advocate here and suggest something entirely different. If this is a one-time conversion, or something you won't have to do on a regular basis, I'd honestly import the XML into a spreadsheet such as MS Excel or OpenOffice/LibreOffice Calc, then tidy it up and export it as CSV from there.

Of course, if this is going to be an ongoing task that you expect to repeat many times a day, then some kind of script would be desirable. But if it isn't something you'll spend much time on, you may actually save time by using a spreadsheet rather than writing a script.