Converting XML to CSV


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Converting XML to CSV
# 1  
Old 09-10-2019
Converting XML to CSV

Hello,


For i while i have been using XMLStarlet to convert several XML files to CSV files. So far this always went fine.


Today however i got a new XML format however but i cannot find out how to get the data i need.


Below is part of the code where it shows the different format. What i expect is the following:


Code:
idpurchaseorder;idsupplier;purchaseorderid;idproduct;productcode;price;amount;amountreceived

Code:
<searchresults>
  <success>true</success>
  <rate-limit-remaining>498</rate-limit-remaining>
  <data>
    <idpurchaseorder>373251</idpurchaseorder>
    <idsupplier>42919</idsupplier>
    <idwarehouse>4863</idwarehouse>
    <idtemplate>3994</idtemplate>
    <purchaseorderid>PO2019-1008</purchaseorderid>
    <supplier_orderid>09-09-2019</supplier_orderid>
    <supplier_name></supplier_name>
    <status>purchased</status>
    <remarks></remarks>
    <delivery_date>2019-09-14</delivery_date>
    <language>nl</language>
    <purchased_by_iduser>6407</purchased_by_iduser>
    <purchased_at>2019-09-10 09:31:37</purchased_at>
    <completed_by_iduser></completed_by_iduser>
    <completed_at></completed_at>
    <created_by_iduser>6414</created_by_iduser>
    <created>2019-09-09 12:41:44</created>
    <updated>2019-09-10 09:31:37</updated>
    <products>
      <idpurchaseorder_product>5275880</idpurchaseorder_product>
      <idproduct>10307107</idproduct>
      <idvatgroup>9520</idvatgroup>
      <productcode>P00019664</productcode>
      <productcode_supplier>0389450893456</productcode_supplier>
      <name>Unold 78856</name>
      <price>99</price>
      <amount>1</amount>
      <amountreceived>0</amountreceived>
      <delivery_date></delivery_date>
      <weight>0</weight>
    </products>
  </data>
  <data>
    <idpurchaseorder>373062</idpurchaseorder>
    <idsupplier>42561</idsupplier>
    <idwarehouse>4863</idwarehouse>
    <idtemplate>3994</idtemplate>
    <purchaseorderid>PO2019-1007</purchaseorderid>
    <supplier_orderid>09-09-2019</supplier_orderid>
    <supplier_name></supplier_name>
    <status>purchased</status>
    <remarks></remarks>
    <delivery_date>2019-09-13</delivery_date>
    <language>en</language>
    <purchased_by_iduser>6414</purchased_by_iduser>
    <purchased_at>2019-09-09 10:46:22</purchased_at>
    <completed_by_iduser></completed_by_iduser>
    <completed_at></completed_at>
    <created_by_iduser>6414</created_by_iduser>
    <created>2019-09-09 10:46:18</created>
    <updated>2019-09-09 10:46:22</updated>
    <products>
      <idpurchaseorder_product>5272744</idpurchaseorder_product>
      <idproduct>10304708</idproduct>
      <idvatgroup>9520</idvatgroup>
      <productcode>P00172393</productcode>
      <productcode_supplier>34556WDR09</productcode_supplier>
      <name>Product 2</name>
      <price>999</price>
      <amount>1</amount>
      <amountreceived>0</amountreceived>
      <delivery_date></delivery_date>
      <weight>0</weight>
    </products>
    <products>
      <idpurchaseorder_product>5272745</idpurchaseorder_product>
      <idproduct>10304135</idproduct>
      <idvatgroup>9520</idvatgroup>
      <productcode>P00020201</productcode>
      <productcode_supplier>345345074</productcode_supplier>
      <name>Productname</name>
      <price>99</price>
      <amount>1</amount>
      <amountreceived>0</amountreceived>
      <delivery_date></delivery_date>
      <weight>0</weight>
    </products>
  </data>
 </searchresults>

What i tried are the following with the results of (part of all the needed columns to keep it easier to read) the above piece.


Code:
xmlstarlet sel -t -m //searchresults/data -v "concat(idpurchaseorder,';',products/idproduct)" -n purchaseorders.xml

this above gave me:
Code:
373251;10307107
373062;10304708 <-- here should be a second line with the second column different id

then i tried the following line:


Code:
xmlstarlet sel -t -m //searchresults/data -v idpurchaseorder -o " ; " -v products/idproduct -n purchaseorders.xml

and this gave me the output below

Code:
373251 ; 10307107
373062 ; 10304708
10304135 <-- here is the idpurchaseorder missing from the first column but it does have both products listed

What i expect would be the following result from the testdata (i can add the other needed data myself if needed) above:

Code:
373251;10307107
373062;10304708
373062;10304135

Does anyone know the correct way of getting the data requested? If it is a other way then XMLStarlet it is fine although that is the easiest so far for me.

Last edited by SDohmen; 09-10-2019 at 03:33 PM..
These 2 Users Gave Thanks to SDohmen For This Post:
# 2  
Old 09-10-2019
Hi SDohmen,

some additional bits of Knowledge about XPath will lead you forward towards your solution. Especially try to get information about how to figure out "parent nodes" of your target nodes.

One possible algorithm is to not iterate over the different orders(data), but iterate over the different products(products) and get a parent node(idpurchaseorder) from there.

Regards,
stomp.

P. S.: @Mod or @SDohmen: please fix the typo in the xml above: <name>Productname/name>

P. P. S.: I did not know the solution before the question was asked, and - according to Neos request - I will not take the fun and chance to grow from the poster to figuring it out for him-/herself. (The solution is about the same in size as the current shown attempts).

P. P. P. S: +1 Point for SDohmen using a suitable tool for the task. ;-)

Last edited by stomp; 09-10-2019 at 03:48 PM..
These 2 Users Gave Thanks to stomp For This Post:
# 3  
Old 09-10-2019
Quote:
Originally Posted by stomp
Hi SDohmen,

some additional bits of Knowledge about XPath will lead you forward towards your solution. Especially try to get information about how to figure out "parent nodes" of your target nodes.

One possible algorithm is to not iterate over the different orders(idpurchaseorder), but iterate over the different products(idproduct) and get the parent node(idpurchaseorder) from there.

Regards,
stomp.

P. S.: @Mod or @SDohmen: please fix the typo in the xml above: <name>Productname/name>

P. P. S.: I did not know the solution before the question was asked, and - according to Neos request - I will not take the fun and potential for growth from the poster to figuring it out for his-/herself. (The solution is about the same in size as the current shown attempts).

P. P. P. S: +1 Point for SDohmen using the right tool for the right task. ;-)

Thank you for the comment. I edited the text to have the xml correctly displayed.


I think i know what you mean by selecting the subnodes first but i have no clue how to get the rest filled in.



When i change the line to the one below:
Code:
xmlstarlet sel -t -m "/searchresults/data" -v "products/idproduct" -o " ; " -v idpurchaseorder -n purchaseorders.xml

i am getting the text as above except switched around so that was not what you mean i guess.


Do you have any more clues what to look for?




i did search around some more and found the link below:
Code:
https://stackoverflow.com/questions/53550425/extracting-xml-child-node-values-based-on-a-parent

This looks a bit like it but i am not sure if that is the correct direction.
# 4  
Old 09-10-2019
What did I do to find the solution?
  • Read some Tutorials about XPath
  • Reached out for xmlstarlet documentation
  • Used google a lot on the subject, learned from many examples(stackoverflow delivers many of those)
  • experimented with what I found, began sometimes with most simple things to understand small bits and rose higher to the full complexity of whole task

Last edited by stomp; 09-11-2019 at 03:16 AM..
This User Gave Thanks to stomp For This Post:
# 5  
Old 09-12-2019
A little help for what you've nearly got yourself:

Code:
xmlstarlet sel -t -m //searchresults/data/products -v "concat(';',idproduct)" -n purchaseorders.xml

This will give you the ordered products, like this:

Code:
;10307107
;10304708
;10304135

Now you just have to fill in the XPath-Expression for the Parent node - the ordernumber - into the concat statement...

I'm reading a bit about XPath now. It's really interesting, what is possible with that:

XPath Syntax

XPath Axes

Last edited by stomp; 09-12-2019 at 06:19 PM..
This User Gave Thanks to stomp For This Post:
# 6  
Old 09-12-2019
SDohmen, I don't generally thank people for posting questions.

However you've posted good example data and a clear and concise question with a reasonable attempt at solving it yourself. Also the tool your using is the right one for the job, so many XML parsing questions posted here want to use sed / grep / awk which will usually fail given a different XML layout.

Interesting to note that if you start from the parent level, which would have been my first inclination, you only get the first product under that parent:

Code:
xmlstarlet sel -t -m //searchresults/data -v "concat(';',products/idproduct)" -n purchaseorders.xml
;10307107
;10304708

This User Gave Thanks to Chubler_XL For This Post:
# 7  
Old 09-16-2019
Sorry for the late reply. It took a while and help from some friends who have more knowledge also about this subject. He told me the same thing and gave me a short example of what is possible. With that example i was able to product the line below:


Code:
xmlstarlet sel -T -t -m "/searchresults/data/products" -v ../idpurchaseorder -o ";" -v ../idsupplier -o ";" -v ../purchaseorderid -o ";" -v ../purchased_at -o ";" -v idproduct -o ";" -v productcode -o ";" -v price -o ";" -v amount -o ";" -v amountreceived -n purchaseorders.xml > purchaseorders.csv

I was not aware that you could "backtrack" the tree and get the main info added to the product info. It most likely isn't the cleanest solution but at least it works great.




Quote:
Originally Posted by Chubler_XL
SDohmen, I don't generally thank people for posting questions.

However you've posted good example data and a clear and concise question with a reasonable attempt at solving it yourself. Also the tool your using is the right one for the job, so many XML parsing questions posted here want to use sed / grep / awk which will usually fail given a different XML layout.

Interesting to note that if you start from the parent level, which would have been my first inclination, you only get the first product under that parent:

Code:
xmlstarlet sel -t -m //searchresults/data -v "concat(';',products/idproduct)" -n purchaseorders.xml
;10307107
;10304708


Thank you for this. I always try my best to get as much as possible done myself. Only if i am really stuck i will post the question and even that is most likely days after i got stuck. I find it important to try and test myself although sometimes that is not possible. For each problem it is of course searching for the best solution and even though i dont fully understand xmlstarlet it gave me the best results for all the scripts i have used it on. I seen several attempts with sed etc but they where so confusing or gave me weird outputs (Its like opening a door with a hammer instead of the key Smilie) that i wanted to use a program that is specifically designed for this and that was/is xmlstarlet.


I do have to add though that i cheated a bit as i got json data before the xml and i transformed it to the xml to actually get the proper data for it. Since i do lack the knowledge to get all json data (although i did try it with jq first), i decided to look for the transformation (Decoding JSON to a PHP Array and then converting it to XML. . GitHub) and then get all data with xmlstarlet. This last part was almost correct then except for the backtracking that i did not know.

Last edited by SDohmen; 09-16-2019 at 08:04 AM..
This User Gave Thanks to SDohmen For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Using awk for converting xml to txt

Hi, I have a xml script, I converted it to .txt with values comma seperated using awk function. But I want the output values should be inside double quotes My xml script (Workorders.xml) is shown like below: <?xml version="1.0" encoding="utf-8" ?> <scbm-extract version="3.3">... (8 Replies)
Discussion started by: Viswanatheee55
8 Replies

2. UNIX for Dummies Questions & Answers

Need help converting txt to XML

I have a table as following Archive id Line Author Time Text 1fjj34 3 75jk5l 03:20 this is an evidence regarding ... 1fjj34 4 gjhhtrd 03:21 we have seen those documents before 1fjj34 10 645jmdvvb 04:00 Will you consider such an offer?... (0 Replies)
Discussion started by: A-V
0 Replies

3. Shell Programming and Scripting

Help with converting XML to Flat file

Hi Friends, I want to convert a XML file to flat file. Sample I/p: <?xml version='1.0' encoding='UTF-8' ?> <DataFile xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' contactCount='4999' date='2012-04-14' time='22:00:14' xsi:noNamespaceSchemaLocation='gen .xsd'> <Contact... (3 Replies)
Discussion started by: karumudi7
3 Replies

4. Shell Programming and Scripting

Converting CSV to ascii format

Hej All, I have a file like this which is a comma dilimited: input: 6000318,-263.011520678,-59.05869872409,587.67924868792 6000319,-265.6996842902,-50.24902479999,590.65693082607 6000320,-238.1333898366,-288.801232595,633.75332173496... (5 Replies)
Discussion started by: Johanni
5 Replies

5. Shell Programming and Scripting

converting specific XML file to CSV

Hi, i would convert the following XML file : <?xml version="1.0" encoding="UTF-8" ?> <files xmlns="http://www.lotus.com/dxl/console"> <filedata notesversion="6" odsversion="43" logged="yes" backup="no" id="C12577E6:004B0DA3" iid="C12577E6:004B0DA8" link="1" dboptions="0,524288,0,0"> ... (24 Replies)
Discussion started by: Nicol
24 Replies

6. Shell Programming and Scripting

Converting a flat file in XML

Hello Friends, I am new to UNIX shell scripting. Using bash....Could you please help me in converting a flat file into an XML style output file. Flat file: (Input File entries looks like this) John Miller: 617-569-7996:15 Bunting lane, staten Island, NY: 10/21/79: 60600 The... (4 Replies)
Discussion started by: humkhn
4 Replies

7. UNIX for Dummies Questions & Answers

Converting spaces to csv

I'm new to shell scripting and I have a file with positional columns. The rows look like this: 222 3333 44444 55 55 55 55 5555 59999 222 3333 44444 55 55 55 55 5555 59999 Not every row has the same number of spaces between the seven columns. They can range anywhere... (11 Replies)
Discussion started by: jkandel
11 Replies

8. UNIX for Dummies Questions & Answers

Converting HTML to CSV

Hi, I need to convert a relatively large html file (1.5megs) into CSV under Unix. How would I be able to do this? Much thanks. (3 Replies)
Discussion started by: Jexel
3 Replies

9. Shell Programming and Scripting

converting text to csv format

I am trying to check each line and based on first two digits, the comma needs to be place. I checked in the earlier post where the text is converted to csv with a tab delimited. Here is the test file that needs to be changed to csv 11 051701 22 051701 330123405170105170112345... (13 Replies)
Discussion started by: gthokala
13 Replies

10. Shell Programming and Scripting

Converting csv to xls

Hi, Can anyone tell the option to change the file type in unix. i.e. if a file is in csv(Comma Separating Values) format, it should be changed to xls(ordinary MS-Excel) format. But renaming command is not changing to correct file format. Thanks in advance, Milton. (1 Reply)
Discussion started by: miltony
1 Replies
Login or Register to Ask a Question