converting specific XML file to CSV


 
# 15  
Old 11-26-2010
I have tried both awk and nawk, but each time I get 2 blank lines.

It seems that the data is not being written.

I will keep looking.

Thanks,
Christian
# 16  
Old 11-26-2010
Is your input file clean? I mean, does it have any special characters or tabs at the start of each line?

If the file comes from Windows, run dos2unix on it first and then use awk.
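
A minimal sketch of that check, assuming a POSIX shell. `has_cr` and `strip_cr` are hypothetical helper names, and `tr -d '\r'` stands in for dos2unix on machines where that tool is not installed:

```shell
# has_cr FILE: succeeds if FILE contains at least one carriage return
# (hypothetical helper name, not a standard command)
has_cr() {
    grep -q "$(printf '\r')" "$1"
}

# strip_cr: copies stdin to stdout with every carriage return removed
# (hypothetical helper name; a stand-in for dos2unix)
strip_cr() {
    tr -d '\r'
}
```

Usage would be something like `has_cr toto && strip_cr < toto > toto.unix`, where `toto.unix` is only an example output name.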
# 17  
Old 11-26-2010
Good idea.

The file is extracted on the AIX machine itself,
and I have checked: the result is the same.

Thanks again, I will continue next Monday.

Regards,
Christian
# 18  
Old 11-26-2010
Code:
egrep -ve '^[\s]*$|^[ \t]*<[^>=]*>$' toto | tail +3 | sed 's/<[^"]*="//;s/["] [^"]*="/#/g;s/<[^#>]*>//g;s/[">]//g;s/$/#/;s/  *//g' | xargs -n3 echo | sed 's/ //g'

Code:
egrep -ve '^[\s]*$|^[ \t]*<[^>=]*>$' toto | tail +3 | sed 'N;N;s/\n/#/g;s/[^<"]*="//g;s/<[^">]*>/#/g;s/[ <>]//g;s/["]/#/g;s/##*/#/g'

Code:
bash-3.00# cat toto
<?xml version="1.0" encoding="UTF-8" ?>

<files xmlns="http://www.lotus.com/dxl/console">

 <filedata notesversion="8" odsversion="51" logged="yes" >

  <cutoff interval="90">20100811T010253,56+02</cutoff>

  <path>/base/base01/mail/mail-20/valerie_deshuissard.nsf</path>

  </filedata>

 <filedata notesversion="8" odsversion="51" logged="yes" >

    <cutoff interval="90">20100811T010231,02+02</cutoff>

    <path>/base/base01/mail/mail-20/laurent_abello.nsf</path>

   </filedata>
bash-3.00# egrep -ve '^[\s]*$|^[ \t]*<[^>=]*>$' toto | tail +3 | sed 's/<[^"]*="//;s/["] [^"]*="/#/g;s/<[^#>]*>//g;s/[">]//g;s/$/#/;s/  *//g' | xargs -n3 echo | sed 's/ //g'
8#51#yes#9020100811T010253,56+02#/base/base01/mail/mail-20/valerie_deshuissard.nsf#
8#51#yes#9020100811T010231,02+02#/base/base01/mail/mail-20/laurent_abello.nsf#
bash-3.00#

Code:
bash-3.00# egrep -ve '^[\s]*$|^[ \t]*<[^>=]*>$' toto | tail +3 | sed 'N;N;s/\n/#/g;s/[^<"]*="//g;s/<[^">]*>/#/g;s/[ <>]//g;s/["]/#/g;s/##*/#/g'
8#51#yes#90#20100811T010253,56+02#/base/base01/mail/mail-20/valerie_deshuissard.nsf#
8#51#yes#90#20100811T010231,02+02#/base/base01/mail/mail-20/laurent_abello.nsf#
bash-3.00#


Last edited by ctsgnb; 11-26-2010 at 03:38 PM..
# 19  
Old 11-29-2010
Hi,

Well, it works with the file "toto":

egrep -ve '^[\s]*$|^[ \t]*<[^>=]*>$' toto | tail +3 | sed 'N;N;s/\n/#/g;s/[^<"]*="//g;s/<[^">]*>/#/g;s/[ <>]//g;s/["]/#/g;s/##*/#/g'


My problem is that the paragraph can vary in length:

<filedata
................................
</filedata>

For example, if my paragraph is:

<filedata notesversion="8" odsversion="51" logged="yes" backup="no" id="C125742C:0038C006" iid="7630E56A:ADB4562F" link="1" dboptions="8192,4849664,17276934,0">
<replica id="41256605:0048070F" flags="72" count="1">
<cutoff interval="90">20100811T010253,56+02</cutoff>
</replica>
<path>/base/base01/mail/mail-20/valerie_deshuissard.nsf</path>
<name>valerie_deshuissard.nsf</name>
<title>Valerie DESHUISSARD</title>
<template></template>
<inheritedtemplate>M0170DIT</inheritedtemplate>
<category>M5;w230;W230;F250;PDPI2</category>
<size current="129325927" max="0" usage="49429504"/>
<quota limit="0" warning="0"/>
<created>20080415T121951,74+02</created>
<lastcompact>20101119T182500,73+01</lastcompact>
<unread marks="yes" replicate="never"/>
<daos enabled="readwrite" objects="107" bytes="78994279" lastsync="20101126T151637,20+01"/>
</filedata>

the output is spread over several lines instead of a single one, and in a different order, like this:

90#20100811T010253,56+02#/base/base01/mail/mail-20/valerie_deshuissard.nsf#valerie_deshuissard.nsf#
#ValerieDESHUISSARD#M0170DIT#
#M5;w230;W230;F250;PDPI2#129325927#0#49429504#/#0#0#/
#20080415T121951,74+02#20101119T182500,73+01#yes#never#/

thanks
Christian
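
One possible reason for the split, offered as an assumption rather than a diagnosis: `sed 'N;N'` and `xargs -n3` both group a fixed number of lines, so a longer paragraph spills onto extra lines. A minimal sketch that joins lines up to each closing tag instead (`join_blocks` is a hypothetical name):

```shell
# join_blocks: hypothetical helper; reads XML on stdin and prints each
# <filedata>...</filedata> paragraph on a single line, however long it is.
join_blocks() {
    awk '
    { printf "%s", $0 }            # append the current line without a newline
    /<\/filedata>/ { print "" }    # closing tag reached: end the output line
    '
}
```

The joined lines still contain the tags. Note that the header lines before the first `<filedata` end up glued to the first paragraph unless they are filtered out beforehand.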
# 20  
Old 11-29-2010
egrep -ve '<filedata|</*replica|<daos' in | sed 's/^<cutoff interval="//;s:/>:>/:;s/="/>/g;s/"/</g;s/<[^>]*>/#/g' | grep -v '^#*$' | xargs -n3 echo | sed 's/[#]*[ ]*#/#/g' >output

Code:
# cat in
<filedata notesversion="8" odsversion="51" logged="yes" backup="no" id="C125742C:0038C006" iid="7630E56A:ADB4562F" link="1" dboptions="8192,4849664,17276934,0">
<replica id="41256605:0048070F" flags="72" count="1">
<cutoff interval="90">20100811T010253,56+02</cutoff>
</replica>
<path>/base/base01/mail/mail-20/valerie_deshuissard.nsf</path>
<name>valerie_deshuissard.nsf</name>
<title>Valerie DESHUISSARD</title>
<template></template>
<inheritedtemplate>M0170DIT</inheritedtemplate>
<category>M5;w230;W230;F250;PDPI2</category>
<size current="129325927" max="0" usage="49429504"/>
<quota limit="0" warning="0"/>
<created>20080415T121951,74+02</created>
<lastcompact>20101119T182500,73+01</lastcompact>
<unread marks="yes" replicate="never"/>
<daos enabled="readwrite" objects="107" bytes="78994279" lastsync="20101126T151637,20+01"/>
</filedata>
# egrep -ve '<filedata|</*replica|<daos' in | sed 's/^<cutoff interval="//;s:/>:>/:;s/="/>/g;s/"/</g;s/<[^>]*>/#/g' | grep -v '^#*$' | xargs -n3 echo | sed 's/[#]*[ ]*#/#/g'
90#20100811T010253,56+02#/base/base01/mail/mail-20/valerie_deshuissard.nsf#valerie_deshuissard.nsf#
#Valerie DESHUISSARD#M0170DIT#
#M5;w230;W230;F250;PDPI2#129325927#0#49429504#/#0#0#/
#20080415T121951,74+02#20101119T182500,73+01#yes#never#/
#

The output may look as if it is split over more than 4 lines, but it is not:

Code:
# egrep -ve '<filedata|</*replica|<daos' in | sed 's/^<cutoff interval="//;s:/>:>/:;s/="/>/g;s/"/</g;s/<[^>]*>/#/g' | grep -v '^#*$' | xargs -n3 echo | sed 's/[#]*[ ]*#/#/g' >output
# wc -l output
       4 output
# cat output
90#20100811T010253,56+02#/base/base01/mail/mail-20/valerie_deshuissard.nsf#valerie_deshuissard.nsf#
#Valerie DESHUISSARD#M0170DIT#
#M5;w230;W230;F250;PDPI2#129325927#0#49429504#/#0#0#/
#20080415T121951,74+02#20101119T182500,73+01#yes#never#/

---------- Post updated at 09:30 PM ---------- Previous update was at 09:13 PM ----------

If this doesn't fit your needs, please provide a representative sample of your input as well as the expected output.

Last edited by ctsgnb; 11-29-2010 at 04:24 PM..
# 21  
Old 11-30-2010
Hi,

Thanks a lot for your effort.

I uploaded the file "testxml.txt" as the input:

1. There is always a header that I don't need:

<?xml version="1.0" encoding="UTF-8" ?>

<files xmlns="http://www.lotus.com/dxl/console">



2. The data is formatted as paragraphs:
<filedata
................................
</filedata>

The content can vary in length.

3. The data I need is of 2 kinds:

<replica id="41256605:0048070F" flags="72" count="1">

Data between double quotes, as above.

<path>/base/base01/mail/mail-20/valerie_deshuissard.nsf</path>

Data directly between tags, as above.

And I need the output of each paragraph on a single line, with the fields separated by #.


The last command you sent gives the output on separate lines, and it searches for data in tags that could be missing: daos, interval.

regards

Christian
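
The requirements listed above could be sketched in awk roughly as follows. This is only an illustration under stated assumptions: `flatten_filedata` is a hypothetical name, the input is assumed to be line-oriented as in the samples in this thread, values are assumed not to contain `"`, `<` or `>`, and empty values are skipped rather than emitted as empty fields. It does not depend on any particular tag being present.

```shell
# flatten_filedata: hypothetical helper; reads DXL-style XML on stdin and
# prints one '#'-separated line per <filedata>...</filedata> paragraph.
flatten_filedata() {
    awk '
    /<filedata/ { inblock = 1; out = "" }   # paragraph starts: reset the record
    inblock {
        line = $0
        # Pull out, left to right, attribute values ("...") and tag text (>...<)
        while (match(line, /"[^"]*"|>[^<>]+</)) {
            val = substr(line, RSTART + 1, RLENGTH - 2)   # drop the delimiters
            if (val !~ /^[ \t]*$/) out = out val "#"      # skip empty values
            line = substr(line, RSTART + RLENGTH)         # continue after match
        }
    }
    /<\/filedata>/ { if (inblock) print out; inblock = 0 }  # paragraph ends
    '
}
```

It would be run as `flatten_filedata < testxml.txt > output`. The header is ignored because nothing is collected before the first `<filedata`. Since optional tags such as daos may be missing, the number of fields per line can differ from one paragraph to the next.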