Convert XML to CSV using awk or shell script


 
# 8  
Old 01-12-2015
Exactly. We could spend hours doing what you want, and get it thrown back in our faces with "thank you, but what I actually wanted it to look like is Y". We don't know Y. You think we should know what Y is implicitly, but there are actually lots of choices.
# 9  
Old 01-13-2015
Here is the sample output:

Code:
Partner,OrderType,OrderNumber,OrderSource,OrderDate,Line1,Line2,City,State,PostalCode,CountryCode,Name,NumberOfItems,Name2,Line13,Line24,City5,State6,PostalCode7,CountryCode8,Method,tag,Name9,Quantity,UnitPrice,Eligible,OrderStatus,SKU
TTTT,test,1000000000,,11/14/2014 12:00:00 AM,XXXX,,stsss,gg,101010,aaaaa,mmmmmm,3,mmmmm,abcd,,xyz,sjsdjhi,101010,kkkkkk,test,False,Item1,3,15.99,False,test,5-100000
TTTT,test,1000000000,,11/14/2014 12:00:00 AM,XXXX,,stsss,gg,101010,aaaaa,mmmmmm,1,mmmmm,abcd,,xyz,sjsdjhi,101010,kkkkkk,test,False,Item2,1,10.49,True,test,5-100001

The header is not required; it is just for your reference. When I said tags, I meant all data tags. For example, <OrderSource/> is a self-closing tag that has no data for now, but it can have values. So self-closing tags should produce a null (empty) value in the CSV.
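A minimal sketch of the empty-field requirement described above, assuming the XML has one leaf element per line (the real order.xml layout is not shown in this thread). The function name leaf_values is hypothetical and only illustrates the idea; it is not a full XML parser.

```shell
# Hypothetical helper: join leaf values from one-element-per-line XML on stdin.
leaf_values() {
    awk '
        match($0, />[^<]*</) {      # <Tag>text</Tag>: keep the text between the tags
            row = row sep substr($0, RSTART + 1, RLENGTH - 2); sep = ","; next
        }
        /\/>/ {                     # <Tag/>: self-closing tag, emit an empty field
            row = row sep ""; sep = ","
        }
        END { print row }
    '
}

printf '%s\n' '<OrderType>test</OrderType>' '<OrderSource/>' \
    '<OrderDate>11/14/2014 12:00:00 AM</OrderDate>' | leaf_values
```

With the three sample lines this prints test,,11/14/2014 12:00:00 AM, so the self-closing <OrderSource/> still occupies its own column.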
# 10  
Old 01-13-2015
perl
Code:
use XML::XPath;
use XML::XPath::XMLParser;

my $xpath=XML::XPath->new(filename  =>  "/path/tofile/order.xml");

# Each <Orders> node becomes one output line; newlines between values become commas.
my $nodelist=$xpath->findnodes("//Orders");
foreach my $node ($nodelist->get_nodelist) {
  (my $line=$node->string_value)=~s/\n/,/g;
  print $line,"\n";
}

# 11  
Old 01-13-2015
Thanks for the reply, but I do not know much about Perl. In the current project we are supposed to use either a Bash or an awk script.
# 12  
Old 01-13-2015
here you go ....
Code:
awk '$0 == "<Orders>" { printf "\n"; next }   # start a new CSV record
# Element with text: print the value. Portable regex; gawk reads \> and \< as word boundaries.
match($0, />[^<]*</) { printf "%s,", substr($0, RSTART+1, RLENGTH-2); next }
/\/>/ { printf "," }                          # self-closing tag: empty field
END { print "" }' order.xml

# 13  
Old 01-13-2015
The perl fails on my system with this:

Code:
Can't locate XML/XPath.pm in @INC (you may need to install the XML::XPath module) (@INC contains: /etc/perl /usr/local/lib64/perl5/5.18.2/x86_64-linux /usr/local/lib64/perl5/5.18.2 /usr/lib64/perl5/vendor_perl/5.18.2/x86_64-linux /usr/lib64/perl5/vendor_perl/5.18.2 /usr/local/lib64/perl5 /usr/lib64/perl5/vendor_perl /usr/lib64/perl5/5.18.2/x86_64-linux /usr/lib64/perl5/5.18.2 .) at ./xmlpath.pl line 3.
BEGIN failed--compilation aborted at ./xmlpath.pl line 3.

So please, try again without using nonstandard modules.
# 14  
Old 01-13-2015
So, you want a line for each item.

Thank you.

Code:
$ cat xml.awk

BEGIN {
        FS=">"; OFS=">";
        RS="<"; ORS="<"
}

# These should be special variables for match() but aren't.
function rbefore(STR)   { return(substr(STR, 1, RSTART-1)); }# before match
function rmid(STR)      { return(substr(STR, RSTART, 1)); }  # First char match
function rall(STR)      { return(substr(STR, RSTART, RLENGTH)); }# Entire match
function rafter(STR)    { return(substr(STR, RSTART+RLENGTH)); }# after match

function aquote(OUT, A, PFIX, TA) { # Turns Q SUBSEP R into A[PFIX":"Q]=R
        if(OUT)
        {
                if(PFIX) PFIX=PFIX":"
                split(OUT, TA, SUBSEP);
                A[toupper(PFIX) toupper(TA[1])]=TA[2];
        }

        return("");
}

# Intended to be less stupid about quoted text in XML/HTML.
# Splits a='b' c='d' e='f' into A[PFIX":"a]=b, A[PFIX":"c]=d, etc.
function qsplit(STR, A, PFIX, X, OUT) {
        while(STR && match(STR, /([ \r\n\t]+)|[\x27\x22=]/))
        {
                OUT = OUT rbefore(STR);
                RMID=rmid(STR);

                if((RMID == "'") || (RMID == "\""))     # Quote characters
                {
                        if(!Q)          Q=RMID;         # Begin quote section
                        else if(Q == RMID)      Q="";   # End quote section
                        else                    OUT = OUT RMID; # Quoted quote
                } else if(RMID == "=") {
                        if(Q)   OUT=OUT RMID; else OUT=OUT SUBSEP;
                } else if((RMID=="\r")||(RMID=="\n")||(RMID=="\t")||(RMID==" ")) {
                        if(Q)   OUT = OUT rall(STR); # Literal quoted whitespace
                        else    OUT = aquote(OUT, A, PFIX); # Unquoted WS, next block
                }
                STR=rafter(STR); # Strip off the text we've processed already.
        }

        aquote(OUT STR, A, PFIX); # Process any text we haven't already.
}


{ SPEC=0 ; TAG="" }

NR==1 {
        if(ORS == RS) print;
        next } # The first "line" is blank when RS=<

/^[!?]/ { SPEC=1    }   # XML specification junk

# Handle open-tags
match($1, /^[^\/ \r\n\t>]+/) {
        TAG=substr(toupper($1), RSTART, RLENGTH);
        if((!SPEC) && !($1 ~ /\/$/))
        {
                TAGS=TAG "%" TAGS;
                DEP++;
                LTAGS=TAGS
        }

        for(X in ARGS) delete ARGS[X];

        qsplit(rafter($1), ARGS);
}

# Handle close-tags
(!SPEC) && /^[\/]/ {
        sub(/^\//, "", $1);
        LTAGS=TAGS
#        sub("^.*" toupper($1) "%", "", TAGS);
        sub("^" toupper($1) "%", "", TAGS);
        $1="/"$1
        DEP=split(TAGS, TA, "%")-1;
        if(DEP < 0) DEP=0;
}

$ cat order.awk

{
        sub(/\/$/, "", $1);
        sub(/^[ \r\n\t]*/, "", $2);
        sub(/[\ \r\n\t]*$/, "", $2);
}

# We are inside <Orders> (TAGS holds upper-cased names), and not at a close-tag
(TAGS ~ /%ORDERS($|%)/) && !/^\// {
        if(!($1 in O)) { O[++L]=$1 ; O[$1]=L }
        D[$1]=$2
}

/\/Item/ {
        P=""
        for(N=1; N<=L; N++) {
                printf("%s%s", P, D[O[N]]); P=OFS;
        }

        print ""
}

$ awk -f xml.awk -f order.awk OFS="," ORS="\n" order.xml

TTTT,,test,1000000000,,11/14/2014 12:00:00 AM,,abcd,,xyz,sjsdjhi,101010,kkkkkk,Item1,,1,test,False,,3,15.99,False,test,5-100000
TTTT,,test,1000000000,,11/14/2014 12:00:00 AM,,abcd,,xyz,sjsdjhi,101010,kkkkkk,Item2,,1,test,False,,1,10.49,True,test,5-100001

$

It's not small, but XML is not trivial and this doesn't depend on external modules to do the work.
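To see why the RS="<" and FS=">" settings in xml.awk make XML manageable in awk, here is a small sketch that only prints what each record looks like after that split. The function name xml_records and the four-line sample input are assumptions for illustration; they are not taken from the real order.xml.

```shell
# Hypothetical helper: show how awk sees XML when records split on "<" and fields on ">".
xml_records() {
    awk 'BEGIN { RS = "<"; FS = ">" }
         NR > 1 {                   # record 1 is the empty text before the first "<"
             text = $2
             gsub(/^[ \r\n\t]+|[ \r\n\t]+$/, "", text)   # trim whitespace around the text
             printf "tag=[%s] text=[%s]\n", $1, text
         }'
}

printf '<Orders>\n<Partner>TTTT</Partner>\n<OrderSource/>\n</Orders>\n' | xml_records
```

Each record then carries the tag in $1 and the text that follows it in $2, so the second output line is tag=[Partner] text=[TTTT]. Close tags show up as records whose $1 starts with "/", and self-closing tags as records whose $1 ends with "/", which is what the open-tag and close-tag rules in xml.awk test for.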