Convert XML to CSV using awk or shell script

01-12-2015

Exactly. We could spend hours doing what you want, and get it thrown back in our faces with "thank you, but what I actually wanted it to look like is Y". We don't know Y. You think we should know what Y is implicitly, but there are actually lots of choices.

Corona688

01-13-2015

Here is the sample output:

Code:

Partner,OrderType,OrderNumber,OrderSource,OrderDate,Line1,Line2,City,State,PostalCode,CountryCode,Name,NumberOfItems,Name2,Line13,Line24,City5,State6,PostalCode7,CountryCode8,Method,tag,Name9,Quantity,UnitPrice,Eligible,OrderStatus,SKU
TTTT,test,1000000000,,11/14/2014 12:00:00 AM,XXXX,,stsss,gg,101010,aaaaa,mmmmmm,3,mmmmm,abcd,,xyz,sjsdjhi,101010,kkkkkk,test,False,Item1,3,15.99,False,test,5-100000
TTTT,test,1000000000,,11/14/2014 12:00:00 AM,XXXX,,stsss,gg,101010,aaaaa,mmmmmm,1,mmmmm,abcd,,xyz,sjsdjhi,101010,kkkkkk,test,False,Item2,1,10.49,True,test,5-100001

The header is not required; it is just for your reference. By tags I meant all data tags. For example, <OrderSource/> is a self-closing tag that has no data for now but can have values, so self-closing tags should produce an empty (null) field in the CSV.
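To make the empty-field requirement concrete, here is a minimal sketch. It assumes one element per line, as in the sample, and is not a general XML parser; the function name xml_fields and the inline input are made up for illustration.

```shell
#!/bin/sh
# Sketch: emit one CSV field per data tag. A tag with no data, such as
# <OrderSource/>, becomes an empty field. Assumes one element per line.
# xml_fields is a hypothetical helper name.
xml_fields() {
    awk '
    match($0, />[^<]*<\//) {             # <Tag>value</Tag>
        out = out sep substr($0, RSTART + 1, RLENGTH - 3); sep = ","; next
    }
    /<[^\/>]+\/>/ {                       # <Tag/> : no data, empty field
        out = out sep ""; sep = ","
    }
    END { print out }'
}

printf '%s\n' '<OrderNumber>1000000000</OrderNumber>' '<OrderSource/>' \
    '<OrderDate>11/14/2014 12:00:00 AM</OrderDate>' | xml_fields
```

For this input it should print 1000000000,,11/14/2014 12:00:00 AM, with the empty second field standing in for <OrderSource/>.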

Rashmitha

01-13-2015

perl

Code:

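use strict;
use warnings;
use XML::XPath;              # CPAN module, not part of core Perl
use XML::XPath::XMLParser;

my $xpath = XML::XPath->new(filename => "/path/tofile/order.xml");

# Print the text content of every <Orders> node, with newlines turned into commas
my $nodelist = $xpath->findnodes("//Orders");
foreach my $node ($nodelist->get_nodelist) {
  (my $line = $node->string_value) =~ s/\n/,/g;
  print $line, "\n";
}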

pravin27

01-13-2015

Thanks for the reply, but I do not know much about Perl. In the current project we are supposed to use either a Bash or an awk script.

Rashmitha

01-13-2015

Here you go:

Code:

awk '
$0 == "<Orders>" { printf "\n"; next }           # start a new output line
match($0, />[^<]*<\//) {                          # <Tag>value</Tag>, portable regex
    printf "%s,", substr($0, RSTART + 1, RLENGTH - 3); next
}
/\/>/ { printf "," }                              # <Tag/> gives an empty field
END { print "" }' order.xml

pravin27

01-13-2015

The perl fails on my system with this:

Code:

Can't locate XML/XPath.pm in @INC (you may need to install the XML::XPath module) (@INC contains: /etc/perl /usr/local/lib64/perl5/5.18.2/x86_64-linux /usr/local/lib64/perl5/5.18.2 /usr/lib64/perl5/vendor_perl/5.18.2/x86_64-linux /usr/lib64/perl5/vendor_perl/5.18.2 /usr/local/lib64/perl5 /usr/lib64/perl5/vendor_perl /usr/lib64/perl5/5.18.2/x86_64-linux /usr/lib64/perl5/5.18.2 .) at ./xmlpath.pl line 3.
BEGIN failed--compilation aborted at ./xmlpath.pl line 3.

So please, try again without using nonstandard modules.
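A quick way to see whether a script will hit this error on a given machine is to try loading the module first. This is only a sketch; have_perl_module is a hypothetical helper name, and strict is used purely as an example of something that ships with Perl itself.

```shell
#!/bin/sh
# Sketch: report whether a Perl module can be loaded before relying on it.
# have_perl_module is a hypothetical helper, not a standard command.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

for mod in strict XML::XPath; do
    if have_perl_module "$mod"; then
        echo "$mod: available"
    else
        echo "$mod: missing"
    fi
done
```

On a system like the one above, the second line would report XML::XPath as missing, which is the cue to fall back to plain awk or shell.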

Corona688

01-13-2015

So, you want a line for each item.

Thank you.

Code:

$ cat xml.awk

BEGIN {
        FS=">"; OFS=">";
        RS="<"; ORS="<"
}

# These should be special variables for match() but aren't.
# All of them rely on RSTART/RLENGTH from the most recent match() call.
function rbefore(STR)   { return(substr(STR, 1, RSTART-1)); }# before match
function rmid(STR)      { return(substr(STR, RSTART, 1)); }  # First char match
function rall(STR)      { return(substr(STR, RSTART, RLENGTH)); }# Entire match
function rafter(STR)    { return(substr(STR, RSTART+RLENGTH)); }# after match

function aquote(OUT, A, PFIX, TA) { # Turns Q SUBSEP R into A[PFIX":"Q]=R
        if(OUT)
        {
                if(PFIX) PFIX=PFIX":"
                split(OUT, TA, SUBSEP);
                A[toupper(PFIX) toupper(TA[1])]=TA[2];
        }

        return("");
}

# Intended to be less stupid about quoted text in XML/HTML.
# Splits a='b' c='d' e='f' into A[PFIX":"a]=b, A[PFIX":"c]=d, etc.
function qsplit(STR, A, PFIX, X, OUT) {
        while(STR && match(STR, /([ \n\t]+)|[\x27\x22=]/))
        {
                OUT = OUT rbefore(STR);
                RMID=rmid(STR);

                if((RMID == "'") || (RMID == "\""))     # Quote characters
                {
                        if(!Q)          Q=RMID;         # Begin quote section
                        else if(Q == RMID)      Q="";   # End quote section
                        else                    OUT = OUT RMID; # Quoted quote
                } else if(RMID == "=") {
                        if(Q)   OUT=OUT RMID; else OUT=OUT SUBSEP;
                } else if((RMID=="\r")||(RMID=="\n")||(RMID=="\t")||(RMID==" ")) {
                        if(Q)   OUT = OUT rall(STR); # Literal quoted whitespace
                        else    OUT = aquote(OUT, A, PFIX); # Unquoted WS, next block
                }
                STR=rafter(STR); # Strip off the text we've processed already.
        }

        aquote(OUT STR, A, PFIX); # Process any text we haven't already.
}


{ SPEC=0 ; TAG="" }

NR==1 {
        if(ORS == RS) print;
        next } # The first "line" is blank when RS=<

/^[!?]/ { SPEC=1    }   # XML specification junk

# Handle open-tags
match($1, /^[^\/ \r\n\t>]+/) {
        TAG=substr(toupper($1), RSTART, RLENGTH);
        if((!SPEC) && !($1 ~ /\/$/))
        {
                TAGS=TAG "%" TAGS;
                DEP++;
                LTAGS=TAGS
        }

        for(X in ARGS) delete ARGS[X];

        qsplit(rafter($1), ARGS);
}

# Handle close-tags
(!SPEC) && /^[\/]/ {
        sub(/^\//, "", $1);
        LTAGS=TAGS
#        sub("^.*" toupper($1) "%", "", TAGS);
        sub("^" toupper($1) "%", "", TAGS);
        $1="/"$1
        DEP=split(TAGS, TA, "%")-1;
        if(DEP < 0) DEP=0;
}

$ cat order.awk

{
        sub(/\/$/, "", $1);
        sub(/^[ \r\n\t]*/, "", $2);
        sub(/[ \r\n\t]*$/, "", $2);
}

# We are somewhere inside <Orders>, and not at a close-tag
(TAGS ~ /%ORDERS($|%)/) && !/^\// {
        if(!($1 in O)) { O[++L]=$1 ; O[$1]=L }
        D[$1]=$2
}

/\/Item/ {
        P=""
        for(N=1; N<=L; N++) {
                printf("%s%s", P, D[O[N]]); P=OFS;
        }

        print ""
}

$ awk -f xml.awk -f order.awk OFS="," ORS="\n" order.xml

TTTT,,test,1000000000,,11/14/2014 12:00:00 AM,,abcd,,xyz,sjsdjhi,101010,kkkkkk,Item1,,1,test,False,,3,15.99,False,test,5-100000
TTTT,,test,1000000000,,11/14/2014 12:00:00 AM,,abcd,,xyz,sjsdjhi,101010,kkkkkk,Item2,,1,test,False,,1,10.49,True,test,5-100001

$

It's not small, but XML is not trivial and this doesn't depend on external modules to do the work.
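The core idea in xml.awk can be shown in a few lines: with RS="<" and FS=">", every record is one tag, $1 is the tag text and $2 is whatever follows it up to the next "<". The following is a stripped-down sketch of just that splitting step, not a replacement for the script above; show_tags is a made-up name and the input is invented for illustration.

```shell
#!/bin/sh
# Sketch of the record-splitting idea: RS="<" makes each tag its own
# record, FS=">" separates the tag text ($1) from the data after it ($2).
# show_tags is a hypothetical helper name.
show_tags() {
    awk 'BEGIN { RS = "<"; FS = ">" }
    NR == 1 { next }                      # whatever precedes the first "<"
    {
        gsub(/^[ \r\n\t]+|[ \r\n\t]+$/, "", $2)   # trim surrounding whitespace
        printf "tag=[%s] text=[%s]\n", $1, $2
    }'
}

printf '%s\n' '<Item>' '  <Name>Item1</Name>' '  <Tag/>' '</Item>' | show_tags
```

This should print one line per tag: tag=[Item] text=[], tag=[Name] text=[Item1], tag=[/Name] text=[], tag=[Tag/] text=[] and tag=[/Item] text=[]. Close tags show up with a leading "/" and self-closing tags with a trailing "/", which is what the open-tag and close-tag rules in xml.awk test for.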

Corona688
