XML Parsing using awk
Post 302734137 by Corona688 on Wednesday 21st of November 2012 04:25:03 PM
Parsing XML is not trivial.

That said, requests for XML-to-flat-file conversion come up often enough that I've got a script which works in some common situations.

Code:
$ cat xmlh.awk

BEGIN { RS="<";         FS=">";
        # Uncomment to make windows-readable text files
        # ORS="\r\n";

        # Change this to alter how many close-tags in a row are needed
        # before a row of data is printed.
        if(!DEP) DEP=1
        SEP="\t"
        }

# Skip XML declarations / processing instructions (<?...?>) and blank records
/^\?/ || /^$/   {       next    }

# Handle close tags
/^[/]/  {
        N=D;    while((N>0) && ("/"STACK[N] != $1))     N--;

        if("/"STACK[N] == $1)   D=(N-1);
        POP++;

        if(POP == DEP)
        {
                if(!HEADER++)
                {
                        split(ARG[1], Z, SUBSEP);
                        printf("%s %s", Z[2], Z[3]);
                        for(N=2; N<=ARG_; N++)
                        {
                                split(ARG[N], Z, SUBSEP);
                                printf("%s%s %s", SEP, Z[2], Z[3]);
                        }

                        printf("\n");
                }

                printf("%s", DATA[ARG[1]]);
                for(N=2; N<=ARG_; N++)
                        printf("%s%s", SEP, DATA[ARG[N]]);
                printf("\n");
        }
        next
}

# Handle open tags
{
        gsub(/^[ \r\n\t]*/, "", $2);    # Whitespace isn't data
        gsub(/[ \r\n\t]*$/, "", $2);
        sub(/\/$/, "", $(NF-1));        # Strip the trailing / of self-closing tags

        # Reset parameters
        POP=0;

        M=split($1, A, " ");
        STACK[++D]=A[1];

        if((!MAX) || (D>MAX)) MAX=D;    # Save max depth

        # Handle parameters (attributes): A[2]..A[M] are the name=value pairs.
        # Values containing spaces are not supported.
        for(N=2; N<=M; N++)
        {
                split(A[N], C, "=");
                gsub(/['"]/,"", C[2]);

                I=D SUBSEP STACK[D] SUBSEP C[1];
                if(!SEEN[I]++)
                        ARG[++ARG_]=I;

                DATA[I]=C[2];
        }

        if($2 != "")    # Compare as a string so a value of "0" still counts as data
        {
                I=D SUBSEP STACK[D] SUBSEP "CDATA";
                if(!SEEN[I]++)
                        ARG[++ARG_]=I;

                DATA[I]=$2;
        }
}

$ awk -f xmlh.awk DEP=2 data3.xml

SeqNo CDATA     redcode CDATA   GenError CDATA
43156489079     SKNEQGGEVHW     Upload-Success
43156489079     SKNEQGGEVHW     Upload-Success

$

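Output is tab-separated, with a header row printed before the first row of data. DEP is how many close-tags in a row it looks for before printing a row of data; it defaults to 1 and can be set on the command line, as with DEP=2 above.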
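
To get a feel for what DEP does, here's a cut-down sketch of just the tokenizing and close-tag counting, not the full script. The sample XML and the row_triggers helper are made up for illustration; the RS/FS settings and the POP counter are the same idea as in xmlh.awk.

```shell
# Hypothetical sample input, not taken from data3.xml.
xml='<rows><row><a>1</a><b>2</b></row><row><a>3</a><b>4</b></row></rows>'

# RS="<" makes every tag start a new record; FS=">" then splits it so
# $1 is the tag (plus any attributes) and $2 is the text that follows.
tokens=$(printf '%s' "$xml" | awk '
    BEGIN { RS="<"; FS=">" }
    /^$/  { next }                       # empty record before the first "<"
          { printf("tag=[%s] text=[%s]\n", $1, $2) }
')

# Count consecutive close tags like the POP counter in xmlh.awk and print
# the close tag on which a row would be emitted for the given DEP.
row_triggers() {
    printf '%s' "$xml" | awk -v DEP="$1" '
        BEGIN { RS="<"; FS=">" }
        /^$/  { next }
        /^\// { if (++POP == DEP) print $1; next }   # close tag
              { POP = 0 }                            # open tag resets the count
    '
}

printf '%s\n' "$tokens"
echo "DEP=1:" $(row_triggers 1)
echo "DEP=2:" $(row_triggers 2)
```

With this input, DEP=1 fires on every /a and /b (one row per leaf element), while DEP=2 fires once per /row, which is usually what you want for record-per-element data like this.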

Last edited by Corona688; 11-21-2012 at 05:30 PM..
 
