XML Parsing using awk


 
# 8  
Old 11-21-2012
Parsing XML is not trivial.

Because requests for XML-to-flat-file conversion come up so often, though, I've got a script that works in some common situations.

Code:
$ cat xmlh.awk

BEGIN { RS="<";         FS=">";
        # Uncomment to make windows-readable text files
        # ORS="\r\n";

        # Change this to alter how many close-tags in a row are needed
        # before a row of data is printed.
        if(!DEP) DEP=1
        SEP="\t"
        }

# Skip XML declarations/processing instructions (<?...?>) and blank records
/^\?/ || /^$/   {       next    }

# Handle close tags
/^[/]/  {
        N=D;    while((N>0) && ("/"STACK[N] != $1))     N--;

        if("/"STACK[N] == $1)   D=(N-1);
        POP++;

        if(POP == DEP)
        {
                if(!HEADER++)
                {
                        split(ARG[1], Z, SUBSEP);
                        printf("%s %s", Z[2], Z[3]);
                        for(N=2; N<=ARG_; N++)
                        {
                                split(ARG[N], Z, SUBSEP);
                                printf("%s%s %s", SEP, Z[2], Z[3]);
                        }

                        printf("\n");
                }

                printf("%s", DATA[ARG[1]]);
                for(N=2; N<=ARG_; N++)
                        printf("%s%s", SEP, DATA[ARG[N]]);
                printf("\n");
        }
        next
}

# Handle open tags
{
        gsub(/^[ \r\n\t]*/, "", $2);    # Whitespace isn't data
        gsub(/[ \r\n\t]*$/, "", $2);
        sub(/\/$/, "", $(NF-1));

        # An open tag ends any run of consecutive close tags
        POP=0;

        M=split($1, A, " ");
        STACK[++D]=A[1];

        if((!MAX) || (D>MAX)) MAX=D;    # Save max depth

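        # Handle parameters: each word after the tag name is a name=value pair.
        # (A[2] never contains a space, so splitting it again only ever
        # found the first parameter; loop over A[2..M] instead.)
        for(N=2; N<=M; N++)
        {
                if(!index(A[N], "=")) continue; # Not name=value, skip it

                split(A[N], C, "=");
                gsub(/['"]/,"", C[2]);

                I=D SUBSEP STACK[D] SUBSEP C[1];
                if(!SEEN[I]++)
                        ARG[++ARG_]=I;

                DATA[I]=C[2];
        }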

        if($2)
        {
                I=D SUBSEP STACK[D] SUBSEP "CDATA";
                if(!SEEN[I]++)
                        ARG[++ARG_]=I;

                DATA[I]=$2;
        }
}

$ awk -f xmlh.awk DEP=2 data3.xml

SeqNo CDATA     redcode CDATA   GenError CDATA
43156489079     SKNEQGGEVHW     Upload-Success
43156489079     SKNEQGGEVHW     Upload-Success

$

Output is tab-separated. DEP is how many close-tags in a row it looks for before printing a row of data.
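
To make DEP less abstract, here is a minimal sketch (not part of xmlh.awk) that reuses the same RS="<" / FS=">" record splitting and only reports how many close-tags in a row it has seen so far. The helper name close_runs and the sample XML are made up for illustration.

```shell
# close_runs is a hypothetical helper, not part of xmlh.awk.
# It prints each close tag together with the length of the current
# run of consecutive close tags (the value xmlh.awk compares to DEP).
close_runs() {
    awk 'BEGIN { RS="<"; FS=">" }
         /^$/  { next }                                      # empty record before the first "<"
         /^\// { POP++; printf("%s run=%d\n", $1, POP); next }  # close tag extends the run
               { POP=0 }'                                    # open tag resets the run
}

# Made-up sample input (not data3.xml), written without a trailing newline.
printf '<rows><row><a>1</a><b>2</b></row></rows>' | close_runs
# /a run=1
# /b run=1
# /row run=2
# /rows run=3
```

With DEP=1 a row would be printed at /a and again at /b; with DEP=2 it would be printed once, at /row, which is usually what you want for one line per record.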

Last edited by Corona688; 11-21-2012 at 05:30 PM..
# 9  
Old 11-21-2012
Thanks! I am looking for something using awk.
# 10  
Old 11-21-2012
I think we crossposted. Does my solution above work for you? It's a generic xml-to-flatfile converter in awk which groups columns by itself.

It has some limitations: spaces inside tag values are a problem. But it works for the data you gave, as shown above.
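
One place the space problem seems to come from is the split($1, A, " ") step, which cuts the tag on every space, including spaces inside a quoted value. A minimal sketch of just that step follows; the helper name second_word and the sample tag are hypothetical, not taken from the thread's data.

```shell
# second_word is a hypothetical helper: it repeats the script's
# split($1, A, " ") on the first tag and prints the second word.
second_word() {
    awk 'BEGIN { RS="<"; FS=">" }
         /^$/ { next }                      # empty record before the first "<"
         { split($1, A, " "); print A[2]; exit }'
}

# Made-up sample: the quoted value contains a space.
printf '<item name="red apple">x</item>' | second_word
# name="red
```

The value is cut at the space, so only part of it ends up in the column. Handling that properly needs quote-aware parsing, which is where a real XML parser starts to look attractive.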