XML Parse between to tag with upper tag: Post 302893689 by Corona688 on Thursday 20th of March 2014 03:01:47 PM
An improved version of my generic xml-extraction awk program:

Code:
$ cat xmlt.awk

BEGIN {
        DEP=4;  # How many tags out to keep data
        POS=0   # Position in tag stack
        RS="<"; # Input record separator
        FS="[ \r\n\t>/]";       # Input field separator

        # Hardcode the first two things in the output order
        ORDER[++O]="XN:MECONTEXT:ID";
        ORDER["XN:MECONTEXT:ID"]=O

        ORDER[++O]="XN:VSDATACONTAINER:ID";
        ORDER["XN:VSDATACONTAINER:ID"]=O
}

# addprop() calls this function to decide whether a property should be
# added to the list of columns to print.
function catchthis(PROPNAME, PROPVAL) {

        # Catch every property whose name contains DATA (which includes
        # all CDATA text) inside XN:VSDATACONTAINER tags
        return((TSS ~ /XN:VSDATACONTAINER/) && (PROPNAME ~ /DATA/));
}

# When RS isn't \n and the input begins with RS, the first record is empty; skip it
(NR==1) && (length($0) == 0) { next }

# Skip XML comments
/^!--/ {
        while(!(I=index($0, "-->"))) if(getline <= 0) exit;
        # Strip out comment
        $0="--XMLCOMMENT-- />"substr($0,I+3);
}

# Ignore XML specification junk
/^\?/ || /^!/ { next }

# These should be special variables for match() but aren't.
# String before match
function rbefore(STR)   { return(substr(STR, 1, RSTART-1)); }
# First char of match
function rmid(STR)      { return(substr(STR, RSTART, 1)); }
# Entire match
function rall(STR)      { return(substr(STR, RSTART, RLENGTH)); }
# String after match
function rafter(STR)    { return(substr(STR, RSTART+RLENGTH)); }

# Turns Q SUBSEP R into A[PFIX":"Q]=R
function aquote(OUT, A, PFIX, TA) {
        if(OUT)
        {
                if(PFIX) PFIX=PFIX":"
                split(OUT, TA, SUBSEP);
                A[toupper(PFIX) toupper(TA[1])]=TA[2];
        }

        return("");
}

# Intended to be less stupid about quoted text in XML/HTML.
# Splits a='b' c='d' e='f' into A[PFIX":"a]=b, A[PFIX":"c]=d, etc.
# X, OUT, Q and RMID are local variables, so quote state cannot leak between calls.
function qsplit(STR, A, PFIX,   X, OUT, Q, RMID) {
        while(STR && match(STR, /([ \r\n\t]+)|[\x27\x22=]/))
        {
                OUT = OUT rbefore(STR);

                RMID=rmid(STR);
                if((RMID == "'") || (RMID == "\""))     # Quote characters
                {
                        if(!Q)          Q=RMID;         # Begin quote section
                        else if(Q == RMID)      Q="";   # End quote section
                        else                    OUT = OUT RMID; # Quoted quote
                } else if(RMID == "=") {
                        if(Q)   OUT=OUT RMID; else OUT=OUT SUBSEP;
                } else if((RMID=="\r")||(RMID=="\n")||(RMID=="\t")||(RMID==" ")) {
                        if(Q)   OUT = OUT rall(STR); # Literal quoted whitespace
                        else    OUT = aquote(OUT, A, PFIX); # Unquoted WS, next block
                }
                STR=rafter(STR); # Strip off the text we've processed already.
        }

        aquote(OUT STR, A, PFIX); # Process any text we haven't already.
}

# Call before increment
function addprop(AIN,X,S) {
        for(X in AIN)
        {
                if(!(X in ORDER))
                if(catchthis(X, AIN[X]))
                {
                        ORDER[++O]=X
                        ORDER[X]=O
                }

                PROP[X]=AIN[X]
                KEEP[X]=(POS+2)-DEP
        }
}

# Call before decrement
function delprop(TA, N, M,X) {
        for(X in KEEP)
        if(KEEP[X] > POS)
        {
                delete PROP[X];
                delete KEEP[X];
        }
}

# Non-close tag
!/^\// {

        TAG=$1;                         sub(/^[^ \r\n\t>\/]*/, "");
        match($0, /\/?>/);
        TDATA=rbefore($0);              CDATA=rafter($0);

        # Flatten and strip whitespace
        gsub(/[ \r\n\t]+/, " ", CDATA);
        gsub(/^[ \r\n\t]+/, "", CDATA); gsub(/[ \r\n\t]+$/, "", CDATA);

        for(X in TA) delete TA[X];
        qsplit(TDATA, TA, TAG);
        if(length(CDATA))
                TA[toupper(TAG)":""CDATA"]=CDATA

        addprop(TA);

        if(RLENGTH != 2) # Found > instead of self-closing />
        {
                TS[++POS]=toupper(TAG);
                TSS=TSS"/"toupper(TAG);
        }



#       for(X in TA) printf("%s[%s]=%s\n", TAG, X, TA[X]);
}

# Close tags
/^\// {

        for(TPOS=POS; (TPOS>0) && (toupper($2) != TS[TPOS]); TPOS--);

        if(toupper($2) == "XN:VSDATACONTAINER")
        {
                OUT=""
                PFIX=""
                for(N=1; N<=O; N++)
                {
                        if(!PROP[ORDER[N]]) PROP[ORDER[N]]="!"ORDER[N]
                        OUT=OUT PFIX PROP[ORDER[N]];
                        PFIX=OFS
                }
                print OUT;
        }

        if(TPOS <= 0) print "Went under for "$2" pos="POS
        else
        {
                TPOS--;
                while(TPOS < POS)
                {
                        delprop();
                        sub(/\/[^\/]*$/, "", TSS); POS--;
                }
        }
}

$ awk -f xmlt.awk OFS="\t" Enodeb_MO_Export_10_47.xml
CCL01736        1       vsDataENodeBFunction    EricssonSpecificAttributes.13.25CCL01736        SubNetwork=ONRM_ROOT_MO_R,MeContext=CCL01736,ManagedElement=1,vsDataTransportNetwork=1,vsDataSctp=1     0       32      1       0       310    410      3       30      1440    30      true    SubNetwork=ONRM_ROOT_MO_R,MeContext=CCL01736,ManagedElement=1,vsDataIpSystem=1,vsDataIpAccessHostEt=1   false  1true    true    1       -2000000000     -2000000000     -2000000000     -2000000000     100     true    0       false
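
The core trick in the program above is setting RS="<" so that every tag starts a new record, with FS splitting off the tag name. As a minimal sketch of just that part (the list_tags function name and the sample XML are made up for illustration and are not part of xmlt.awk), something like this shows which records awk ends up seeing:

```shell
# Hypothetical helper (not part of xmlt.awk): list the open and close tags
# awk sees when "<" is the record separator.
list_tags() {
    awk '
    BEGIN { RS = "<"; FS = "[ \r\n\t>/]" }
    # Input that starts with "<" yields an empty first record; skip it.
    (NR == 1) && (length($0) == 0) { next }
    # A close tag record looks like "/name>", so $1 is empty and $2 is the name.
    /^\//  { print "close", $2; next }
    # Any other record starts with the tag name, which ends at the first separator.
           { print "open", $1 }
    '
}

printf '<a id="1"><b>text</b></a>\n' | list_tags
```

For that sample input this should print "open a", "open b", "close b" and "close a", one per line. The text after each tag's ">" stays in the same record, which is how the full program picks up CDATA.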

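The rbefore(), rall() and rafter() helpers in the program are thin wrappers around the RSTART and RLENGTH variables that match() sets. A small standalone sketch of the same idea, assuming a hypothetical split_on_match function and invented sample input:

```shell
# Hypothetical helper (not part of xmlt.awk): show the three pieces that
# rbefore(), rall() and rafter() are meant to return for the first regex match.
split_on_match() {
    awk -v re="$1" '
    match($0, re) {
        before = substr($0, 1, RSTART - 1)        # text before the match
        hit    = substr($0, RSTART, RLENGTH)      # the matched text itself
        after  = substr($0, RSTART + RLENGTH)     # text after the match
        printf "before=[%s] match=[%s] after=[%s]\n", before, hit, after
    }'
}

echo 'id="42" name="x"' | split_on_match '='
```

Called with '/?>' as the regex instead, it separates a tag's attribute text from whatever follows the tag, which is what the non-close-tag block does to fill TDATA and CDATA.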
 
