Using a shell command to parse multiple nested tag values in an XML file


# 1  

I have this XML file -

Code:
<gp>
    <mms>1110012</mms>
    <tg>988</tg>
    <mm>LongTime</mm>
    <lv>
        <lkid>StartEle=ONE, Desti = Motion</lkid>
        <kk>12</kk>
    </lv>
    <lv>
        <lkid>StartEle=ONE, Source = Velocity</lkid>
        <kk>2</kk>
    </lv>
    <lv>
        <lkid>StartEle=ONE, Source = Park</lkid>
        <kk>2</kk>
    </lv>
<gp>

<gp>
    <mms>2221100</mms>
    <tg>989</tg>
    <mm>LongVelocity</mm>
    <lv>
        <lkid>StartEle=ONE, Source = Velocity</lkid>
        <kk>772</kk>
    </lv>
    <lv>
        <lkid>StartEle=ONE, Desti = Motion</lkid>
        <kk>900</kk>
    </lv>
    <lv>
        <lkid>StartEle=ONE, Source = Park</lkid>
        <kk>2</kk>
    </lv>
<gp>

I first need to search for "<mm>LongTime</mm>". If it is found, I then need to look for the value "Desti = Motion" (which appears inside <lkid>StartEle=ONE, Desti = Motion</lkid>) among the multiple nested <lv> sub-tags. If that is also found, I finally need to get the value of the tag on the line below it, which is 12 (<kk>12</kk>).
Please help. Any tool will do: awk, sed, grep, anything.
Thanks in advance.

Last edited by bartus11; 01-09-2014 at 09:39 AM.. Reason: Please use code tags.
# 2  
You didn't post the expected output, so this is based on some assumptions:
Code:
awk -F'[<>]' '
        /<mm>LongTime<\/mm>/ {
                f = 1
        }
        f && /<lkid>/ {
                print $3
                getline
                print $3
                f = 0
        }
' file.xml

On SunOS/Solaris, use nawk instead of awk. Modify it to suit your requirements.
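
Note that the script in this post prints the first <lkid> line after the <mm>LongTime</mm> match (and the line below it) without checking that the <lkid> contains "Desti = Motion". It gives the right answer for the posted sample only because the matching <lv> group happens to come first. A minimal sketch of a stricter variant might look like the following. It assumes one tag per line and that <mm> appears before the <lv> groups inside each <gp> block, as in the posted sample. The temporary file and the names sample, in_mm, hit and result are made up for illustration.

```shell
# Hypothetical sample: a cut-down version of the posted data, with the
# "Desti = Motion" group deliberately NOT first in the LongTime block.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<gp>
    <mm>LongTime</mm>
    <lv>
        <lkid>StartEle=ONE, Source = Velocity</lkid>
        <kk>2</kk>
    </lv>
    <lv>
        <lkid>StartEle=ONE, Desti = Motion</lkid>
        <kk>12</kk>
    </lv>
</gp>
<gp>
    <mm>LongVelocity</mm>
    <lv>
        <lkid>StartEle=ONE, Desti = Motion</lkid>
        <kk>900</kk>
    </lv>
</gp>
EOF

result=$(awk -F'[<>]' '
    # A <gp> or </gp> line starts or ends a block: forget earlier matches
    /<\/?gp>/  { in_mm = 0; hit = 0 }
    # Remember whether this block has the wanted <mm> value
    /<mm>/     { in_mm = ($3 == "LongTime") }
    # Leaving an <lv> group: its <lkid> no longer applies
    /<\/lv>/   { hit = 0 }
    # Wanted <lkid> inside a wanted block
    in_mm && /<lkid>/ && /Desti = Motion/ { hit = 1 }
    # Print the <kk> value that belongs to the matched <lkid>
    hit && /<kk>/ { print $3 }
' "$sample")

echo "$result"
rm -f "$sample"
```

Because the flags are reset on every <gp> or </gp> line, this should also cope with the file exactly as first posted, where the closing tags are written as <gp>.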
# 3  
Your XML is weird. Is it supposed to have two <gp> tags in a row like that, instead of <gp> ... </gp> ?

If the missing </gp> was a mistake, I have a generic awk script to extract XML tags:

Code:
$ cat xmlh.awk

BEGIN { RS="<";         FS=">"; ORS="\r\n";

        # Change this to alter how many close-tags in a row are needed
        # before a row of data is printed.
        if(!DEP) DEP=1
        SEP="\t"
        }

# Skip weird XML specification lines or blank records
/^\?/ || /^$/   {       next    }

# Handle close tags
/^[/]/  {
        N=D;    while((N>0) && ("/"STACK[N] != $1))     N--;

        if("/"STACK[N] == $1)   D=(N-1);
        POP++;

        if(POP == DEP)
        {
                if(!HEADER++)
                {
                        split(ARG[1], Z, SUBSEP);
                        printf("%s %s", Z[2], Z[3]);
                        for(N=2; N<=ARG_; N++)
                        {
                                split(ARG[N], Z, SUBSEP);
                                printf("%s%s %s", SEP, Z[2], Z[3]);
                        }

                        printf("\n");
                }

                printf("%s", DATA[ARG[1]]);
                for(N=2; N<=ARG_; N++)
                        printf("%s%s", SEP, DATA[ARG[N]]);
                printf("\n");
        }
        next
}

# Handle open tags
{
        gsub(/^[ \r\n\t]*/, "", $2);    # Whitespace isn't data
        gsub(/[ \r\n\t]*$/, "", $2);
        sub(/\/$/, "", $(NF-1));

        # Reset parameters
        POP=0;

        M=split($1, A, " ");
        STACK[++D]=A[1];

        if((!MAX) || (D>MAX)) MAX=D;    # Save max depth

        # Handle parameters
        Q=split(A[2], B, " ");
        for(N=1; N<=Q; N++)
        {
                split(B[N], C, "=");
                gsub(/['"]/,"", C[2]);

                I=D SUBSEP STACK[D] SUBSEP C[1];
                if(!SEEN[I]++)
                        ARG[++ARG_]=I;

                DATA[I]=C[2];
        }

        if($2)
        {
                I=D SUBSEP STACK[D] SUBSEP "CDATA";
                if(!SEEN[I]++)
                        ARG[++ARG_]=I;

                DATA[I]=$2;
        }
}

$ cat data4.xml

<gp>
    <mms>1110012</mms>
    <tg>988</tg>
    <mm>LongTime</mm>
    <lv>
        <lkid>StartEle=ONE, Desti = Motion</lkid>
        <kk>12</kk>
    </lv>
    <lv>
        <lkid>StartEle=ONE, Source = Velocity</lkid>
        <kk>2</kk>
    </lv>
    <lv>
        <lkid>StartEle=ONE, Source = Park</lkid>
        <kk>2</kk>
    </lv>
</gp>

<gp>
    <mms>2221100</mms>
    <tg>989</tg>
    <mm>LongVelocity</mm>
    <lv>
        <lkid>StartEle=ONE, Source = Velocity</lkid>
        <kk>772</kk>
    </lv>
    <lv>
        <lkid>StartEle=ONE, Desti = Motion</lkid>
        <kk>900</kk>
    </lv>
    <lv>
        <lkid>StartEle=ONE, Source = Park</lkid>
        <kk>2</kk>
    </lv>
</gp>

$ # DEP=2 means 'wait for 2 close tags in a row before printing a row'.
$ awk -f xmlh.awk DEP=2 data4.xml
mms CDATA       tg CDATA        mm CDATA        lkid CDATA      kk CDATA
1110012 988     LongTime        StartEle=ONE, Desti = Motion    12
1110012 988     LongTime        StartEle=ONE, Source = Velocity 2
1110012 988     LongTime        StartEle=ONE, Source = Park     2
2221100 989     LongVelocity    StartEle=ONE, Source = Velocity 772
2221100 989     LongVelocity    StartEle=ONE, Desti = Motion    900
2221100 989     LongVelocity    StartEle=ONE, Source = Park     2

$

Output is tab-separated.

Use nawk on Solaris.
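
Since the output is one tab-separated row per <lv> group, the value the original question asks for can be picked out with a second, much simpler awk filter. The sketch below is only an illustration: it feeds in a few rows typed by hand in the same layout as the output above, and the variable name kk_value is made up.

```shell
# Stand-in for the output of: awk -f xmlh.awk DEP=2 data4.xml
# printf reuses the format for each group of five arguments,
# producing one tab-separated row per group (header row first).
kk_value=$(printf '%s\t%s\t%s\t%s\t%s\n' \
    'mms CDATA' 'tg CDATA' 'mm CDATA' 'lkid CDATA' 'kk CDATA' \
    1110012 988 LongTime     'StartEle=ONE, Desti = Motion'    12 \
    1110012 988 LongTime     'StartEle=ONE, Source = Velocity' 2 \
    2221100 989 LongVelocity 'StartEle=ONE, Desti = Motion'    900 |
  awk -F'\t' '
    # Skip the header; column 3 is <mm>, column 4 is <lkid>, column 5 is <kk>
    NR > 1 && $3 == "LongTime" && $4 ~ /Desti = Motion/ { print $5 }
  ')

echo "$kk_value"
```

With the real data, the printf part would be replaced by the xmlh.awk command itself, piped into the same filter.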

Last edited by Corona688; 01-09-2014 at 01:02 PM..
# 4  
You may try this. It might not be as intelligent as the one Corona688 posted, but it works for the given sample.
Code:
awk '
    !f && /<(.*)>$/{
                     s=$0; f=g=1
                     gsub(/</,"</",s)
                     next
                   }
                 f{
                     g=(/<(.*)>/ && !/<\/(.*)>$/)?0:g
                     if(/\<(.*)>(.*)<\/(.*)>/){
                                                  gsub(/[<>]|<\//,"|")
                                                  split($0,A,"|")
                                                  com = g == 1 ? com ? com OFS A[3] : A[3] : com
                                                  dat = g == 0 ? dat ? dat OFS A[3] : A[3] : dat 
                                                  hea = m ? hea : hea ? hea OFS A[2]"_CDATA" : A[2]"_CDATA"
                                              }
                      if(/<\/(.*)>$/ && $0!~s){
                                             ++d
                                             if(d==count){
                                                           print m ? com OFS dat : hea RS com OFS dat
                                                           dat="";d=0;m=1
                                                         }  
                                              }
                   }
               $0~s{
                       f=s=com=d=""
                   }
     ' count="1" OFS="\t" test.xml

Code:
$ cat test.xml
<gp>
    <mms>1110012</mms>
    <tg>988</tg>
    <mm>LongTime</mm>
    <lv>
        <lkid>StartEle=ONE, Desti = Motion</lkid>
        <kk>12</kk>
    </lv>
    <lv>
        <lkid>StartEle=ONE, Source = Velocity</lkid>
        <kk>2</kk>
    </lv>
    <lv>
        <lkid>StartEle=ONE, Source = Park</lkid>
        <kk>2</kk>
    </lv>
</gp>

<gp>
    <mms>2221100</mms>
    <tg>989</tg>
    <mm>LongVelocity</mm>
    <lv>
        <lkid>StartEle=ONE, Source = Velocity</lkid>
        <kk>772</kk>
    </lv>
    <lv>
        <lkid>StartEle=ONE, Desti = Motion</lkid>
        <kk>900</kk>
    </lv>
    <lv>
        <lkid>StartEle=ONE, Source = Park</lkid>
        <kk>2</kk>
    </lv>
</gp>

Code:
mms_CDATA  tg_CDATA  mm_CDATA       lkid_CDATA            kk_CDATA
1110012    988       LongTime        StartEle=ONE, Desti = Motion       12
1110012    988       LongTime        StartEle=ONE, Source = Velocity    2
1110012    988       LongTime        StartEle=ONE, Source = Park        2
2221100    989       LongVelocity    StartEle=ONE, Source = Velocity    772
2221100    989       LongVelocity    StartEle=ONE, Desti = Motion       900
2221100    989       LongVelocity    StartEle=ONE, Source = Park        2
