XML Phase with awk

12-10-2018

Hi Guys,

Input XML file:

Code:

  <managedObject class="RMOD_R" distName="MRBTS-101/X/R-7">
   <list name="activeCellsList">
    <p>15</p>
    <p>201</p>
   </list>
   <p name="aldManagementProtocol">True</p>
   <p name="serialNumber">845</p>
  </managedObject>

Desired output:

Code:

CLI_RMOD_ACL^Mo,activeCellsList,activeCellsList,antennaPathDelayMeasurementCapable
CLI_RMOD_ACL^MRBTS-101/X/R-7,activeCellsList,15;201,True

I have tried the command below, but I am not getting both values from the repeated <p> tag:

Code:

awk -F'[\\""\\>\\<]' -v OFS=',' 'BEGIN{print "Mo,activeCellsList,activeCellsList,antennaPathDelayMeasurementCapable"} /RMOD_R/{a=$5}/activeCellsList/{b=$3}/<p>/{c=$3}/antennaPathDelayMeasurementCapable/{print a,b,c,$5}' | sed 's/^/CLI_RMOD_ACL^/'

This is what I get from the above command:

Code:

CLI_RMOD_ACL^Mo,activeCellsList,activeCellsList,antennaPathDelayMeasurementCapable
CLI_RMOD_ACL^MRBTS-101/X/R-7,activeCellsList,15,True

pareshkp

12-10-2018

Have you tried the xmllint command? It makes XML files easier to handle.

You need a header line:

Code:

CLI_RMOD_ACL^Mo,activeCellsList,activeCellsList,antennaPathDelayMeasurementCapable

and then

Code:

CLI_RMOD_ACL^MRBTS-101/X/R-7,activeCellsList,15;201,True

which is made up of:

CLI_RMOD_ACL (static)
MRBTS-101/X/R-7 (attribute)
activeCellsList (static)
All values of the list, one per <p> tag
True (static)

So for part 2 (the distName attribute), using XPath in xmllint:

Code:

echo 'cat //@distName' | xmllint --shell try.txt | grep -v '>' | awk -F'"' 'NF > 1 { print $2 }'

Does this work?
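
For the list part of the breakdown (all the <p> values joined with ";"), here is a minimal awk sketch. It assumes the values have already been extracted one per line; join_values is a made-up helper name for illustration, not something from xmllint or from the posts above.

```shell
# join_values: hypothetical helper. Reads one value per line on stdin
# and prints them on a single line, separated by ";".
join_values() {
    awk '{ out = (NR == 1) ? $0 : out ";" $0 } END { print out }'
}

# The two <p> values from the sample list
printf '15\n201\n' | join_values   # prints 15;201
```

The first value is stored as-is and every later one is appended with a leading ";", so there is no stray separator to trim afterwards.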

chakrapani

12-10-2018

Not really! I get: -bash: xmllint: command not found

pareshkp

12-10-2018

Quote:

Originally Posted by chakrapani

Have you tried the xmllint command? It makes XML files easier to handle.

YEAH I AGREE

KOUKIPILO

12-10-2018

I'm not getting anything (but the header) with your awk code, but try this:

Code:

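 awk -F'["<>]' -v OFS=',' '
   BEGIN{print "Mo,activeCellsList,activeCellsList,antennaPathDelayMeasurementCapable"}
   # distName is the third field from the end of the managedObject line
   /RMOD_R/{a=$(NF-2)}
   # remember the list name
   $2~/list name/{b=$3}
   # append each bare <p> value to c, separated by ";"
   $2=="p"{c=(c)?c";"$3:$3}
   # print the row at aldManagementProtocol, then reset for the next object
   /aldManagementProtocol/{print a,b,c,$5;a=b=c=""}' myXMLfile | sed 's/^/CLI_RMOD_ACL^/'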
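
To see why those field numbers line up, here is a small sketch that prints every field awk produces when ", < and > are all field separators. show_fields is a hypothetical helper name used only for this illustration.

```shell
# show_fields: hypothetical helper. Prints each field of one line,
# numbered, using the same field separator as the command above.
show_fields() {
    printf '%s\n' "$1" |
    awk -F'["<>]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

show_fields '<p>15</p>'
show_fields '<managedObject class="RMOD_R" distName="MRBTS-101/X/R-7">'
```

For the <p> line the tag name lands in field 2 and the value in field 3. For the managedObject line there are seven fields with distName in field 5, which is why $(NF-2) picks it up.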

vgersh99

12-11-2018

Handling XML even slightly properly isn't trivial. But we get asked for it a lot, so:

Code:

# yanx.awk v0.0.8, Tyler Montbriand, 2017.  Yet another noncompliant XML parser
###############################################################################
# XML is a pain to process in the shell, but people need it all the time.
# I've been using and improving this kludge since 2014 or so.  It parses and
# stacks tags and digests parameters, allowing simple XML processing and
# extraction to be managed with a handful of lines addendum.
#
# I've restricted my use of GNU features enough that this script will run on
# busybox's awk.  I think it works with mawk except -e is unsupported.
# You can work around that by running multiple files, i.e.
# mawk -f yanx.awk -f mystuff.awk inputfile
###############################################################################
# Basic use:
#
# Fed this XML, <body><html a="b">Your Web Browser Hates This</html></body>
# yanx will read it token-by-token as so:
#     Line 1:  Empty, skipped
#     Line 2:  $1="body"
#     Line 3:  $1="html a="b"", $2="Your Web Browser Hates This"
#     Line 4:  $1="/html"
#     Line 5:  $1="/body", $2="\n"
#
# The script sets a few new "special" variables along the way.
# TAG           The name of the current tag, uppercased.
# CTAG          If close-tag, name in uppercase.
# TAGS          List of nested tags, like HTML%BODY%, including current tag
# LTAGS         List of nested tags, not including current tag
# ARGS          Array of tag parameters, uppercased.  i.e. ARGS["HREF"]
# DEP           How many tags deep it's nested, including current tag.
#
###############################################################################
# Examples:
# # Rewrite cdata of all divs
# awk -f yanx.awk -e 'TAGS ~ /^DIV%/ { $2="quux froob" } 1' input
# # Extract href's from every link
# awk -f yanx.awk -e 'TAGS~/^A%/ && ("HREF" in ARGS) {
#       print ARGS["HREF"] }' ORS="\n" input
###############################################################################
# Known Bugs:
# A short XML script can't possibly handle DTD, etc.  Entities a la &lt;
# are not translated either.
#
# I've done my best to make it swallow <!--, <? ?> and other such fancy
# XML syntax without choking, but that doesn't mean it handles them
# properly either.
#
# It's an XML parser, not an HTML parser.  It probably won't swallow a
# wild-from-the-internet HTML web page without some cleanup first:
# javascript, tags inside comments, etc will be mangled instead of ignored.
#
# Last: Because of its design, when printing raw HTML, yanx adds an extra <
# to the end of the file.  This is because < belongs at the beginning of
# a token but awk is told it's printed at the end.  There is no equivalent
# "line prefix" variable that I know of, if you want it to print smarter
# you'll have to print the <'s yourself, by setting ORS="" and
# printing lines like print "<" $0
###############################################################################
BEGIN {
        FS=">"; OFS=">";
        RS="<"; ORS="<"
}

# After match("qwertyuiop", /rty/)
#       rbefore("qwertyuiop") is "qwe",
#       rmid("qwertyuiop")    is "r"
#       rall("qwertyuiop")    is "rty"
#       rafter("qwertyuiop")  is "uiop"

# !?!?!
# function rbefore(STR)   { return(substr(STR, N, RSTART-1)); }# before match
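# Start at 1: POSIX numbers characters from 1, so a start index of 0
# may not behave the same in every awk.
function rbefore(STR)   { return(substr(STR, 1, RSTART-1)); }# before match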
function rmid(STR)      { return(substr(STR, RSTART, 1)); }  # First char match
function rall(STR)      { return(substr(STR, RSTART, RLENGTH)); }# Entire match
function rafter(STR)    { return(substr(STR, RSTART+RLENGTH)); }# after match

function aquote(OUT, A, PFIX, TA) { # Turns Q SUBSEP R into A[PFIX":"Q]=R
        if(OUT)
        {
                if(PFIX) PFIX=PFIX":"
                split(OUT, TA, SUBSEP);
                A[toupper(PFIX) toupper(TA[1])]=TA[2];
        }

        return("");
}

# Intended to be less stupid about quoted text in XML/HTML.
# Splits a='b' c='d' e='f' into A[PFIX":"a]=b, A[PFIX":"c]=d, etc.
function qsplit(STR, A, PFIX, X, OUT) {
        while(STR && match(STR, /([ \n\t]+)|[\x27\x22=]/))
        {
                OUT = OUT rbefore(STR);
                RMID=rmid(STR);

                if((RMID == "'") || (RMID == "\""))     # Quote characters
                {
                        if(!Q)          Q=RMID;         # Begin quote section
                        else if(Q == RMID)      Q="";   # End quote section
                        else                    OUT = OUT RMID; # Quoted quote
                } else if(RMID == "=") {
                        if(Q)   OUT=OUT RMID; else OUT=OUT SUBSEP;
                } else if((RMID=="\r")||(RMID=="\n")||(RMID=="\t")||(RMID==" ")) {
                        if(Q)   OUT = OUT rall(STR); # Literal quoted whitespace
                        else    OUT = aquote(OUT, A, PFIX); # Unquoted WS, next block
                }
                STR=rafter(STR); # Strip off the text we've processed already.
        }

        aquote(OUT STR, A, PFIX); # Process any text we haven't already.
}


{ SPEC=0 ; TAG="" }

NR==1 {
        if(ORS == RS) print;
        next } # The first "line" is blank when RS=<

/^[!?]/ { SPEC=1    }   # XML specification junk

# Handle open-tags
(!SPEC) && match($1, /^[^\/ \r\n\t>]+/) {
        CTAG=""
        TAG=substr(toupper($1), RSTART, RLENGTH);
        if((!SPEC) && !($1 ~ /\/$/))
        {
                TAGS=TAG "%" TAGS;
                DEP++;
                LTAGS=TAGS
        }

        for(X in ARGS) delete ARGS[X];

        qsplit(rafter($1), ARGS, "", "", "");
}

# Handle close-tags
(!SPEC) && /^[\/]/ {
        sub(/^\//, "", $1);
        LTAGS=TAGS
        CTAG=toupper($1)
        TAG=""
#        sub("^.*" toupper($1) "%", "", TAGS);
        sub("^" toupper($1) "%", "", TAGS);
        $1="/"$1
        DEP=split(TAGS, TA, "%")-1;
        # Update TAG with tag on top of stack, if any
#       if(DEP < 0) {   DEP=0;  TAG=""  }
#       else { TAG=TA[DEP]; }
}
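
The tokenising idea from the header comment can be tried in isolation. This is only a sketch of the RS="<" / FS=">" trick, not yanx itself: xml_tokens is a made-up helper name, and it does none of the tag stacking or attribute parsing.

```shell
# xml_tokens: hypothetical helper, not part of yanx.awk.
# Reads XML on stdin and prints what awk sees in $1 and $2
# when "<" ends a record and ">" separates fields.
xml_tokens() {
    awk 'BEGIN { RS = "<"; FS = ">" }
         NR > 1 { printf "record %d: $1=[%s] $2=[%s]\n", NR, $1, $2 }'
}

printf '%s' '<body><html a="b">Your Web Browser Hates This</html></body>' | xml_tokens
```

Record 1 is the empty text before the first "<", so it is skipped. Every later record starts with a tag in $1, and any text that follows the tag ends up in $2.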

Using this, we can build a solution out of awk:

Code:

# managed.awk
BEGIN { ORS="\n" }
TAG=="MANAGEDOBJECT" && (ARGS["CLASS"] == "RMOD_R") {
        C=0;
        L[C++]=ARGS["DISTNAME"];
}
C && (TAG=="P") && (TAGS ~ /%LIST%/) {  L[C++]=$2       }
C && (CTAG=="MANAGEDOBJECT") {
        S=""
        for(N=1; N < C; N++) S=S ";" L[N]

        print "CLI_RMOD_ACL^Mo,activeCellsList,antennaPathDelayMeasurementCapable";
        print "CLI_RMOD_ACL^" L[0] ",activeCellsList,"substr(S,2)",True";
        C=0;
}

And use it thus:

Code:

$ awk -f yanx.awk -f managed.awk input.xml

CLI_RMOD_ACL^Mo,activeCellsList,antennaPathDelayMeasurementCapable
CLI_RMOD_ACL^MRBTS-101/X/R-7,activeCellsList,15;201,True

$

Corona688
