XML Phase with awk


 
# 1  
Old 12-10-2018
XML Phase with awk

Hi Guys,

Input XML File :-

Code:
  <managedObject class="RMOD_R" distName="MRBTS-101/X/R-7">
   <list name="activeCellsList">
    <p>15</p>
    <p>201</p>
   </list>
   <p name="aldManagementProtocol">True</p>
   <p name="serialNumber">845</p>
  </managedObject>

Output :-

Code:
CLI_RMOD_ACL^Mo,activeCellsList,activeCellsList,antennaPathDelayMeasurementCapable
CLI_RMOD_ACL^MRBTS-101/X/R-7,activeCellsList,15;201,True

I have tried the command below, but I am not getting both values from the repeated <p> tag.

Code:
awk -F'[\\""\\>\\<]' -v OFS=',' 'BEGIN{print "Mo,activeCellsList,activeCellsList,antennaPathDelayMeasurementCapable"} /RMOD_R/{a=$5}/activeCellsList/{b=$3}/<p>/{c=$3}/antennaPathDelayMeasurementCapable/{print a,b,c,$5}' | sed 's/^/CLI_RMOD_ACL^/'

What I am getting from the above command

Code:
CLI_RMOD_ACL^Mo,activeCellsList,activeCellsList,antennaPathDelayMeasurementCapable
CLI_RMOD_ACL^MRBTS-101/X/R-7,activeCellsList,15,True


Last edited by vgersh99; 12-10-2018 at 01:44 PM.. Reason: fixed icode -> code tags
# 2  
Old 12-10-2018
Have you tried xmllint? It makes XML files easier to handle.



You need a header line
Code:
CLI_RMOD_ACL^Mo,activeCellsList,activeCellsList,antennaPathDelayMeasurementCapable

and then
Code:
CLI_RMOD_ACL^MRBTS-101/X/R-7,activeCellsList,15;201,True

which is
  • CLI_RMOD_ACL (static)
  • MRBTS-101/X/R-7 (the distName attribute)
  • activeCellsList (static)
  • All values of the list's <p> tags
  • True (static)

So for part 2, using XPath in xmllint:

Code:
echo 'cat //@distName' | xmllint --shell try.txt | grep -v '>' | awk -F'"' '{ print $2 }'

Does this work ?
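
If your xmllint has the --xpath option, the attribute can be read without the interactive shell and without the grep/awk cleanup. A minimal sketch, assuming such a build is installed; dist_name is a made-up wrapper name, and string() only returns the first matching node:

```shell
# dist_name is a hypothetical wrapper; it assumes an xmllint that supports --xpath.
# string(...) yields the value of the first matching distName attribute only.
dist_name() {
    xmllint --xpath 'string(//managedObject[@class="RMOD_R"]/@distName)' "$1"
}
```

Usage would be something like: dist_name try.txt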
# 3  
Old 12-10-2018
Not really! I get: -bash: xmllint: command not found
# 4  
Old 12-10-2018
Quote:
Originally Posted by chakrapani
Have you tried xmllint? It makes XML files easier to handle. [...]

YEAH I AGREE
# 5  
Old 12-10-2018
I'm not getting anything (but the header) with your awk code, but try this:
Code:
 awk -F'["<>]' -v OFS=',' '
   BEGIN{print "Mo,activeCellsList,activeCellsList,antennaPathDelayMeasurementCapable"} 
   /RMOD_R/{a=$(NF-2)}
   $2~/list name/{b=$3}
   $2=="p"{c=(c)?c";"$3:$3}
   /aldManagementProtocol/{print a,b,c,$5;a=b=c=""}' myXMLfile | sed 's/^/CLI_RMOD_ACL^/'
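
To see why $(NF-2), $3 and $5 pick out the right pieces in the command above, it can help to dump the fields awk produces with that separator. A minimal sketch; show_fields is a made-up helper name, and the two sample lines are taken from the input in post #1 with the indentation removed:

```shell
# Print every field awk sees when a line is split on ", < or >.
# show_fields is a hypothetical helper, used only for inspection.
show_fields() {
    awk -F'["<>]' '{
        out = ""
        for (i = 1; i <= NF; i++)
            out = out (i > 1 ? " " : "") "$" i "=[" $i "]"
        print out
    }' "$@"
}

show_fields <<'EOF'
<p>15</p>
<p name="serialNumber">845</p>
EOF
# prints:
# $1=[] $2=[p] $3=[15] $4=[/p] $5=[]
# $1=[] $2=[p name=] $3=[serialNumber] $4=[] $5=[845] $6=[/p] $7=[]
```

A bare <p> has exactly "p" in $2 and its text in $3, while a <p name="..."> has its text in $5, which is what the $2=="p" test and the final print rely on.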


Last edited by vgersh99; 12-10-2018 at 04:09 PM.. Reason: init abc
# 6  
Old 12-11-2018
Handling XML even slightly properly isn't trivial. But we get asked for it a lot, so:

Code:
# yanx.awk v0.0.8, Tyler Montbriand, 2017.  Yet another noncompliant XML parser
###############################################################################
# XML is a pain to process in the shell, but people need it all the time.
# I've been using and improving this kludge since 2014 or so.  It parses and
# stacks tags and digests parameters, allowing simple XML processing and
# extraction to be managed with a handful of lines addendum.
#
# I've restricted my use of GNU features enough that this script will run on
# busybox's awk.  I think it works with mawk except -e is unsupported.
# You can work around that by running multiple files, i.e.
# mawk -f yanx.awk -f mystuff.awk inputfile
###############################################################################
# Basic use:
#
# Fed this XML, <body><html a="b">Your Web Browser Hates This</html></body>
# yanx will read it token-by-token as so:
#     Line 1:  Empty, skipped
#     Line 2:  $1="body"
#     Line 3:  $1="html a="b"", $2="Your Web Browser Hates This"
#     Line 4:  $1="/html"
#     Line 5:  $1="/body", $2="\n"
#
# The script sets a few new "special" variables along the way.
# TAG           The name of the current tag, uppercased.
# CTAG          If close-tag, name in uppercase.
# TAGS          List of nested tags, like HTML%BODY%, including current tag
# LTAGS         List of nested tags, not including current tag
# ARGS          Array of tag parameters, uppercased.  i.e. ARGS["HREF"]
# DEP           How many tags deep it's nested, including current tag.
#
###############################################################################
# Examples:
# # Rewrite cdata of all divs
# awk -f yanx.awk -e 'TAGS ~ /^DIV%/ { $2="quux froob" } 1' input
# # Extract href's from every link
# awk -f yanx.awk -e 'TAGS~/^A%/ && ("HREF" in ARGS) {
#       print ARGS["HREF"] }' ORS="\n" input
###############################################################################
# Known Bugs:
# A short XML script can't possibly handle DTDs, etc.  Entities a la &lt;
# are not translated either.
#
# I've done my best to make it swallow <!--, <? ?> and other such fancy
# XML syntax without choking, but that doesn't mean it handles them
# properly either.
#
# It's an XML parser, not an HTML parser.  It probably won't swallow a
# wild-from-the-internet HTML web page without some cleanup first:
# javascript, tags inside comments, etc will be mangled instead of ignored.
#
# Last: Because of its design, when printing raw HTML, yanx adds an extra <
# to the end of the file.  This is because < belongs at the beginning of
# a token but awk is told it's printed at the end.  There is no equivalent
# "line prefix" variable that I know of, if you want it to print smarter
# you'll have to print the <'s yourself, by setting ORS="" and
# printing lines like print "<" $0
###############################################################################
BEGIN {
        FS=">"; OFS=">";
        RS="<"; ORS="<"
}

# After match("qwertyuiop", /rty/)
#       rbefore("qwertyuiop") is "qwe",
#       rmid("qwertyuiop")    is "r"
#       rall("qwertyuiop")    is "rty"
#       rafter("qwertyuiop")  is "uiop"

# substr() positions are 1-based.  Starting at 0 makes some awks return one
# character too few, so start at 1 to get exactly the RSTART-1 leading chars.
function rbefore(STR)   { return(substr(STR, 1, RSTART-1)); }# before match
function rmid(STR)      { return(substr(STR, RSTART, 1)); }  # First char match
function rall(STR)      { return(substr(STR, RSTART, RLENGTH)); }# Entire match
function rafter(STR)    { return(substr(STR, RSTART+RLENGTH)); }# after match

function aquote(OUT, A, PFIX, TA) { # Turns Q SUBSEP R into A[PFIX":"Q]=R
        if(OUT)
        {
                if(PFIX) PFIX=PFIX":"
                split(OUT, TA, SUBSEP);
                A[toupper(PFIX) toupper(TA[1])]=TA[2];
        }

        return("");
}

# Intended to be less stupid about quoted text in XML/HTML.
# Splits a='b' c='d' e='f' into A[PFIX":"a]=b, A[PFIX":"c]=d, etc.
function qsplit(STR, A, PFIX, X, OUT) {
        while(STR && match(STR, /([ \n\t]+)|[\x27\x22=]/))
        {
                OUT = OUT rbefore(STR);
                RMID=rmid(STR);

                if((RMID == "'") || (RMID == "\""))     # Quote characters
                {
                        if(!Q)          Q=RMID;         # Begin quote section
                        else if(Q == RMID)      Q="";   # End quote section
                        else                    OUT = OUT RMID; # Quoted quote
                } else if(RMID == "=") {
                        if(Q)   OUT=OUT RMID; else OUT=OUT SUBSEP;
                } else if((RMID=="\r")||(RMID=="\n")||(RMID=="\t")||(RMID==" ")) {
                        if(Q)   OUT = OUT rall(STR); # Literal quoted whitespace
                        else    OUT = aquote(OUT, A, PFIX); # Unquoted WS, next block
                }
                STR=rafter(STR); # Strip off the text we've processed already.
        }

        aquote(OUT STR, A, PFIX); # Process any text we haven't already.
}


{ SPEC=0 ; TAG="" }

NR==1 {
        if(ORS == RS) print;
        next } # The first "line" is blank when RS=<

/^[!?]/ { SPEC=1    }   # XML specification junk

# Handle open-tags
(!SPEC) && match($1, /^[^\/ \r\n\t>]+/) {
        CTAG=""
        TAG=substr(toupper($1), RSTART, RLENGTH);
        if((!SPEC) && !($1 ~ /\/$/))
        {
                TAGS=TAG "%" TAGS;
                DEP++;
                LTAGS=TAGS
        }

        for(X in ARGS) delete ARGS[X];

        qsplit(rafter($1), ARGS, "", "", "");
}

# Handle close-tags
(!SPEC) && /^[\/]/ {
        sub(/^\//, "", $1);
        LTAGS=TAGS
        CTAG=toupper($1)
        TAG=""
#        sub("^.*" toupper($1) "%", "", TAGS);
        sub("^" toupper($1) "%", "", TAGS);
        $1="/"$1
        DEP=split(TAGS, TA, "%")-1;
        # Update TAG with tag on top of stack, if any
#       if(DEP < 0) {   DEP=0;  TAG=""  }
#       else { TAG=TA[DEP]; }
}
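
The whole script hangs on the RS="<" / FS=">" trick from its BEGIN block, so here is that part in isolation. A minimal sketch, not part of yanx.awk; tokens is a made-up helper name and the one-line XML is just a cut-down sample:

```shell
# Show how awk tokenizes XML when records end at "<" and fields split at ">".
# tokens is a hypothetical helper, not part of yanx.awk.
tokens() {
    awk 'BEGIN { RS = "<"; FS = ">" }
         # Record 1 is whatever precedes the first "<", so skip it.
         NR > 1 { printf "%d: tag=[%s] text=[%s]\n", NR, $1, $2 }' "$@"
}

printf '%s' '<list name="a"><p>15</p><p>201</p></list>' | tokens
# prints:
# 2: tag=[list name="a"] text=[]
# 3: tag=[p] text=[15]
# 4: tag=[/p] text=[]
# 5: tag=[p] text=[201]
# 6: tag=[/p] text=[]
# 7: tag=[/list] text=[]
```

Each record is one tag plus the text that follows it. yanx.awk layers its TAG, TAGS and ARGS bookkeeping on top of this record layout.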

Using this, we can build a solution out of awk:

Code:
# managed.awk
BEGIN { ORS="\n" }
TAG=="MANAGEDOBJECT" && (ARGS["CLASS"] == "RMOD_R") {
        C=0;
        L[C++]=ARGS["DISTNAME"];
}
C && (TAG=="P") && (TAGS ~ /%LIST%/) {  L[C++]=$2       }
C && (CTAG=="MANAGEDOBJECT") {
        S=""
        for(N=1; N < C; N++) S=S ";" L[N]

        print "CLI_RMOD_ACL^Mo,activeCellsList,activeCellsList,antennaPathDelayMeasurementCapable";
        print "CLI_RMOD_ACL^" L[0] ",activeCellsList,"substr(S,2)",True";
        C=0;
}

And use it thus:

Code:
$ awk -f yanx.awk -f managed.awk input.xml

CLI_RMOD_ACL^Mo,activeCellsList,activeCellsList,antennaPathDelayMeasurementCapable
CLI_RMOD_ACL^MRBTS-101/X/R-7,activeCellsList,15;201,True

$
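
If pulling in all of yanx.awk is more than you want, the same RS="<" idea can be cut down to this one job. A rough sketch under stated assumptions, not a general XML parser: rmod_csv is a made-up name, it assumes no ">" inside attribute values and no comments or CDATA, it prints the four-column header from post #1 once, and it takes the last column from aldManagementProtocol as post #5 does:

```shell
# rmod_csv is a hypothetical name; it needs only a POSIX awk, not yanx.awk.
rmod_csv() {
    awk 'BEGIN {
             RS = "<"; FS = ">"
             print "CLI_RMOD_ACL^Mo,activeCellsList,activeCellsList,antennaPathDelayMeasurementCapable"
         }
         # Opening tag of the wanted object: remember its distName.
         $1 ~ /^managedObject / && $1 ~ /class="RMOD_R"/ {
             mo = $1
             sub(/.*distName="/, "", mo); sub(/".*/, "", mo)
             inobj = 1; cells = ""; proto = ""
         }
         inobj && $1 ~ /^list / { inlist = 1 }
         inobj && $1 == "/list" { inlist = 0 }
         # Bare <p> inside the list: append its text, separated by ";".
         inobj && inlist && $1 == "p" { cells = cells (cells == "" ? "" : ";") $2 }
         inobj && $1 ~ /^p name="aldManagementProtocol"/ { proto = $2 }
         # Closing tag: emit one row per object.
         inobj && $1 == "/managedObject" {
             print "CLI_RMOD_ACL^" mo ",activeCellsList," cells "," proto
             inobj = 0
         }' "$@"
}

rmod_csv <<'EOF'
  <managedObject class="RMOD_R" distName="MRBTS-101/X/R-7">
   <list name="activeCellsList">
    <p>15</p>
    <p>201</p>
   </list>
   <p name="aldManagementProtocol">True</p>
   <p name="serialNumber">845</p>
  </managedObject>
EOF
```

On the sample from post #1 this prints the header followed by CLI_RMOD_ACL^MRBTS-101/X/R-7,activeCellsList,15;201,True. With a file it would be called as rmod_csv input.xml.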
