Full Discussion: Parse XML For Values
Post 302922294 by Corona688 on Thursday 23rd of October 2014 03:09:39 PM
The best I can do without more information:

Code:
$ cat allinput.awk

BEGIN {
        FS=">"; OFS="\t"
        RS="<";

        # INPUTA, as in tag "input" attribute "a".  They must be allcaps here.
        split("INPUTA INPUTB A B C D E F G H I J K L M", ORDER, " ");
}

# These should be special variables for match() but aren't.
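# N was never set here; start at 1 so the result does not depend on how a given awk treats a start index of 0.
function rbefore(STR)   { return(substr(STR, 1, RSTART-1)); }# before match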
function rmid(STR)      { return(substr(STR, RSTART, 1)); }  # First char match
function rall(STR)      { return(substr(STR, RSTART, RLENGTH)); }# Entire match
function rafter(STR)    { return(substr(STR, RSTART+RLENGTH)); }# after match

function aquote(OUT, A, PFIX, TA) { # Turns Q SUBSEP R into A[PFIX":"Q]=R
        if(OUT)
        {
                if(PFIX) PFIX=PFIX":"
                split(OUT, TA, SUBSEP);
                A[toupper(PFIX) toupper(TA[1])]=TA[2];
        }

        return("");
}

# Intended to be less stupid about quoted text in XML/HTML.
# Splits a='b' c='d' e='f' into A[PFIX":"a]=b, A[PFIX":"c]=d, etc.
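# Q and RMID are listed as extra parameters so they stay local; a stray quote in one tag can't leak into the next call.
function qsplit(STR, A, PFIX, X, OUT, Q, RMID) {
        while(STR && match(STR, /([ \r\n\t]+)|[\x27\x22=]/))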
        {
                OUT = OUT rbefore(STR);
                RMID=rmid(STR);

                if((RMID == "'") || (RMID == "\""))     # Quote characters
                {
                        if(!Q)          Q=RMID;         # Begin quote section
                        else if(Q == RMID)      Q="";   # End quote section
                        else                    OUT = OUT RMID; # Quoted quote
                } else if(RMID == "=") {
                        if(Q)   OUT=OUT RMID; else OUT=OUT SUBSEP;
                } else if((RMID=="\r")||(RMID=="\n")||(RMID=="\t")||(RMID==" ")) {
                        if(Q)   OUT = OUT rall(STR); # Literal quoted whitespace
                        else    OUT = aquote(OUT, A, PFIX); # Unquoted WS, next block
                }
                STR=rafter(STR); # Strip off the text we've processed already.
        }

        aquote(OUT STR, A, PFIX); # Process any text we haven't already.
}


{ SPEC=0 ; TAG="" }

NR==1 {
        if(ORS == RS) print;
        next } # The first "line" is blank when RS=<

/^[!?]/ { SPEC=1    }   # XML specification junk

# Handle open-tags
match($1, /^[^\/ \r\n\t>]+/) {
        TAG=substr(toupper($1), RSTART, RLENGTH);
        if((!SPEC) && !($1 ~ /\/$/))
        {
                TAGS=TAG "%" TAGS;
                DEP++;
                LTAGS=TAGS
        }

        for(X in ARGS) delete ARGS[X];

        qsplit(rafter($1), ARGS);
}

# Handle close-tags
(!SPEC) && /^[\/]/ {
        sub(/^\//, "", $1);
        LTAGS=TAGS

#        sub("^.*" toupper($1) "%", "", TAGS);
        sub("^" toupper($1) "%", "", TAGS);
        $1="/"$1
        DEP=split(TAGS, TA, "%")-1;
        if(DEP < 0) DEP=0;
}

### Example of how to use it ###
# TAG is the name of the last open-tag
# TAGS is a string of open tag names, innermost first, like INNER%MIDDLE%OUTERMOST%
# $2 is CDATA inside the current tag
# ARGS is an array of arguments for the current tag
#
# So, when processing <a> in  <html><a href="index.html">Yay!</a></html>
# it would have:
# TAG="A"
# ARGS["HREF"]="index.html"
# TAGS="A%HTML%"
# $2="Yay!"

# Handle <input> tag
(TAGS ~ /^INPUT%/) {    for(X in ARGS)  DATA[TAG X]=ARGS[X]     }

# Parse <tags> inside <input> so DATA[TAGNAME]=CONTENTS
(TAGS ~ /(^|%)INPUT%/) && ($2 ~ /[^ \r\n\t]/) && !/^\// {
        # Clean up tag contents
        sub(/^[ \r\n\t]+/, "", $2);
        sub(/[ \r\n\t]+$/, "", $2);
        DATA[TAG]=$2
}

# Handle </input>, printing and clearing collected data
toupper($1) == "/INPUT" {
        PFIX=""
        for(M=1; M in ORDER; M++)
        {
                # Convert blank fields into single spaces, since the shell will see
                # two tabs in a row as one field, skipping the blank one.
                if(DATA[ORDER[M]]=="") DATA[ORDER[M]]=" "
                printf("%s%s", PFIX, DATA[ORDER[M]]);
                PFIX=OFS;
        }

        printf("\n");

        for(X in DATA) delete DATA[X];
}

$ awk -f allinput.awk allinput.xml

2389906 install                 111     222     333             444                     C,D,E,G C,D,E,G 555
4732435 delete                  999     792                     990     942    992              C,D,G,H,I       C,D,G,H,I       804

$ awk -f allinput.awk allinput.xml |
while IFS=$'\t' read -r INPUTA INPUTB A B C D E F G H I J K L M
do
        # Convert all single-space fields into completely blank fields
        for X in INPUTA INPUTB A B C D E F G H I J K L M
        do
                [ "${!X}" = " " ] && read $X # Cheeky trick to set arbitrary variable contents
        done < /dev/null
        echo "doing something with $INPUTA $INPUTB $L $M"
done

doing something with 2389906 install C,D,E,G 555
doing something with 4732435 delete C,D,G,H,I 804

$
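
If it's not obvious what RS="<" and FS=">" do to the input, here is a stripped-down sketch of just that part. The one-line sample XML and the show_records function name are made up for illustration (the real allinput.xml isn't shown in this thread), and it skips all the attribute parsing that allinput.awk does.

```shell
# Hypothetical one-line sample; the thread's real allinput.xml is not shown.
xml='<input a="1"><b>222</b></input>'

# RS="<" starts a new record at every "<"; FS=">" then splits each record
# into $1 (tag name plus attributes) and $2 (the text that follows the tag).
show_records() {
    printf '%s\n' "$1" |
    awk 'BEGIN { RS="<"; FS=">" }
         # Record 1 is the empty text before the first "<", so skip it.
         NR > 1 { sub(/\n$/, "", $2); printf("[%s] [%s]\n", $1, $2) }'
}

show_records "$xml"
```

Each output line is one record: the open tag with its attributes, then <b> with 222 as its $2, then the two close-tags with nothing after them. That's why the script can treat $1 as "the tag" and $2 as "the contents".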

It won't work if your data contains tabs anywhere, since tab is the separator between output fields. I've highlighted in red anywhere tag/attribute names are hardcoded.
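
A small sketch of why the awk script pads blank fields with a single space before printing. The rows and variable names here are made up for illustration; only the read behaviour is the point. It uses printf to build the tab so it also runs in a plain POSIX sh.

```shell
# Hypothetical row with an empty middle field, tab-separated.
row=$(printf 'one\t\tthree')
TAB=$(printf '\t')

# Tab counts as IFS whitespace, so read treats the two adjacent tabs as one
# delimiter and "three" slides into the second variable.
IFS=$TAB read -r F1 F2 F3 <<EOF
$row
EOF
echo "collapsed: F1=[$F1] F2=[$F2] F3=[$F3]"

# With a single space as a placeholder, every field stays in its own column.
padded=$(printf 'one\t \tthree')
IFS=$TAB read -r P1 P2 P3 <<EOF
$padded
EOF
echo "padded:    P1=[$P1] P2=[$P2] P3=[$P3]"
```

In the first case F2 ends up holding "three" and F3 is empty; in the second, P2 is the single space and P3 is "three". The loop in the shell snippet above then turns those single-space placeholders back into empty strings.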

Last edited by Corona688; 10-23-2014 at 04:20 PM..
 
