Shell Programming and Scripting

View Public Profile for rahulmittal87

10-23-2014

Registered User

Join Date: Jan 2010

Last Activity: 10 July 2015, 1:48 PM EDT

Posts: 8

Thanks Given: 1

Thanked 0 Times in 0 Posts

Thanks! I will try it.

---------- Post updated at 12:24 PM ---------- Previous update was at 12:00 PM ----------

Looks very good! Thanks a lot.

Just two things:
1. I also need the values for A and B.
2. How can I execute some shell commands after each group, which has values A to M?

rahulmittal87

10-23-2014

Registered User

Join Date: Aug 2005

Last Activity: 7 July 2020, 11:47 AM EDT

Location: Saskatchewan

Posts: 23,310

Thanks Given: 1,331

Thanked 4,623 Times in 4,217 Posts

1) Easy enough, but what do you want to do with them?
2) Good, now we're going somewhere.

Getting the data out of awk, into the shell, is the question now. Imagine you made a loop in the shell.

Code:

while [reading xml file]
do
       # What variables do you need here, set to what, for each tag?
done

Tell me exactly how you need to use this data and I can help create a loop for you.

A little more detail on the nature of your data would be good as well. If it's not as pretty as your example -- tags and data full of newlines, etc -- that might need some mangling to fix.
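As a rough sketch of what "getting the data out of awk, into the shell" can look like: awk prints one record per line with tab-separated fields, and a shell loop reads each line into variables. The field names ID, ACTION and VALUE, the sample records, and both function names are made up for illustration; they are not from the real data.

```shell
TAB=$(printf '\t')

# Hypothetical stand-in for the real awk script: prints one record per
# line, fields separated by tabs.
emit_records() {
        printf '1\tinstall\tfoo\n2\tdelete\tbar\n' |
        awk 'BEGIN { FS = OFS = "\t" } { print $1, $2, $3 }'
}

# Read each record into shell variables and act on it.
process_records() {
        while IFS="$TAB" read -r ID ACTION VALUE
        do
                echo "record $ID: $ACTION $VALUE"
        done
}

emit_records | process_records
```

Whatever needs to happen once per group goes inside the loop body, where every field is an ordinary shell variable.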

Corona688

I want to perform database queries based on the values of A to M. I need to decide the type of query (insert, update, or delete) based on the value of B, and then update the database table attributes using the values of C to M.

I am sorry but I cannot expose the data fields.
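To illustrate the kind of dispatch described here, a minimal sketch follows. It only prints the query instead of running it. The table name "mytable", the columns "id" and "c", and the function name build_query are all hypothetical, and a real script would also need to escape the values before putting them into SQL.

```shell
# Hypothetical: build a query from one record. B selects the query type.
# Table "mytable" and columns "id" and "c" are made up for illustration.
build_query() {
        A=$1 B=$2 C=$3
        case "$B" in
        insert) echo "INSERT INTO mytable (id, c) VALUES ('$A', '$C');" ;;
        update) echo "UPDATE mytable SET c = '$C' WHERE id = '$A';" ;;
        delete) echo "DELETE FROM mytable WHERE id = '$A';" ;;
        *)      echo "unknown action: $B" >&2; return 1 ;;
        esac
}

build_query 42 update 111
```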

rahulmittal87

Telling me whether your XML might be messy and full of extra newlines, which should be tossed before your script sees the data, reveals nothing about your customer credit card list or whatever it is. You could at least have answered that.

I don't need the actual data. I do need to know what you want to do with it. You want to run shell commands on "something" -- well, what shell commands would you be running, based on your mockup data? Assume each tag is a single column; you can do the splitting yourself.

Is there any safe separator I can use, anything that's not found in A through M? Does it ever contain quotes or tabs?
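One way to answer the separator question without exposing the data is to test the file for a few candidate characters. This is only a sketch: the candidate list (tab, pipe, caret) and the function name safe_separators are arbitrary choices for illustration, and the sample file is invented.

```shell
# Print the names of candidate separators that never occur in the file.
safe_separators() {
        for NAME in tab pipe caret
        do
                case "$NAME" in
                tab)   CH=$(printf '\t') ;;
                pipe)  CH='|' ;;
                caret) CH='^' ;;
                esac
                # -F: fixed string, -q: quiet, -e: treat CH as the pattern
                grep -qF -e "$CH" "$1" || echo "$NAME"
        done
}

TMP=$(mktemp)
printf '<a>x|y</a>\n<b>p q</b>\n' > "$TMP"
safe_separators "$TMP"
rm -f "$TMP"
```

Any name it prints is a character that does not appear anywhere in the file and so could be used to separate fields.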

Last edited by Corona688; 10-23-2014 at 03:56 PM..

Corona688

The best I can do without more information:

Code:

$ cat allinput.awk

BEGIN {
        FS=">"; OFS="\t"
        RS="<";

        # INPUTA, as in tag "input" attribute "a".  They must be allcaps here.
        split("INPUTA INPUTB A B C D E F G H I J K L M", ORDER, " ");
}

# These should be special variables for match() but aren't.
function rbefore(STR)   { return(substr(STR, 1, RSTART-1)); }# before match
function rmid(STR)      { return(substr(STR, RSTART, 1)); }  # First char match
function rall(STR)      { return(substr(STR, RSTART, RLENGTH)); }# Entire match
function rafter(STR)    { return(substr(STR, RSTART+RLENGTH)); }# after match

function aquote(OUT, A, PFIX, TA) { # Turns Q SUBSEP R into A[PFIX":"Q]=R
        if(OUT)
        {
                if(PFIX) PFIX=PFIX":"
                split(OUT, TA, SUBSEP);
                A[toupper(PFIX) toupper(TA[1])]=TA[2];
        }

        return("");
}

# Intended to be less stupid about quoted text in XML/HTML.
# Splits a='b' c='d' e='f' into A[PFIX":"a]=b, A[PFIX":"c]=d, etc.
function qsplit(STR, A, PFIX, X, OUT, Q, RMID) { # Q, RMID are locals
        while(STR && match(STR, /([ \r\n\t]+)|[\x27\x22=]/))
        {
                OUT = OUT rbefore(STR);
                RMID=rmid(STR);

                if((RMID == "'") || (RMID == "\""))     # Quote characters
                {
                        if(!Q)          Q=RMID;         # Begin quote section
                        else if(Q == RMID)      Q="";   # End quote section
                        else                    OUT = OUT RMID; # Quoted quote
                } else if(RMID == "=") {
                        if(Q)   OUT=OUT RMID; else OUT=OUT SUBSEP;
                } else if((RMID=="\r")||(RMID=="\n")||(RMID=="\t")||(RMID==" ")) {
                        if(Q)   OUT = OUT rall(STR); # Literal quoted whitespace
                        else    OUT = aquote(OUT, A, PFIX); # Unquoted WS, next block
                }
                STR=rafter(STR); # Strip off the text we've processed already.
        }

        aquote(OUT STR, A, PFIX); # Process any text we haven't already.
}


{ SPEC=0 ; TAG="" }

NR==1 {
        if(ORS == RS) print;
        next } # The first "line" is blank when RS=<

/^[!?]/ { SPEC=1    }   # XML specification junk

# Handle open-tags
match($1, /^[^\/ \r\n\t>]+/) {
        TAG=substr(toupper($1), RSTART, RLENGTH);
        if((!SPEC) && !($1 ~ /\/$/))
        {
                TAGS=TAG "%" TAGS;
                DEP++;
                LTAGS=TAGS
        }

        for(X in ARGS) delete ARGS[X];

        qsplit(rafter($1), ARGS);
}

# Handle close-tags
(!SPEC) && /^[\/]/ {
        sub(/^\//, "", $1);
        LTAGS=TAGS

#        sub("^.*" toupper($1) "%", "", TAGS);
        sub("^" toupper($1) "%", "", TAGS);
        $1="/"$1
        DEP=split(TAGS, TA, "%")-1;
        if(DEP < 0) DEP=0;
}

### Example of how to use it ###
# TAG is the name of the last open-tag
# TAGS is an array of tag names like INNER%MIDDLE%OUTERMOST
# $2 is CDATA inside the current tag
# ARGS is an array of arguments for the current tag
#
# So, when processing <a> in  <html><a href="index.html">Yay!</a></html>
# it would have:
# TAG="A"
# ARGS["HREF"]="index.html"
# TAGS="A%HTML"
# $2="Yay!"

# Handle <input> tag
(TAGS ~ /^INPUT%/) {    for(X in ARGS)  DATA[TAG X]=ARGS[X]     }

# Parse <tags> inside <input> so DATA[TAGNAME]=CONTENTS
(TAGS ~ /(^|%)INPUT%/) && ($2 ~ /[^ \r\n\t]/) && !/^\// {
        # Clean up tag contents
        sub(/^[ \r\n]+/, "", $2);
        sub(/[ \r\n]+$/, "", $2);
        DATA[TAG]=$2
}

# Handle </input>, printing and clearing collected data
toupper($1) == "/INPUT" {
        PFIX=""
        for(M=1; M in ORDER; M++)
        {
                # Convert blank fields into single spaces, since the shell will see
                # two tabs in a row as one field, skipping the blank one.
                if(DATA[ORDER[M]]=="") DATA[ORDER[M]]=" "
                printf("%s%s", PFIX, DATA[ORDER[M]]);
                PFIX=OFS;
        }

        printf("\n");

        for(X in DATA) delete DATA[X];
}

$ awk -f allinput.awk allinput.xml

2389906 install                 111     222     333             444                     C,D,E,G C,D,E,G 555
4732435 delete                  999     792                     990     942    992              C,D,G,H,I       C,D,G,H,I       804

$ awk -f allinput.awk allinput.xml |
while IFS=$'\t' read -r INPUTA INPUTB A B C D E F G H I J K L M
do
        # Convert all single-space fields into completely blank fields
        for X in INPUTA INPUTB A B C D E F G H I J K L M
        do
                # Cheeky trick: reading from /dev/null sets the variable to ""
                [ "${!X}" = " " ] && read -r "$X"
        done < /dev/null
        echo "doing something with $INPUTA $INPUTB $L $M"
done

doing something with 2389906 install C,D,E,G 555
doing something with 4732435 delete C,D,G,H,I 804

$

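That's the best I can do without better information. It won't work if your data contains tabs anywhere. Tag/attribute names are hardcoded in the ORDER list in BEGIN, in the INPUT patterns of the awk script, and in the variable lists of the shell loop; change those to match your real data.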
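The blank-field workaround is needed because tab counts as IFS whitespace, so read treats a run of tabs as a single delimiter. A possible alternative, assuming the data never contains it, is a non-whitespace separator such as the ASCII unit separator (octal 037), which keeps empty fields. The helper name split3 and the sample strings below are made up for the demonstration.

```shell
# Split LINE ($2) on SEP ($1) into three fields; the brackets make empty
# fields visible.
split3() {
        printf '%s\n' "$2" | {
                IFS="$1" read -r F1 F2 F3
                echo "[$F1] [$F2] [$F3]"
        }
}

TAB=$(printf '\t')
US=$(printf '\037')     # ASCII unit separator

split3 "$TAB" "one${TAB}${TAB}three"    # prints: [one] [three] []
split3 "$US"  "one${US}${US}three"      # prints: [one] [] [three]
```

With that approach the awk side would set OFS="\037" and the shell loop would use the same character in IFS, so the single-space substitution would no longer be needed.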

Last edited by Corona688; 10-23-2014 at 04:20 PM..

Corona688