Split large xml into multiple files, with header and footer in each file (Post 303030849)


Posted by karthik on Monday 18th of February 2019 12:30:22 AM in Shell Programming and Scripting


Hi Don,

My apologies for confusing you again. The AWK commands work fine and split the file correctly, as expected.

I hope this does not confuse you further:

1) If my input file name is sampletest_111.xml, the AWK command produces file names like sampletest_111.xml.0001.
2) sampletest_111.xml.0001 is then renamed to Extrfile111.xml.
3) When there are multiple input files, AWK splits them and creates unique files, but the piece of code below does not rename the files in sequence; it just appends everything to one file.

Expected output: Extrfile111.xml, Extrfile1112.xml, etc., i.e. a unique name for each file.

Code:

for f in ../Inbound/sampletest_*
do
    TMP="${f/sampletest_/Extrfile}"
    mv "$f" "${TMP%.*}"
done
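
For what it is worth, `${TMP%.*}` removes the `.0001`, `.0002`, ... suffix, so every chunk of one input maps to the same target name and each `mv` overwrites the previous one. A minimal sketch of a rename that keeps the sequence number follows. The `Extrfile111_0001.xml` naming scheme and the helper names `unique_name` and `rename_splits` are assumptions for illustration, not something from the thread; adjust the `printf` format to whatever scheme is actually wanted.

```shell
#!/bin/sh
# Sketch only: unique_name/rename_splits and the naming scheme are assumptions.

# Map e.g. sampletest_111.xml.0001 to Extrfile111_0001.xml
unique_name() {
    base=${1##*/}               # strip any directory part
    seq=${base##*.}             # chunk number, e.g. 0001
    stem=${base%.*}             # sampletest_111.xml
    stem=${stem%.xml}           # sampletest_111
    stem=${stem#sampletest_}    # 111
    printf 'Extrfile%s_%s.xml\n' "$stem" "$seq"
}

# Rename every split chunk found in directory $1
rename_splits() {
    for f in "$1"/sampletest_*.xml.[0-9]*; do
        [ -e "$f" ] || continue          # glob matched nothing
        mv -- "$f" "$1/$(unique_name "$f")"
    done
}

# Usage: rename_splits ../Inbound
```

Because the chunk number is carried into the new name, two chunks of the same input can no longer collide.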

Complete script:

Code:

#!/bin/bash
# bash is required: arrays and ${var/pattern/replacement} are not POSIX sh

#pass all Input files to array
FileList=($(ls | grep "sampletest_[0-9]"))

#print every element, not just the first one
echo "${FileList[@]}"

#loop array for Input files

for x in "${FileList[@]}"
do
    #for each element in array

    #File Split Begin
    awk -f xml_tag_handler.awk -f File_split.awk OUT="$x" ROWS="500" "$x" "$x"
    mv "$x" ../
done

rm Response.xml Extr*.xml

for f in sampletest_*
do
    echo "$f"
    TMP="${f/sampletest_/Extrfile}"
    mv "$f" "${TMP%.*}"
done

# add all files to array
arr=($(ls | grep "Extrfile[0-9]*.xml"))


#loop array
for i in "${arr[@]}"
do
    #for each element in array
    echo "$i"

    sed -i '/<com1:URI>/c\<com1:URI>file:///tmp/karthik/'"$i"'</com1:URI>' soaprequest.xml

    #WebService Call Begin
    sleep 5
    curl --header "Content-Type: text/xml;charset=UTF-8" --data @soaprequest.xml {WSDLURL} --insecure >> Response.xml
    echo ":Webservice call Begin"
done

sed -i '/<com1:URI>/c\<com1:URI>file:///tmp/karthik/'"$i"'</com1:URI>' soaprequest.xml

echo ":Webservice call End"

NEW_VAR=$(awk -v sq="'" -F'<ns11:Job_Id>' '
		{	for(i = 2; i <= NF; i++) {
				sub(/<.*/, "", $i)
				printf("%s%s", cnt++ ? "," : sq, $i)
			}
		}
		END {	print sq
		}' Response.xml	
	)

printf 'NEW_VAR has been assigned the value: %s\n' "$NEW_VAR"

#End Web Service Call
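
The job-ID step at the end of the script can be tried on its own. The sketch below wraps the same idea in a function; the name `extract_job_ids` and the sample response used to exercise it are assumptions, and unlike the script above it prints nothing (rather than a lone quote) when no `<ns11:Job_Id>` element is found.

```shell
#!/bin/sh
# Sketch only: extract_job_ids is a hypothetical helper, not from the thread.

# Print all <ns11:Job_Id> values of file $1 as one quoted, comma-separated list
extract_job_ids() {
    awk -v sq="'" -F'<ns11:Job_Id>' '
        {
            for (i = 2; i <= NF; i++) {
                id = $i
                sub(/<.*/, "", id)              # drop the closing tag and what follows
                ids = ids (n++ ? "," : "") id   # comma before every ID but the first
            }
        }
        END { if (n) print sq ids sq }' "$1"
}

# Usage: NEW_VAR=$(extract_job_ids Response.xml)
```

For a response holding the IDs 101 and 102, this yields `'101,102'`.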

xml_tag_handler.awk:

Code:

###############################################################################
BEGIN {
        FS=">"; OFS=">";
        RS="<"; ORS="<"
}

# After match("qwertyuiop", /rty/)
#       rbefore("qwertyuiop") is "qwe",
#       rmid("qwertyuiop")    is "r"
#       rall("qwertyuiop")    is "rty"
#       rafter("qwertyuiop")  is "uiop"

# substr() positions start at 1; with a start of 0 some awk implementations
# (gawk, for one) return one character fewer than intended.
function rbefore(STR)     { return(substr(STR, 1, RSTART-1)); }# before match
function rmid(STR)      { return(substr(STR, RSTART, 1)); }  # First char match
function rall(STR)      { return(substr(STR, RSTART, RLENGTH)); }# Entire match
function rafter(STR)    { return(substr(STR, RSTART+RLENGTH)); }# after match

function aquote(OUT, A, PFIX, TA) { # Turns Q SUBSEP R into A[PFIX":"Q]=R
        if(OUT)
        {
                if(PFIX) PFIX=PFIX":"
                split(OUT, TA, SUBSEP);
                A[toupper(PFIX) toupper(TA[1])]=TA[2];
        }

        return("");
}

# Intended to be less stupid about quoted text in XML/HTML.
# Splits a='b' c='d' e='f' into A[PFIX":"a]=b, A[PFIX":"c]=d, etc.
function qsplit(STR, A, PFIX, X, OUT) {
        while(STR && match(STR, /([ \n\t]+)|[\x27\x22=]/))
        {
                OUT = OUT rbefore(STR);
                RMID=rmid(STR);

                if((RMID == "'") || (RMID == "\""))     # Quote characters
                {
                        if(!Q)          Q=RMID;         # Begin quote section
                        else if(Q == RMID)      Q="";   # End quote section
                        else                    OUT = OUT RMID; # Quoted quote
                } else if(RMID == "=") {
                        if(Q)   OUT=OUT RMID; else OUT=OUT SUBSEP;
                } else if((RMID=="\r")||(RMID=="\n")||(RMID=="\t")||(RMID==" ")) {
                        if(Q)   OUT = OUT rall(STR); # Literal quoted whitespace
                        else    OUT = aquote(OUT, A, PFIX); # Unquoted WS, next block
                }
                STR=rafter(STR); # Strip off the text we've processed already.
        }

        aquote(OUT STR, A, PFIX); # Process any text we haven't already.
}


{ SPEC=0 ; TAG="" }

NR==1 {
        if(ORS == RS) print;
        next } # The first "line" is blank when RS=<

/^[!?]/ { SPEC=1    }   # XML specification junk

# Handle open-tags
(!SPEC) && match($1, /^[^\/ \r\n\t>]+/) {
        CTAG=""
        TAG=substr(toupper($1), RSTART, RLENGTH);
        if((!SPEC) && !($1 ~ /\/$/))
        {
                TAGS=TAG "%" TAGS;
                DEP++;
                LTAGS=TAGS
        }

        for(X in ARGS) delete ARGS[X];

        qsplit(rafter($1), ARGS, "", "", "");
}

# Handle close-tags
(!SPEC) && /^[\/]/ {
        sub(/^\//, "", $1);
        LTAGS=TAGS
        CTAG=toupper($1)
        TAG=""
#        sub("^.*" toupper($1) "%", "", TAGS);
        sub("^" toupper($1) "%", "", TAGS);
        $1="/"$1
        DEP=split(TAGS, TA, "%")-1;
        # Update TAG with tag on top of stack, if any
#       if(DEP < 0) {   DEP=0;  TAG=""  }
#       else { TAG=TA[DEP]; }
}
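
Two ideas carry this handler: reading the XML with `RS="<"` and `FS=">"`, so that every record is one tag and `$1`/`$2` are the tag and the text after it, and cutting a string around a `match()` using `RSTART` and `RLENGTH`. The sketch below shows both in isolation. The function names `list_tags` and `match_parts` are made up for the demonstration and are not part of xml_tag_handler.awk.

```shell
#!/bin/sh
# Sketch only: list_tags and match_parts are illustrative names.

# Print "tag|text" for every tag on stdin by splitting records on "<"
list_tags() {
    awk 'BEGIN { RS = "<"; FS = ">" }
         NR > 1 { print $1 "|" $2 }   # record 1 is the empty text before the first "<"
    '
}

# Print the text before the match, its first character, the whole match,
# and the text after it, separated by "|"
match_parts() {
    awk -v s="$1" -v re="$2" 'BEGIN {
        if (match(s, re))
            printf "%s|%s|%s|%s\n",
                   substr(s, 1, RSTART - 1),
                   substr(s, RSTART, 1),
                   substr(s, RSTART, RLENGTH),
                   substr(s, RSTART + RLENGTH)
    }'
}

printf '<a><b>text</b></a>' | list_tags
match_parts "qwertyuiop" "rty"
```

The first command prints the lines `a|`, `b|text`, `/b|` and `/a|`; the second prints `qwe|r|rty|uiop`, matching the values listed in the comment block of the handler.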

File_split.awk:

Code:

BEGIN {
        ORS=""
        #OUT="x."
        ROWS=5
        ROWTAG="^RECIPIENT[0-9]*$"
        HDRTAG="^DOCUMENTSET$"
        FTRTAG="^DOCUMENTSET$"
}

# First pass, remember headers and footers
NR==FNR {
        if(!HDREND)
        {
                HDR=HDR RS $1 OFS $2
                if(TAG ~ HDRTAG) HDREND=FNR
                next
        }

        if(FTRSTART || (CTAG ~ FTRTAG))
        {
                FTR=FTR RS $1 OFS $2
                if(CTAG ~ FTRTAG) FTRSTART=FNR
        }

        next
}

# Skip header and footer
(FNR <= HDREND) || (FNR >= FTRSTART) { next }

# Close output file once enough DOCUMENT records
((XNR%(ROWS+1)) == 0) {
#       printf("FNR==%d XNR==%d FILE=%s\n", FNR, XNR, FILE)>"/dev/stderr"
        if(!length(OUT)) FBASE=FILENAME "."
        else             FBASE=OUT "."

        if(FILE) {
                print FTR > FILE
                close(FILE);
        }

        FILE=sprintf("%s%04d", FBASE,++FILENUM);
        print HDR > FILE
        XNR++
}

{       print RS $0 > FILE      }

CTAG ~ ROWTAG { XNR++ }

END {   if(FILE) print FTR > FILE       }
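
The reason the input file is passed twice (`$x $x`) is the `NR==FNR` test: it is true only while awk reads the first copy, so that pass can collect the header and footer before the second pass writes the chunks. A stripped-down, line-based sketch of the same two-pass idea follows. It assumes the first line is the header and the last line is the footer, which is a simplification of the tag-based logic above, and the function name `split_with_header_footer` is hypothetical.

```shell
#!/bin/sh
# Sketch only: simplified, line-based analogue of File_split.awk.
# Assumes line 1 is the header and the last line is the footer.

# split_with_header_footer FILE ROWS -> writes FILE.0001, FILE.0002, ...
split_with_header_footer() {
    input=$1
    rows=$2
    awk -v rows="$rows" '
        NR == FNR {                        # first pass: remember header and footer
            if (FNR == 1) hdr = $0
            ftr = $0                       # ends up holding the last line
            total = FNR
            next
        }
        FNR == 1 || FNR == total { next }  # second pass: skip header and footer
        (n++ % rows) == 0 {                # start a new chunk every "rows" records
            if (file) { print ftr > file; close(file) }
            file = sprintf("%s.%04d", FILENAME, ++filenum)
            print hdr > file
        }
        { print > file }
        END { if (file) print ftr > file } # close the last chunk with the footer
    ' "$input" "$input"
}

# Usage: split_with_header_footer data.txt 500
```

Each output file then starts with the header line and ends with the footer line, with at most ROWS data lines in between.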

Post #8 in the same thread has the sample XML structure for your reference.

Last edited by karthik; 02-18-2019 at 01:40 AM..


