Split large XML into multiple files with header and footer in each file


# 8  
Thanks, Rudic, for the input. After your suggestion, Corona updated the code and it worked. I need a small change to it, as my footer is different; I will update this post with the XML input and the output as it should look.

--- Post updated at 11:08 PM ---

Hi Corona,

Thank you so much, it worked with your updated code; I am able to split the large file into multiple chunks. I need a small change in the output, as my footer is different now. Kindly assist with the below.

The first 2 lines are considered the header:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<DocumentSet>

The last line (at EOF) is the footer:
---Footer
Code:
 </DocumentSet>

Input :

Header

Code:
<?xml version="1.0" encoding="UTF-8"?>
<DocumentSet>

---Body 
    <Recipient>
        <Context>
            <TESTER>08</TESTER>
            <name>TEST</name>
            <Locale>en_AU</Locale>
            <Channel>kjsdhfuis</Channel>
            <UserId>8</UserId>
            <HLX>000000</HLX>
            <Key1>TEST1</Key1>
            <Key2>TEST2</Key2>
            <Key3>TEST3</Key3>
            <KeyID>hotdirectorytest</KeyID>
            <dummy2222>TEST7</dummy2222>
            <EffectiveFrom>20170612000000</EffectiveFrom>
            <Currency>AUD</Currency>
        </Context>
        <Document>
            <Form>
                <Name>TESTER2</Name>
                <Data>
                    <DocumentSetC>
                        <HeaderData>
                            <TESTER>08</TESTER>
                            <Channel>kjsdhfuis</Channel>
                            <UserId>X009189</UserId>
                            <HLX>000000</HLX>
                            <dummy>08VIC000000</dummy>
                            <Key1>TEST2</Key1>
                            <Key2>TEST3</Key2>
                            <Key3/>
                            <KeyID>TEST70</KeyID>
                            <dummy2222>Approval Letter</dummy2222>
                            <TEST7>APPA08120617206891</TEST7>
                            <EffectiveFrom>20170612000000</EffectiveFrom>
                            <HLX44>12345</HLX44>
                            <SystemDate>20170612</SystemDate>
                        </HeaderData>
                        <FormData>
                            <Name>TESTER2</Name>
                            <Context>
                                <UniqueDocID>1240525</UniqueDocID>
                                <dummy11112233>LEN_APP_0010_OUT</dummy11112233>
                                <TEST2ApprovedAmount>8989</TEST2ApprovedAmount>
                            </Context>
                            <ReceivingParty>
                                <Applicant>
                                    <TEST45456>sfdsfnsdfnff  </TEST45456>
                                </Applicant>
                                <IndividualDemographics>
                                
                                </IndividualDemographics>
                                <DeliveryChannel>POST</DeliveryChannel>
                                <NoOfCopies>1</NoOfCopies>
                            </ReceivingParty>
                            <Application>
                                <ProductGroups>
                            <TEST454567>sfdsfnsdfnff  </TEST454567>

                                </ProductGroups>
                            </Application>
                        </FormData>
                    </DocumentSetC>
                </Data>
            </Form>
            <TYP1>5</TYP1>
        </Document>
    </Recipient>
       <Recipient2> ---</Recipient2>
           ---------------
           -------------- 
           -----------------
            -----------------
          <Recipient18000> ---</Recipient18000>
    
---Footer
 </DocumentSet>



Output:
Below is the output I am expecting. This is an example of one file; every output file should have the same header and footer.
File1:
<?xml version="1.0" encoding="UTF-8"?>
<DocumentSet>
<Recipient1>  </Recipient1>
<Recipient2>  </Recipient2>
<Recipient3>  </Recipient3>
-------------------
-------------------
-------------------
<Recipient100>  </Recipient100>
</DocumentSet>


Last edited by karthik; 12-17-2018 at 08:47 PM..
# 9  
That is not a small change. I will have to completely rewrite it.

Do you truly want all the data stripped out of your recipient tags? Really? Show representative output.

Last edited by Corona688; 12-18-2018 at 11:50 AM..
# 10  
xmlsplit2.awk
Code:
BEGIN {
        ORS=""
        OUT="x."
        ROWS=5
        ROWTAG="^RECIPIENT[0-9]*$"
        HDRTAG="^DOCUMENTSET$"
        FTRTAG="^DOCUMENTSET$"
}

# First pass, remember headers and footers
NR==FNR {
        if(!HDREND)
        {
                HDR=HDR RS $1 OFS $2
                if(TAG ~ HDRTAG) HDREND=FNR
                next
        }

        if(FTRSTART || (CTAG ~ FTRTAG))
        {
                FTR=FTR RS $1 OFS $2
                if(CTAG ~ FTRTAG) FTRSTART=FNR
        }

        next
}

# Skip header and footer
(FNR <= HDREND) || (FNR >= FTRSTART) { next }

# Close the current output file and start the next one once enough RECIPIENT records have been written
((XNR%(ROWS+1)) == 0) {
#       printf("FNR==%d XNR==%d FILE=%s\n", FNR, XNR, FILE)>"/dev/stderr"
        if(FILE) {
                print FTR > FILE
                close(FILE);
        }

        FILE=sprintf("%s%04d", OUT,++FILENUM);
        print HDR > FILE
        XNR++
}

{       print RS $0 > FILE      }

CTAG ~ ROWTAG { XNR++ }

END {   if(FILE) print FTR > FILE       }
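
A note on the rollover test in the script above: XNR is bumped once when a chunk is opened and once for every closed record, so it comes back to a multiple of ROWS+1 exactly when ROWS records have been written. The standalone sketch below only replays that arithmetic and reads no XML. ROWS=3 and the 7 records are made-up values, and the rollover_demo function name is hypothetical.

```shell
#!/bin/sh
# Replays the XNR % (ROWS+1) rollover from the script above.
# ROWS=3 and the 7 records are illustrative assumptions.
rollover_demo() {
    awk 'BEGIN {
        ROWS = 3; XNR = 0; FILENUM = 0
        for (rec = 1; rec <= 7; rec++) {
            if (XNR % (ROWS + 1) == 0) {   # same test as the rollover rule
                FILENUM++                  # a new chunk would be opened here
                XNR++
            }
            printf "record %d -> x.%04d\n", rec, FILENUM
            XNR++                          # closing record tag seen
        }
    }'
}

rollover_demo
```

With these values the records land in chunks of 3, 3 and 1.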

input3
Code:
<?xml version="1.0" encoding="UTF-8"?>
<DocumentSet>
    <Recipient><Context></Context><Document></Document></Recipient>
    <Recipient2><Context></Context><Document></Document></Recipient2>
    <Recipient3><Context></Context><Document></Document></Recipient3>
    <Recipient4><Context></Context><Document></Document></Recipient4>
    <Recipient5><Context></Context><Document></Document></Recipient5>
    <Recipient6><Context></Context><Document></Document></Recipient6>
    <Recipient7><Context></Context><Document></Document></Recipient7>
    <Recipient8><Context></Context><Document></Document></Recipient8>
    <Recipient9><Context></Context><Document></Document></Recipient9>
    <Recipient10><Context></Context><Document></Document></Recipient10>
    <Recipient11><Context></Context><Document></Document></Recipient11>
    <Recipient12><Context></Context><Document></Document></Recipient12>
    <Recipient13><Context></Context><Document></Document></Recipient13>
    <Recipient14><Context></Context><Document></Document></Recipient14>
    <Recipient15><Context></Context><Document></Document></Recipient15>
    <Recipient16><Context></Context><Document></Document></Recipient16>
    <Recipient17><Context></Context><Document></Document></Recipient17>
    <Recipient18><Context></Context><Document></Document></Recipient18>
    <Recipient19><Context></Context><Document></Document></Recipient19>
    <Recipient20><Context></Context><Document></Document></Recipient20>
    <Recipient21><Context></Context><Document></Document></Recipient21>
    <Recipient22><Context></Context><Document></Document></Recipient22>
    <Recipient23><Context></Context><Document></Document></Recipient23>
    <Recipient24><Context></Context><Document></Document></Recipient24>
    <Recipient25><Context></Context><Document></Document></Recipient25>
    <Recipient26><Context></Context><Document></Document></Recipient26>
    <Recipient27><Context></Context><Document></Document></Recipient27>
    <Recipient28><Context></Context><Document></Document></Recipient28>
    <Recipient29><Context></Context><Document></Document></Recipient29>
    <Recipient30><Context></Context><Document></Document></Recipient30>
    <Recipient31><Context></Context><Document></Document></Recipient31>
    <Recipient32><Context></Context><Document></Document></Recipient32>
    <Recipient33><Context></Context><Document></Document></Recipient33>
    <Recipient34><Context></Context><Document></Document></Recipient34>
    <Recipient35><Context></Context><Document></Document></Recipient35>
    <Recipient36><Context></Context><Document></Document></Recipient36>
    <Recipient37><Context></Context><Document></Document></Recipient37>
    <Recipient38><Context></Context><Document></Document></Recipient38>
    <Recipient39><Context></Context><Document></Document></Recipient39>
    <Recipient40><Context></Context><Document></Document></Recipient40>
    <Recipient41><Context></Context><Document></Document></Recipient41>
    <Recipient42><Context></Context><Document></Document></Recipient42>
    <Recipient43><Context></Context><Document></Document></Recipient43>
    <Recipient44><Context></Context><Document></Document></Recipient44>
    <Recipient45><Context></Context><Document></Document></Recipient45>
    <Recipient46><Context></Context><Document></Document></Recipient46>
    <Recipient47><Context></Context><Document></Document></Recipient47>
    <Recipient48><Context></Context><Document></Document></Recipient48>
    <Recipient49><Context></Context><Document></Document></Recipient49>
    <Recipient50><Context></Context><Document></Document></Recipient50>
    <Recipient51><Context></Context><Document></Document></Recipient51>
    <Recipient52><Context></Context><Document></Document></Recipient52>
    <Recipient53><Context></Context><Document></Document></Recipient53>
    <Recipient54><Context></Context><Document></Document></Recipient54>
    <Recipient55><Context></Context><Document></Document></Recipient55>
    <Recipient56><Context></Context><Document></Document></Recipient56>
    <Recipient57><Context></Context><Document></Document></Recipient57>
    <Recipient58><Context></Context><Document></Document></Recipient58>
    <Recipient59><Context></Context><Document></Document></Recipient59>
    <Recipient60><Context></Context><Document></Document></Recipient60>
    <Recipient61><Context></Context><Document></Document></Recipient61>
    <Recipient62><Context></Context><Document></Document></Recipient62>
    <Recipient63><Context></Context><Document></Document></Recipient63>
    <Recipient64><Context></Context><Document></Document></Recipient64>
    <Recipient65><Context></Context><Document></Document></Recipient65>
    <Recipient66><Context></Context><Document></Document></Recipient66>
    <Recipient67><Context></Context><Document></Document></Recipient67>
    <Recipient68><Context></Context><Document></Document></Recipient68>
    <Recipient69><Context></Context><Document></Document></Recipient69>
    <Recipient70><Context></Context><Document></Document></Recipient70>
    <Recipient71><Context></Context><Document></Document></Recipient71>
    <Recipient72><Context></Context><Document></Document></Recipient72>
    <Recipient73><Context></Context><Document></Document></Recipient73>
    <Recipient74><Context></Context><Document></Document></Recipient74>
    <Recipient75><Context></Context><Document></Document></Recipient75>
    <Recipient76><Context></Context><Document></Document></Recipient76>
    <Recipient77><Context></Context><Document></Document></Recipient77>
    <Recipient78><Context></Context><Document></Document></Recipient78>
    <Recipient79><Context></Context><Document></Document></Recipient79>
    <Recipient80><Context></Context><Document></Document></Recipient80>
    <Recipient81><Context></Context><Document></Document></Recipient81>
    <Recipient82><Context></Context><Document></Document></Recipient82>
    <Recipient83><Context></Context><Document></Document></Recipient83>
    <Recipient84><Context></Context><Document></Document></Recipient84>
    <Recipient85><Context></Context><Document></Document></Recipient85>
    <Recipient86><Context></Context><Document></Document></Recipient86>
    <Recipient87><Context></Context><Document></Document></Recipient87>
    <Recipient88><Context></Context><Document></Document></Recipient88>
    <Recipient89><Context></Context><Document></Document></Recipient89>
    <Recipient90><Context></Context><Document></Document></Recipient90>
    <Recipient91><Context></Context><Document></Document></Recipient91>
    <Recipient92><Context></Context><Document></Document></Recipient92>
    <Recipient93><Context></Context><Document></Document></Recipient93>
    <Recipient94><Context></Context><Document></Document></Recipient94>
    <Recipient95><Context></Context><Document></Document></Recipient95>
    <Recipient96><Context></Context><Document></Document></Recipient96>
    <Recipient97><Context></Context><Document></Document></Recipient97>
    <Recipient98><Context></Context><Document></Document></Recipient98>
    <Recipient99><Context></Context><Document></Document></Recipient99>
    <Recipient100><Context></Context><Document></Document></Recipient100>
</DocumentSet>

Code:
awk -f yanx.awk -f xmlsplit2.awk ROWS=10 input3 input3
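
Two things about that command line are easy to miss. input3 is listed twice because the script makes two passes (NR==FNR is only true while the first copy is being read), and ROWS=10 overrides the ROWS=5 default because awk applies operand assignments after BEGIN, just before it opens the next file. A small standalone demonstration, with hypothetical function names and no dependency on yanx.awk:

```shell
#!/bin/sh
# show_rows: a BEGIN default is overridden by a VAR=value operand.
show_rows() {
    awk 'BEGIN { ROWS = 5 } END { print ROWS }' "$@"
}

# two_pass: pass 1 counts the lines, pass 2 prints each line with the total.
two_pass() {
    awk 'NR == FNR { total = FNR; next }
         { printf "%d/%d %s\n", FNR, total, $0 }' "$1" "$1"
}

show_rows /dev/null            # prints 5
show_rows ROWS=10 /dev/null    # prints 10
```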

x.0001, etc
Code:
<?xml version="1.0" encoding="UTF-8"?>
<DocumentSet>
    <Recipient><Context></Context><Document></Document></Recipient>
    <Recipient2><Context></Context><Document></Document></Recipient2>
    <Recipient3><Context></Context><Document></Document></Recipient3>
    <Recipient4><Context></Context><Document></Document></Recipient4>
    <Recipient5><Context></Context><Document></Document></Recipient5>
    <Recipient6><Context></Context><Document></Document></Recipient6>
    <Recipient7><Context></Context><Document></Document></Recipient7>
    <Recipient8><Context></Context><Document></Document></Recipient8>
    <Recipient9><Context></Context><Document></Document></Recipient9>
    <Recipient10><Context></Context><Document></Document></Recipient10>
    </DocumentSet>
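
If yanx.awk is not at hand, the same header/body/footer split can be approximated in plain awk. This is only a sketch under stated assumptions, not a replacement for the script above: the header is exactly the first two lines, the footer is the last line, every record ends on a line containing </Recipient> or </RecipientN>, and no line holds parts of two records. The split_xml function name and its three arguments are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: split_xml INPUT ROWS PREFIX
# Assumes: header = first 2 lines, footer = last line, and every record
# ends on a line containing </Recipient> or </RecipientN>.
split_xml() {
    awk -v rows="$2" -v out="$3" '
        # Pass 1: remember the header (first two lines) and the footer (last line).
        NR == FNR {
            if (FNR <= 2) hdr = hdr $0 "\n"
            ftr = $0
            last = FNR
            next
        }
        # Pass 2: skip the header and footer lines.
        FNR <= 2 || FNR == last { next }
        # Ignore blank lines between records so no empty chunk is started.
        file == "" && /^[ \t]*$/ { next }
        # Start a new chunk when none is open.
        file == "" {
            file = sprintf("%s%04d", out, ++n)
            printf "%s", hdr > file
        }
        { print > file }
        # Count a record each time its closing tag is seen.
        /<\/Recipient[0-9]*>/ {
            if (++count == rows) {
                print ftr > file
                close(file)
                file = ""
                count = 0
            }
        }
        END { if (file != "") { print ftr > file; close(file) } }
    ' "$1" "$1"
}
```

For example, split_xml input3 10 x. should write x.0001 through x.0010 for the 100-record input3 above, provided the file has no trailing blank line after the footer.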


Last edited by Corona688; 12-18-2018 at 12:22 PM..
# 11  
Thanks a lot for your help. It worked with the latest code; that was my expected output.
# 12  
Hello Corona,

Happy New Year !!

I need one small input on the same requirement. In the script below, I am looping through the input files and passing each one to the split command.

The issue is that every iteration starts its output file names again at x.0001, so the existing x.0001 file gets replaced. Is there a way I can pass a variable for the output file name (X="x."), or can I move the files before the second iteration? Kindly assist.

Code:
# Add all input files to an array
FileList=($(ls | grep "sampletest\\.[0-9]"))

# Loop over the input files
for x in "${FileList[@]}"
do
    echo "$x"

    # Split the file
    awk -f xml_String_split.awk -f xml_split.awk X="x." ROWS="400" "$x" "$x"
done

# Rename the chunks
for f in x.*; do
    mv "$f" "${f/x/Extrfile}.xml"
done

# Add all output files to an array
arr=($(ls | grep "Extrfile\\.[0-9]"))

Thanks.
Moderator's Comments:
Again, please wrap your code in [CODE]your samples...[/CODE] tags as per the forum rules; otherwise you may receive an infraction for repeatedly not following them.

Last edited by RavinderSingh13; 02-06-2019 at 01:39 AM..
# 13  
Firstly, I'm assuming you are using Corona688's code from post #10.

You don't need to specify X on the command line for this version (OUT= was set in the BEGIN block instead).

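If you change the code as follows (the changes are the commented-out OUT= line in the BEGIN block, the new FBASE lines in the rollover rule, and FBASE replacing OUT in the sprintf call):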

Code:
BEGIN {
        ORS=""
        # OUT="x."
        ROWS=5
        ROWTAG="^RECIPIENT[0-9]*$"
        HDRTAG="^DOCUMENTSET$"
        FTRTAG="^DOCUMENTSET$"
}

# First pass, remember headers and footers
NR==FNR {
        if(!HDREND)
        {
                HDR=HDR RS $1 OFS $2
                if(TAG ~ HDRTAG) HDREND=FNR
                next
        }

        if(FTRSTART || (CTAG ~ FTRTAG))
        {
                FTR=FTR RS $1 OFS $2
                if(CTAG ~ FTRTAG) FTRSTART=FNR
        }

        next
}

# Skip header and footer
(FNR <= HDREND) || (FNR >= FTRSTART) { next }

# Close the current output file and start the next one once enough RECIPIENT records have been written
((XNR%(ROWS+1)) == 0) {
#       printf("FNR==%d XNR==%d FILE=%s\n", FNR, XNR, FILE)>"/dev/stderr"
        if(!length(OUT)) FBASE=FILENAME "."
                else FBASE = OUT "."
        if(FILE) {
                print FTR > FILE
                close(FILE);
        }

        FILE=sprintf("%s%04d", FBASE,++FILENUM);
        print HDR > FILE
        XNR++
}

{       print RS $0 > FILE      }

CTAG ~ ROWTAG { XNR++ }

END {   if(FILE) print FTR > FILE       }

This will create files named after your XML file followed by a four-digit file number (.nnnn), or you can specify a name on the command line, e.g.:

Code:
awk -f xml_String_split.awk -f xml_split.awk OUT="${x}_split" ROWS="400" "$x" "$x"
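
Putting this together with the loop from post #12, a sketch of the whole job might look like the following. It assumes xml_String_split.awk and the modified xml_split.awk are in the current directory, uses a shell glob instead of ls | grep, and picks an Extrfile_ prefix (an arbitrary choice, not from the thread) so that the chunks are not matched by the sampletest.[0-9]* pattern on a later run. The split_all function name is hypothetical.

```shell
#!/bin/sh
# Split every sampletest.N file, giving each input its own output prefix.
# Assumes xml_String_split.awk and the modified xml_split.awk are in the
# current directory.
split_all() {
    for x in sampletest.[0-9]*; do
        [ -f "$x" ] || continue     # pattern matched nothing
        echo "$x"
        # OUT differs per input, so chunks from different inputs never collide.
        awk -f xml_String_split.awk -f xml_split.awk \
            OUT="Extrfile_${x}" ROWS=400 "$x" "$x"
    done
}
```

Each input should then produce Extrfile_sampletest.N.0001, Extrfile_sampletest.N.0002, and so on, with no renaming step between iterations.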
