Split large XML into multiple files, with header and footer in each file


# 8  
Old 12-17-2018
Thanks, Rudic, for the input. After your suggestion Corona updated the code and it worked. I need one small change to it, as my footer is different. I will update the post with the XML input and the output as it should look.

--- Post updated at 11:08 PM ---

Hi Corona,

Thank you so much, it worked with your updated code; I am able to split the large file into multiple chunks. I need a small change in the output, as my footer is different now. Kindly assist with the below.

The first 2 lines are the header:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<DocumentSet>

The last line of the file is the footer:
---Footer
Code:
 </DocumentSet>

Input :

Header

Code:
<?xml version="1.0" encoding="UTF-8"?>
<DocumentSet>

---Body 
    <Recipient>
        <Context>
            <TESTER>08</TESTER>
            <name>TEST</name>
            <Locale>en_AU</Locale>
            <Channel>kjsdhfuis</Channel>
            <UserId>8</UserId>
            <HLX>000000</HLX>
            <Key1>TEST1</Key1>
            <Key2>TEST2</Key2>
            <Key3>TEST3</Key3>
            <KeyID>hotdirectorytest</KeyID>
            <dummy2222>TEST7</dummy2222>
            <EffectiveFrom>20170612000000</EffectiveFrom>
            <Currency>AUD</Currency>
        </Context>
        <Document>
            <Form>
                <Name>TESTER2</Name>
                <Data>
                    <DocumentSetC>
                        <HeaderData>
                            <TESTER>08</TESTER>
                            <Channel>kjsdhfuis</Channel>
                            <UserId>X009189</UserId>
                            <HLX>000000</HLX>
                            <dummy>08VIC000000</dummy>
                            <Key1>TEST2</Key1>
                            <Key2>TEST3</Key2>
                            <Key3/>
                            <KeyID>TEST70</KeyID>
                            <dummy2222>Approval Letter</dummy2222>
                            <TEST7>APPA08120617206891</TEST7>
                            <EffectiveFrom>20170612000000</EffectiveFrom>
                            <HLX44>12345</HLX44>
                            <SystemDate>20170612</SystemDate>
                        </HeaderData>
                        <FormData>
                            <Name>TESTER2</Name>
                            <Context>
                                <UniqueDocID>1240525</UniqueDocID>
                                <dummy11112233>LEN_APP_0010_OUT</dummy11112233>
                                <TEST2ApprovedAmount>8989</TEST2ApprovedAmount>
                            </Context>
                            <ReceivingParty>
                                <Applicant>
                                    <TEST45456>sfdsfnsdfnff  </TEST45456>
                                </Applicant>
                                <IndividualDemographics>
                                
                                </IndividualDemographics>
                                <DeliveryChannel>POST</DeliveryChannel>
                                <NoOfCopies>1</NoOfCopies>
                            </ReceivingParty>
                            <Application>
                                <ProductGroups>
                            <TEST454567>sfdsfnsdfnff  </TEST454567>

                                </ProductGroups>
                            </Application>
                        </FormData>
                    </DocumentSetC>
                </Data>
            </Form>
            <TYP1>5</TYP1>
        </Document>
    </Recipient>
       <Recipient2> ---</Recipient2>
           ---------------
           -------------- 
           -----------------
            -----------------
          <Recipient18000> ---</Recipient18000>
    
---Footer
 </DocumentSet>



Output:
Below is the output I am expecting. This is an example of one file; every file should have the same header and footer.
File1:
<?xml version="1.0" encoding="UTF-8"?>
<DocumentSet>
<Recipient1>  </Recipient1>
<Recipient2>  </Recipient2>
<Recipient3>  </Recipient3>
-------------------
-------------------
-------------------
<Recipient100>  </Recipient100>
</DocumentSet>


Last edited by karthik; 12-17-2018 at 08:47 PM..
# 9  
Old 12-18-2018
That is not a small change. I will have to completely rewrite it.

Do you truly want all the data stripped out of your recipient tags? Really? Show representative output.

Last edited by Corona688; 12-18-2018 at 11:50 AM..
# 10  
Old 12-18-2018
xmlsplit2.awk
Code:
BEGIN {
        ORS=""
        OUT="x."
        ROWS=5
        ROWTAG="^RECIPIENT[0-9]*$"
        HDRTAG="^DOCUMENTSET$"
        FTRTAG="^DOCUMENTSET$"
}

# First pass, remember headers and footers
NR==FNR {
        if(!HDREND)
        {
                HDR=HDR RS $1 OFS $2
                if(TAG ~ HDRTAG) HDREND=FNR
                next
        }

        if(FTRSTART || (CTAG ~ FTRTAG))
        {
                FTR=FTR RS $1 OFS $2
                if(CTAG ~ FTRTAG) FTRSTART=FNR
        }

        next
}

# Skip header and footer
(FNR <= HDREND) || (FNR >= FTRSTART) { next }

# Close output file once enough RECIPIENT records
((XNR%(ROWS+1)) == 0) {
#       printf("FNR==%d XNR==%d FILE=%s\n", FNR, XNR, FILE)>"/dev/stderr"
        if(FILE) {
                print FTR > FILE
                close(FILE);
        }

        FILE=sprintf("%s%04d", OUT,++FILENUM);
        print HDR > FILE
        XNR++
}

{       print RS $0 > FILE      }

CTAG ~ ROWTAG { XNR++ }

END {   if(FILE) print FTR > FILE       }

input3
Code:
<?xml version="1.0" encoding="UTF-8"?>
<DocumentSet>
    <Recipient><Context></Context><Document></Document></Recipient>
    <Recipient2><Context></Context><Document></Document></Recipient2>
    <Recipient3><Context></Context><Document></Document></Recipient3>
    <Recipient4><Context></Context><Document></Document></Recipient4>
    <Recipient5><Context></Context><Document></Document></Recipient5>
    <Recipient6><Context></Context><Document></Document></Recipient6>
    <Recipient7><Context></Context><Document></Document></Recipient7>
    <Recipient8><Context></Context><Document></Document></Recipient8>
    <Recipient9><Context></Context><Document></Document></Recipient9>
    <Recipient10><Context></Context><Document></Document></Recipient10>
    <Recipient11><Context></Context><Document></Document></Recipient11>
    <Recipient12><Context></Context><Document></Document></Recipient12>
    <Recipient13><Context></Context><Document></Document></Recipient13>
    <Recipient14><Context></Context><Document></Document></Recipient14>
    <Recipient15><Context></Context><Document></Document></Recipient15>
    <Recipient16><Context></Context><Document></Document></Recipient16>
    <Recipient17><Context></Context><Document></Document></Recipient17>
    <Recipient18><Context></Context><Document></Document></Recipient18>
    <Recipient19><Context></Context><Document></Document></Recipient19>
    <Recipient20><Context></Context><Document></Document></Recipient20>
    <Recipient21><Context></Context><Document></Document></Recipient21>
    <Recipient22><Context></Context><Document></Document></Recipient22>
    <Recipient23><Context></Context><Document></Document></Recipient23>
    <Recipient24><Context></Context><Document></Document></Recipient24>
    <Recipient25><Context></Context><Document></Document></Recipient25>
    <Recipient26><Context></Context><Document></Document></Recipient26>
    <Recipient27><Context></Context><Document></Document></Recipient27>
    <Recipient28><Context></Context><Document></Document></Recipient28>
    <Recipient29><Context></Context><Document></Document></Recipient29>
    <Recipient30><Context></Context><Document></Document></Recipient30>
    <Recipient31><Context></Context><Document></Document></Recipient31>
    <Recipient32><Context></Context><Document></Document></Recipient32>
    <Recipient33><Context></Context><Document></Document></Recipient33>
    <Recipient34><Context></Context><Document></Document></Recipient34>
    <Recipient35><Context></Context><Document></Document></Recipient35>
    <Recipient36><Context></Context><Document></Document></Recipient36>
    <Recipient37><Context></Context><Document></Document></Recipient37>
    <Recipient38><Context></Context><Document></Document></Recipient38>
    <Recipient39><Context></Context><Document></Document></Recipient39>
    <Recipient40><Context></Context><Document></Document></Recipient40>
    <Recipient41><Context></Context><Document></Document></Recipient41>
    <Recipient42><Context></Context><Document></Document></Recipient42>
    <Recipient43><Context></Context><Document></Document></Recipient43>
    <Recipient44><Context></Context><Document></Document></Recipient44>
    <Recipient45><Context></Context><Document></Document></Recipient45>
    <Recipient46><Context></Context><Document></Document></Recipient46>
    <Recipient47><Context></Context><Document></Document></Recipient47>
    <Recipient48><Context></Context><Document></Document></Recipient48>
    <Recipient49><Context></Context><Document></Document></Recipient49>
    <Recipient50><Context></Context><Document></Document></Recipient50>
    <Recipient51><Context></Context><Document></Document></Recipient51>
    <Recipient52><Context></Context><Document></Document></Recipient52>
    <Recipient53><Context></Context><Document></Document></Recipient53>
    <Recipient54><Context></Context><Document></Document></Recipient54>
    <Recipient55><Context></Context><Document></Document></Recipient55>
    <Recipient56><Context></Context><Document></Document></Recipient56>
    <Recipient57><Context></Context><Document></Document></Recipient57>
    <Recipient58><Context></Context><Document></Document></Recipient58>
    <Recipient59><Context></Context><Document></Document></Recipient59>
    <Recipient60><Context></Context><Document></Document></Recipient60>
    <Recipient61><Context></Context><Document></Document></Recipient61>
    <Recipient62><Context></Context><Document></Document></Recipient62>
    <Recipient63><Context></Context><Document></Document></Recipient63>
    <Recipient64><Context></Context><Document></Document></Recipient64>
    <Recipient65><Context></Context><Document></Document></Recipient65>
    <Recipient66><Context></Context><Document></Document></Recipient66>
    <Recipient67><Context></Context><Document></Document></Recipient67>
    <Recipient68><Context></Context><Document></Document></Recipient68>
    <Recipient69><Context></Context><Document></Document></Recipient69>
    <Recipient70><Context></Context><Document></Document></Recipient70>
    <Recipient71><Context></Context><Document></Document></Recipient71>
    <Recipient72><Context></Context><Document></Document></Recipient72>
    <Recipient73><Context></Context><Document></Document></Recipient73>
    <Recipient74><Context></Context><Document></Document></Recipient74>
    <Recipient75><Context></Context><Document></Document></Recipient75>
    <Recipient76><Context></Context><Document></Document></Recipient76>
    <Recipient77><Context></Context><Document></Document></Recipient77>
    <Recipient78><Context></Context><Document></Document></Recipient78>
    <Recipient79><Context></Context><Document></Document></Recipient79>
    <Recipient80><Context></Context><Document></Document></Recipient80>
    <Recipient81><Context></Context><Document></Document></Recipient81>
    <Recipient82><Context></Context><Document></Document></Recipient82>
    <Recipient83><Context></Context><Document></Document></Recipient83>
    <Recipient84><Context></Context><Document></Document></Recipient84>
    <Recipient85><Context></Context><Document></Document></Recipient85>
    <Recipient86><Context></Context><Document></Document></Recipient86>
    <Recipient87><Context></Context><Document></Document></Recipient87>
    <Recipient88><Context></Context><Document></Document></Recipient88>
    <Recipient89><Context></Context><Document></Document></Recipient89>
    <Recipient90><Context></Context><Document></Document></Recipient90>
    <Recipient91><Context></Context><Document></Document></Recipient91>
    <Recipient92><Context></Context><Document></Document></Recipient92>
    <Recipient93><Context></Context><Document></Document></Recipient93>
    <Recipient94><Context></Context><Document></Document></Recipient94>
    <Recipient95><Context></Context><Document></Document></Recipient95>
    <Recipient96><Context></Context><Document></Document></Recipient96>
    <Recipient97><Context></Context><Document></Document></Recipient97>
    <Recipient98><Context></Context><Document></Document></Recipient98>
    <Recipient99><Context></Context><Document></Document></Recipient99>
    <Recipient100><Context></Context><Document></Document></Recipient100>
</DocumentSet>

Code:
awk -f yanx.awk -f xmlsplit2.awk ROWS=10 input3 input3

x.0001, etc
Code:
<?xml version="1.0" encoding="UTF-8"?>
<DocumentSet>
    <Recipient><Context></Context><Document></Document></Recipient>
    <Recipient2><Context></Context><Document></Document></Recipient2>
    <Recipient3><Context></Context><Document></Document></Recipient3>
    <Recipient4><Context></Context><Document></Document></Recipient4>
    <Recipient5><Context></Context><Document></Document></Recipient5>
    <Recipient6><Context></Context><Document></Document></Recipient6>
    <Recipient7><Context></Context><Document></Document></Recipient7>
    <Recipient8><Context></Context><Document></Document></Recipient8>
    <Recipient9><Context></Context><Document></Document></Recipient9>
    <Recipient10><Context></Context><Document></Document></Recipient10>
    </DocumentSet>
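
The script above depends on yanx.awk, which is not shown in this thread. For readers who only want to see the header / rows / footer rotation, here is a minimal self-contained sketch in plain awk. It is not Corona688's method: it assumes the header is exactly the first two lines, the footer is exactly the last line, and every recipient element sits on a single line (as in input3). The function name split_xml and the variable names are made up for illustration.

```shell
# split_xml FILE ROWS PREFIX
# Sketch only: assumes a 2-line header, a 1-line footer, and one
# complete recipient element per line in between.
split_xml() {
    awk -v rows="$2" -v out="$3" '
        # Pass 1: remember the header (first two lines) and the footer (last line)
        NR == FNR {
            if (FNR <= 2) hdr = hdr $0 "\n"
            ftr = $0
            total = FNR
            next
        }
        # Pass 2: skip the header and footer lines themselves
        FNR <= 2 || FNR == total { next }
        # Start a new chunk every "rows" body lines
        (n++ % rows) == 0 {
            if (file) { print ftr > file; close(file) }
            file = sprintf("%s%04d", out, ++filenum)
            printf "%s", hdr > file
        }
        # Copy the body line into the current chunk
        { print > file }
        # Finish the last chunk with the footer
        END { if (file) print ftr > file }
    ' "$1" "$1"
}
```

Called as `split_xml input3 10 x.`, this should write x.0001, x.0002, and so on, each with the two header lines, up to 10 recipient lines, and the footer. Multi-line recipients, as in the original input, still need a tag-aware parser such as the yanx.awk approach.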


Last edited by Corona688; 12-18-2018 at 12:22 PM..
# 11  
Old 12-18-2018
Thanks a lot for your help. It worked with the latest code and produced my expected output.
# 12  
Old 2 Weeks Ago
Hello Corona,

Happy New Year !!

I need one small input on the same requirement. In the script below I loop through the input files and pass each one to the split command.

The issue is that every iteration starts its output file names at x.001 again, so the already existing x.001 file gets replaced. Is there a way I can pass a variable for the output file prefix (X="x."), or can I move the files before the second iteration? Kindly assist.

Code:
# Add all Input files to array
FileList=($(ls | grep "sampletest\\.[0-9]"))

#loop array for Input files

for x in "${FileList[@]}"
do
 #for each element in array
 
   echo "$x"

#File Split Begin

awk -f xml_String_split.awk -f xml_split.awk X="x." ROWS="400" $x $x
done

for f in x.*; do mv "$f" "${f/x/Extrfile}.xml";
done
# add all files to array
arr=($(ls | grep "Extrfile\\.[0-9]"))

Thanks .
Moderator's Comments:
Again, please wrap your code in [CODE]your samples...[/CODE] tags as per forum rules, or you may get an infraction for continuously not following them.

Last edited by RavinderSingh13; 2 Weeks Ago at 01:39 AM..
# 13  
Old 2 Weeks Ago
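First, I'm assuming you are using Corona688's code from post #10.

You don't need to specify X on the command line for this version; the output prefix is the OUT variable, which is set in the BEGIN block.

If you change the code as follows (the OUT= assignment in BEGIN is commented out, and the two FBASE lines plus the FBASE argument to sprintf are new):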

Code:
BEGIN {
        ORS=""
        # OUT="x."
        ROWS=5
        ROWTAG="^RECIPIENT[0-9]*$"
        HDRTAG="^DOCUMENTSET$"
        FTRTAG="^DOCUMENTSET$"
}

# First pass, remember headers and footers
NR==FNR {
        if(!HDREND)
        {
                HDR=HDR RS $1 OFS $2
                if(TAG ~ HDRTAG) HDREND=FNR
                next
        }

        if(FTRSTART || (CTAG ~ FTRTAG))
        {
                FTR=FTR RS $1 OFS $2
                if(CTAG ~ FTRTAG) FTRSTART=FNR
        }

        next
}

# Skip header and footer
(FNR <= HDREND) || (FNR >= FTRSTART) { next }

# Close output file once enough RECIPIENT records
((XNR%(ROWS+1)) == 0) {
#       printf("FNR==%d XNR==%d FILE=%s\n", FNR, XNR, FILE)>"/dev/stderr"
        if(!length(OUT)) FBASE=FILENAME "."
                else FBASE = OUT "."
        if(FILE) {
                print FTR > FILE
                close(FILE);
        }

        FILE=sprintf("%s%04d", FBASE,++FILENUM);
        print HDR > FILE
        XNR++
}

{       print RS $0 > FILE      }

CTAG ~ ROWTAG { XNR++ }

END {   if(FILE) print FTR > FILE       }

This will create files named after your XML file, followed by a four-digit file number (.nnnn), or you can specify a base name on the command line, e.g.:

Code:
awk -f xml_String_split.awk -f xml_split.awk OUT="${x}_split" ROWS=400 "$x" "$x"
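
Applied to the loop from post #12, this might look like the sketch below. It assumes the modified script is saved as xml_split.awk next to xml_String_split.awk in the current directory, and that the inputs are named sampletest.N; the function name split_all and the Extrfile. prefix are only illustrative. A glob is used instead of parsing ls output, and the file names are quoted, so names containing spaces stay intact.

```shell
# Sketch: split every sampletest.N file, giving each input its own
# output prefix so chunks from different inputs cannot overwrite each other.
split_all() {
    for x in sampletest.[0-9]*; do
        [ -f "$x" ] || continue      # the glob matched nothing
        echo "$x"
        # OUT is picked up by the modified script as the output base name
        awk -f xml_String_split.awk -f xml_split.awk \
            OUT="Extrfile.${x}" ROWS=400 "$x" "$x"
    done
}
```

With OUT set per input, the chunks should come out as Extrfile.sampletest.1.0001, Extrfile.sampletest.1.0002, and so on, so nothing needs to be moved between iterations; only the .xml suffix would still have to be added afterwards.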
