Split large XML into multiple files, with header and footer in each file


 
# 8  
Old 12-17-2018
Thanks, Rudic, for the input. After your suggestion Corona updated the code and it worked. I need a small change to it, as my footer is different. I will update this post with the XML input and the output as it should look.

--- Post updated at 11:08 PM ---

Hi Corona,

Thank you so much, it worked with your updated code. I am able to split the large file into multiple chunks. I need a small change in the output, as my footer is different now. Kindly assist with the below.

The first 2 lines are the header:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<DocumentSet>

The last line, at the end of the file, is the footer:
---Footer
Code:
 </DocumentSet>

Input :

Header

Code:
<?xml version="1.0" encoding="UTF-8"?>
<DocumentSet>

---Body 
    <Recipient>
        <Context>
            <TESTER>08</TESTER>
            <name>TEST</name>
            <Locale>en_AU</Locale>
            <Channel>kjsdhfuis</Channel>
            <UserId>8</UserId>
            <HLX>000000</HLX>
            <Key1>TEST1</Key1>
            <Key2>TEST2</Key2>
            <Key3>TEST3</Key3>
            <KeyID>hotdirectorytest</KeyID>
            <dummy2222>TEST7</dummy2222>
            <EffectiveFrom>20170612000000</EffectiveFrom>
            <Currency>AUD</Currency>
        </Context>
        <Document>
            <Form>
                <Name>TESTER2</Name>
                <Data>
                    <DocumentSetC>
                        <HeaderData>
                            <TESTER>08</TESTER>
                            <Channel>kjsdhfuis</Channel>
                            <UserId>X009189</UserId>
                            <HLX>000000</HLX>
                            <dummy>08VIC000000</dummy>
                            <Key1>TEST2</Key1>
                            <Key2>TEST3</Key2>
                            <Key3/>
                            <KeyID>TEST70</KeyID>
                            <dummy2222>Approval Letter</dummy2222>
                            <TEST7>APPA08120617206891</TEST7>
                            <EffectiveFrom>20170612000000</EffectiveFrom>
                            <HLX44>12345</HLX44>
                            <SystemDate>20170612</SystemDate>
                        </HeaderData>
                        <FormData>
                            <Name>TESTER2</Name>
                            <Context>
                                <UniqueDocID>1240525</UniqueDocID>
                                <dummy11112233>LEN_APP_0010_OUT</dummy11112233>
                                <TEST2ApprovedAmount>8989</TEST2ApprovedAmount>
                            </Context>
                            <ReceivingParty>
                                <Applicant>
                                    <TEST45456>sfdsfnsdfnff  </TEST45456>
                                </Applicant>
                                <IndividualDemographics>
                                
                                </IndividualDemographics>
                                <DeliveryChannel>POST</DeliveryChannel>
                                <NoOfCopies>1</NoOfCopies>
                            </ReceivingParty>
                            <Application>
                                <ProductGroups>
                            <TEST454567>sfdsfnsdfnff  </TEST454567>

                                </ProductGroups>
                            </Application>
                        </FormData>
                    </DocumentSetC>
                </Data>
            </Form>
            <TYP1>5</TYP1>
        </Document>
    </Recipient>
       <Recipient2> ---</Recipient2>
           ---------------
           -------------- 
           -----------------
            -----------------
          <Recipient18000> ---</Recipient18000>
    
---Footer
 </DocumentSet>



Output:
Below is the output I am expecting. This is an example of one file; every output file should have the same header and footer.
File1:
<?xml version="1.0" encoding="UTF-8"?>
<DocumentSet>
<Recipient1>  </Recipient1>
<Recipient2>  </Recipient2>
<Recipient3>  </Recipient3>
-------------------
-------------------
-------------------
<Recipient100>  </Recipient100>
</DocumentSet>


Last edited by karthik; 12-17-2018 at 08:47 PM..
# 9  
Old 12-18-2018
That is not a small change. I will have to completely rewrite it.

Do you truly want all the data stripped out of your recipient tags? Really? Show representative output.

Last edited by Corona688; 12-18-2018 at 11:50 AM..
# 10  
Old 12-18-2018
xmlsplit2.awk
Code:
BEGIN {
        ORS=""
        OUT="x."
        ROWS=5
        ROWTAG="^RECIPIENT[0-9]*$"
        HDRTAG="^DOCUMENTSET$"
        FTRTAG="^DOCUMENTSET$"
}

# First pass, remember headers and footers
NR==FNR {
        if(!HDREND)
        {
                HDR=HDR RS $1 OFS $2
                if(TAG ~ HDRTAG) HDREND=FNR
                next
        }

        if(FTRSTART || (CTAG ~ FTRTAG))
        {
                FTR=FTR RS $1 OFS $2
                if(CTAG ~ FTRTAG) FTRSTART=FNR
        }

        next
}

# Skip header and footer
(FNR <= HDREND) || (FNR >= FTRSTART) { next }

# Start a new output file (closing the previous one) after every ROWS RECIPIENT records
((XNR%(ROWS+1)) == 0) {
#       printf("FNR==%d XNR==%d FILE=%s\n", FNR, XNR, FILE)>"/dev/stderr"
        if(FILE) {
                print FTR > FILE
                close(FILE);
        }

        FILE=sprintf("%s%04d", OUT,++FILENUM);
        print HDR > FILE
        XNR++
}

{       print RS $0 > FILE      }

CTAG ~ ROWTAG { XNR++ }

END {   if(FILE) print FTR > FILE       }

input3
Code:
<?xml version="1.0" encoding="UTF-8"?>
<DocumentSet>
    <Recipient><Context></Context><Document></Document></Recipient>
    <Recipient2><Context></Context><Document></Document></Recipient2>
    <Recipient3><Context></Context><Document></Document></Recipient3>
    <Recipient4><Context></Context><Document></Document></Recipient4>
    <Recipient5><Context></Context><Document></Document></Recipient5>
    <Recipient6><Context></Context><Document></Document></Recipient6>
    <Recipient7><Context></Context><Document></Document></Recipient7>
    <Recipient8><Context></Context><Document></Document></Recipient8>
    <Recipient9><Context></Context><Document></Document></Recipient9>
    <Recipient10><Context></Context><Document></Document></Recipient10>
    <Recipient11><Context></Context><Document></Document></Recipient11>
    <Recipient12><Context></Context><Document></Document></Recipient12>
    <Recipient13><Context></Context><Document></Document></Recipient13>
    <Recipient14><Context></Context><Document></Document></Recipient14>
    <Recipient15><Context></Context><Document></Document></Recipient15>
    <Recipient16><Context></Context><Document></Document></Recipient16>
    <Recipient17><Context></Context><Document></Document></Recipient17>
    <Recipient18><Context></Context><Document></Document></Recipient18>
    <Recipient19><Context></Context><Document></Document></Recipient19>
    <Recipient20><Context></Context><Document></Document></Recipient20>
    <Recipient21><Context></Context><Document></Document></Recipient21>
    <Recipient22><Context></Context><Document></Document></Recipient22>
    <Recipient23><Context></Context><Document></Document></Recipient23>
    <Recipient24><Context></Context><Document></Document></Recipient24>
    <Recipient25><Context></Context><Document></Document></Recipient25>
    <Recipient26><Context></Context><Document></Document></Recipient26>
    <Recipient27><Context></Context><Document></Document></Recipient27>
    <Recipient28><Context></Context><Document></Document></Recipient28>
    <Recipient29><Context></Context><Document></Document></Recipient29>
    <Recipient30><Context></Context><Document></Document></Recipient30>
    <Recipient31><Context></Context><Document></Document></Recipient31>
    <Recipient32><Context></Context><Document></Document></Recipient32>
    <Recipient33><Context></Context><Document></Document></Recipient33>
    <Recipient34><Context></Context><Document></Document></Recipient34>
    <Recipient35><Context></Context><Document></Document></Recipient35>
    <Recipient36><Context></Context><Document></Document></Recipient36>
    <Recipient37><Context></Context><Document></Document></Recipient37>
    <Recipient38><Context></Context><Document></Document></Recipient38>
    <Recipient39><Context></Context><Document></Document></Recipient39>
    <Recipient40><Context></Context><Document></Document></Recipient40>
    <Recipient41><Context></Context><Document></Document></Recipient41>
    <Recipient42><Context></Context><Document></Document></Recipient42>
    <Recipient43><Context></Context><Document></Document></Recipient43>
    <Recipient44><Context></Context><Document></Document></Recipient44>
    <Recipient45><Context></Context><Document></Document></Recipient45>
    <Recipient46><Context></Context><Document></Document></Recipient46>
    <Recipient47><Context></Context><Document></Document></Recipient47>
    <Recipient48><Context></Context><Document></Document></Recipient48>
    <Recipient49><Context></Context><Document></Document></Recipient49>
    <Recipient50><Context></Context><Document></Document></Recipient50>
    <Recipient51><Context></Context><Document></Document></Recipient51>
    <Recipient52><Context></Context><Document></Document></Recipient52>
    <Recipient53><Context></Context><Document></Document></Recipient53>
    <Recipient54><Context></Context><Document></Document></Recipient54>
    <Recipient55><Context></Context><Document></Document></Recipient55>
    <Recipient56><Context></Context><Document></Document></Recipient56>
    <Recipient57><Context></Context><Document></Document></Recipient57>
    <Recipient58><Context></Context><Document></Document></Recipient58>
    <Recipient59><Context></Context><Document></Document></Recipient59>
    <Recipient60><Context></Context><Document></Document></Recipient60>
    <Recipient61><Context></Context><Document></Document></Recipient61>
    <Recipient62><Context></Context><Document></Document></Recipient62>
    <Recipient63><Context></Context><Document></Document></Recipient63>
    <Recipient64><Context></Context><Document></Document></Recipient64>
    <Recipient65><Context></Context><Document></Document></Recipient65>
    <Recipient66><Context></Context><Document></Document></Recipient66>
    <Recipient67><Context></Context><Document></Document></Recipient67>
    <Recipient68><Context></Context><Document></Document></Recipient68>
    <Recipient69><Context></Context><Document></Document></Recipient69>
    <Recipient70><Context></Context><Document></Document></Recipient70>
    <Recipient71><Context></Context><Document></Document></Recipient71>
    <Recipient72><Context></Context><Document></Document></Recipient72>
    <Recipient73><Context></Context><Document></Document></Recipient73>
    <Recipient74><Context></Context><Document></Document></Recipient74>
    <Recipient75><Context></Context><Document></Document></Recipient75>
    <Recipient76><Context></Context><Document></Document></Recipient76>
    <Recipient77><Context></Context><Document></Document></Recipient77>
    <Recipient78><Context></Context><Document></Document></Recipient78>
    <Recipient79><Context></Context><Document></Document></Recipient79>
    <Recipient80><Context></Context><Document></Document></Recipient80>
    <Recipient81><Context></Context><Document></Document></Recipient81>
    <Recipient82><Context></Context><Document></Document></Recipient82>
    <Recipient83><Context></Context><Document></Document></Recipient83>
    <Recipient84><Context></Context><Document></Document></Recipient84>
    <Recipient85><Context></Context><Document></Document></Recipient85>
    <Recipient86><Context></Context><Document></Document></Recipient86>
    <Recipient87><Context></Context><Document></Document></Recipient87>
    <Recipient88><Context></Context><Document></Document></Recipient88>
    <Recipient89><Context></Context><Document></Document></Recipient89>
    <Recipient90><Context></Context><Document></Document></Recipient90>
    <Recipient91><Context></Context><Document></Document></Recipient91>
    <Recipient92><Context></Context><Document></Document></Recipient92>
    <Recipient93><Context></Context><Document></Document></Recipient93>
    <Recipient94><Context></Context><Document></Document></Recipient94>
    <Recipient95><Context></Context><Document></Document></Recipient95>
    <Recipient96><Context></Context><Document></Document></Recipient96>
    <Recipient97><Context></Context><Document></Document></Recipient97>
    <Recipient98><Context></Context><Document></Document></Recipient98>
    <Recipient99><Context></Context><Document></Document></Recipient99>
    <Recipient100><Context></Context><Document></Document></Recipient100>
</DocumentSet>

Code:
awk -f yanx.awk -f xmlsplit2.awk ROWS=10 input3 input3

x.0001, etc
Code:
<?xml version="1.0" encoding="UTF-8"?>
<DocumentSet>
    <Recipient><Context></Context><Document></Document></Recipient>
    <Recipient2><Context></Context><Document></Document></Recipient2>
    <Recipient3><Context></Context><Document></Document></Recipient3>
    <Recipient4><Context></Context><Document></Document></Recipient4>
    <Recipient5><Context></Context><Document></Document></Recipient5>
    <Recipient6><Context></Context><Document></Document></Recipient6>
    <Recipient7><Context></Context><Document></Document></Recipient7>
    <Recipient8><Context></Context><Document></Document></Recipient8>
    <Recipient9><Context></Context><Document></Document></Recipient9>
    <Recipient10><Context></Context><Document></Document></Recipient10>
    </DocumentSet>
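
For readers who do not have yanx.awk at hand, here is a rough plain-awk sketch of the same two-pass idea. It is not the parser-based method above, only a line-oriented approximation, and it assumes (as stated in the question) that the header is exactly the first 2 lines, the footer is the last line, and every record's closing </RecipientN> tag sits on a line of its own record. The names sample.xml and chunk. are made up for the demo.

```shell
#!/bin/sh
# Hypothetical sample data: 7 one-line Recipient records, shaped like input3.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
{
    printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>' '<DocumentSet>'
    for i in 1 2 3 4 5 6 7; do
        printf '    <Recipient%s><Context></Context><Document></Document></Recipient%s>\n' "$i" "$i"
    done
    printf '%s\n' '</DocumentSet>'
} > sample.xml

# The file is named twice so awk reads it twice.
# Pass 1 (NR == FNR) remembers the last line number and the last line (footer).
# Pass 2 copies the body and closes a chunk after every ROWS closing tags.
awk -v ROWS=3 -v OUT="chunk." '
NR == FNR   { LAST = FNR; FTR = $0; next }   # pass 1: footer ends up in FTR
FNR <= 2    { HDR = HDR $0 "\n"; next }      # assumed header: first 2 lines
FNR == LAST { next }                         # skip the footer line itself
{
    if (FILE == "") {                        # open the next chunk lazily
        FILE = sprintf("%s%04d", OUT, ++FILENUM)
        printf "%s", HDR > FILE
    }
    print > FILE
    # Count closing record tags; after ROWS of them, finish this chunk.
    if (($0 ~ /<\/Recipient[0-9]*>/) && (++COUNT % ROWS == 0)) {
        print FTR > FILE
        close(FILE)
        FILE = ""
    }
}
END { if (FILE != "") print FTR > FILE }     # footer for a partial last chunk
' sample.xml sample.xml

ls chunk.*
```

With 7 records and ROWS=3 this should leave chunk.0001 and chunk.0002 with three records each and chunk.0003 with one, every file wrapped in the same header and footer. Because it matches text rather than parsing XML, it will miscount if a closing Recipient tag can appear inside other content.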


Last edited by Corona688; 12-18-2018 at 12:22 PM..
# 11  
Old 12-18-2018
Thanks a lot for your help. It worked with the latest code; that was my expected output.
# 12  
Old 02-05-2019
Hello Corona,

Happy New Year !!

I need one small input on the same requirement. In the script below I loop through the input files and pass each one to the split command.

The issue is that every iteration starts its output file names again at x.0001, so the already existing x.0001 file gets replaced. Is there a way I can pass a variable for the output file name (X="x."), or should I move the files before the second iteration? Kindly assist.

Code:
# Add all Input files to array
FileList=($(ls | grep "sampletest\\.[0-9]"))

#loop array for Input files

for x in "${FileList[@]}"
do
 #for each element in array
 
   echo "$x"

#File Split Begin

awk -f xml_String_split.awk -f xml_split.awk X="x." ROWS="400" $x $x
done

for f in x.*; do mv "$f" "${f/x/Extrfile}.xml";
done
# add all files to array
arr=($(ls | grep "Extrfile\\.[0-9]"))

Thanks .

Last edited by RavinderSingh13; 02-06-2019 at 01:39 AM..
# 13  
Old 02-05-2019
Firstly, I'm assuming you are using Corona688's code from post #10.

That version never reads X, so passing X= on the command line has no effect; the output prefix is OUT, which was set in the BEGIN block.

If you change the code as follows (OUT is commented out in BEGIN, and FBASE is computed where each output file is opened):

Code:
BEGIN {
        ORS=""
        # OUT="x."
        ROWS=5
        ROWTAG="^RECIPIENT[0-9]*$"
        HDRTAG="^DOCUMENTSET$"
        FTRTAG="^DOCUMENTSET$"
}

# First pass, remember headers and footers
NR==FNR {
        if(!HDREND)
        {
                HDR=HDR RS $1 OFS $2
                if(TAG ~ HDRTAG) HDREND=FNR
                next
        }

        if(FTRSTART || (CTAG ~ FTRTAG))
        {
                FTR=FTR RS $1 OFS $2
                if(CTAG ~ FTRTAG) FTRSTART=FNR
        }

        next
}

# Skip header and footer
(FNR <= HDREND) || (FNR >= FTRSTART) { next }

# Start a new output file (closing the previous one) after every ROWS RECIPIENT records
((XNR%(ROWS+1)) == 0) {
#       printf("FNR==%d XNR==%d FILE=%s\n", FNR, XNR, FILE)>"/dev/stderr"
        if(!length(OUT)) FBASE=FILENAME "."
                else FBASE = OUT "."
        if(FILE) {
                print FTR > FILE
                close(FILE);
        }

        FILE=sprintf("%s%04d", FBASE,++FILENUM);
        print HDR > FILE
        XNR++
}

{       print RS $0 > FILE      }

CTAG ~ ROWTAG { XNR++ }

END {   if(FILE) print FTR > FILE       }

This will create files named after your XML file followed by a .nnnn file number (four digits, from the %04d format), or you can specify a name on the command line, e.g.:

Code:
awk -f xml_String_split.awk -f xml_split.awk OUT="${x}_split" ROWS="400" "$x" "$x"
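
To see why a per-file prefix avoids the overwrite, here is a small self-contained sketch of the loop. The split_file function is only a stand-in (an assumption, not the real splitter): it writes 2 lines per chunk so the naming can be checked without the XML scripts. The split_ prefix is also made up; it is placed in front of the name so the output files do not match the sampletest.[0-9]* input pattern on a later run.

```shell
#!/bin/sh
# Hypothetical demo inputs named like the sampletest.N files in the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\nb\nc\n' > sampletest.1
printf 'd\ne\n'    > sampletest.2

# Stand-in for the real splitter (assumption). In the thread the call would be
#   awk -f xml_String_split.awk -f xml_split.awk OUT="split_${x}" ROWS=400 "$x" "$x"
# Here: 2 lines per chunk, written to OUT.0001, OUT.0002, ...
split_file() {
    awk -v OUT="$2" '
        (NR - 1) % 2 == 0 {
            if (FILE != "") close(FILE)
            FILE = sprintf("%s.%04d", OUT, ++N)   # numbering restarts per awk run
        }
        { print > FILE }
    ' "$1"
}

# A glob instead of parsing ls output; quoting keeps odd file names intact.
for x in sampletest.[0-9]*; do
    [ -e "$x" ] || continue          # pattern matched nothing
    split_file "$x" "split_${x}"     # unique prefix per input file
done

ls
```

Each awk invocation starts counting at 0001 again, so the input name in the prefix is what keeps the chunks of different inputs apart.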

# 14  
Old 02-06-2019
Thanks a lot, Chubler, it worked.