Split xml file into multiple xml based on letterID


 
# 1  
Old 02-05-2016

Hi All,
We need to split a large XML file into multiple valid XML files, each holding N letterIds and each carrying the same header (the first 2 lines) and footer (the last line).
In the example below, the first 2 lines are the header and the last line is the footer. They need to be in each split XML file.

Header:
Code:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<CustomerBatch batchId="423433" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

Footer:
Code:
</CustomerBatch>


I tried the command below:

Code:
awk '!/\<CustomerLetter\>/ && !/\<\/CustomerLetter\>/ && !/\<element4\>/ && !/\<element5\>/ && !/\<element6\>/ {f=f"\n"$0} /\<\/CustomerLetter\>/ {print f > "CustomerLetter"++i".xml";f=""}' File.xml

This splits the file for each letter, but without the header and footer lines.



Sample:
Code:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<CustomerBatch batchId="423433" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<CustomerLetter letterId="19333" dateRequested="2016-02-04" letterCountryCd="US" letterStateCd="FL">
<Recipient dFlag="N">
<RecipientName fullName="Customer"/>
<RecipientDetails emailAddress="aFRG@gmail.com"/>
<CoRecipientDetails dFlag="N"/>
</Recipient>
<ReturnAddress brandCD="3"/>
<LoanStructure loanNumber="19334"/>
</CustomerLetter>
<CustomerLetter letterId="19334" dateRequested="2016-02-04" langRgnCd="EN-US" letterDate="2016-02-04" letterStateCd="CA">
<Recipient dFlag="N">
<RecipientName fullName="Customer"/>
<RecipientDetails emailAddress="ABC4@yahoo.com"/>
<CoRecipientDetails dFlag="N"/>
</Recipient>
<ReturnAddress brandCD="3"/>
<LoanStructure loanNumber="455020941"/>
</CustomerLetter>
</CustomerBatch>


The above file should be split for every 500 letterId (shown here with one letter per file):

File 1:
Code:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<CustomerBatch batchId="423433" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<CustomerLetter letterId="19333" dateRequested="2016-02-04" letterCountryCd="US" letterStateCd="FL">
<Recipient dFlag="N">
<RecipientName fullName="Customer"/>
<RecipientDetails emailAddress="aFRG@gmail.com"/>
<CoRecipientDetails dFlag="N"/>
</Recipient>
<ReturnAddress brandCD="3"/>
<LoanStructure loanNumber="19334"/>
</CustomerLetter>
</CustomerBatch>


File 2:
Code:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<CustomerBatch batchId="423433" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<CustomerLetter letterId="19334" dateRequested="2016-02-04" langRgnCd="EN-US" letterDate="2016-02-04" letterStateCd="CA">
<Recipient dFlag="N">
<RecipientName fullName="Customer"/>
<RecipientDetails emailAddress="ABC4@yahoo.com"/>
<CoRecipientDetails dFlag="N"/>
</Recipient>
<ReturnAddress brandCD="3"/>
<LoanStructure loanNumber="455020941"/>
</CustomerLetter>
</CustomerBatch>



Thanks
Moderator's Comments:
Mod Comment This account has been placed in read-only mode for three days for repeatedly refusing to properly format posts.

Last edited by Don Cragun; 02-05-2016 at 11:36 AM.. Reason: Add CODE tags.
# 2  
Old 02-05-2016
Hello vx04,

Please use code tags as per the forum rules. Could you please try the following and let me know if it helps?
Code:
awk -v header1="<?xml version=\"1.0\" encoding=\"ISO-8859-1\" standalone=\"no\"?>" -v header2="<CustomerBatch batchId=\"423433\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">" -v footer="</CustomerBatch>" 'BEGIN{file=1} /<\/CustomerLetter>/{print header1 ORS header2 ORS line ORS $0 ORS footer > (file ".xml"); close(file ".xml"); file++; line=""; next} !/<?xml version="1.0"/ && !/<CustomerBatch batchId/{line=line?line ORS $0:$0}' Input_file

The following are the 2 output files, matching the given sample output.
Code:
cat 2.xml
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<CustomerBatch batchId="423433" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<CustomerLetter letterId="19334" dateRequested="2016-02-04" langRgnCd="EN-US" letterDate="2016-02-04" letterStateCd="CA">
<Recipient dFlag="N">
<RecipientName fullName="Customer"/>
<RecipientDetails emailAddress="ABC4@yahoo.com"/>
<CoRecipientDetails dFlag="N"/>
</Recipient>
<ReturnAddress brandCD="3"/>
<LoanStructure loanNumber="455020941"/>
</CustomerLetter>
</CustomerBatch>

Code:
cat 1.xml
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<CustomerBatch batchId="423433" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<CustomerLetter letterId="19333" dateRequested="2016-02-04" letterCountryCd="US" letterStateCd="FL">
<Recipient dFlag="N">
<RecipientName fullName="Customer"/>
<RecipientDetails emailAddress="aFRG@gmail.com"/>
<CoRecipientDetails dFlag="N"/>
</Recipient>
<ReturnAddress brandCD="3"/>
<LoanStructure loanNumber="19334"/>
</CustomerLetter>
</CustomerBatch>

EDIT: Adding a non-one-liner form of the same solution.
Code:
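awk -v header1="<?xml version=\"1.0\" encoding=\"ISO-8859-1\" standalone=\"no\"?>" \
    -v header2="<CustomerBatch batchId=\"423433\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">" \
    -v footer="</CustomerBatch>" '
BEGIN {
    file = 1
}
# End of a letter: write header, collected lines, closing tag and footer.
/<\/CustomerLetter>/ {
    out = file ".xml"
    print header1 ORS header2 ORS line ORS $0 ORS footer > out
    close(out)
    file++
    line = ""       # reset so the next file does not repeat this letter
    next
}
# Collect every other line except the two header lines.
!/<?xml version="1.0"/ && !/<CustomerBatch batchId/ {
    line = line ? line ORS $0 : $0
}
' Input_file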
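
The header lines can also be picked up from the input itself instead of being passed with -v. The following is only a minimal sketch under assumptions that are not confirmed for the real file: the header is always exactly the first 2 lines, the footer is always `</CustomerBatch>` on a line of its own, and the shortened sample data is made up for the demonstration.

```shell
#!/bin/sh
# Sketch: one letter per output file, header read from the input.
# Assumes the sample layout: 2 header lines, footer on the last line.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Shortened, made-up sample input.
cat > Input_file <<'EOF'
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<CustomerBatch batchId="423433" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<CustomerLetter letterId="19333" dateRequested="2016-02-04">
<LoanStructure loanNumber="19334"/>
</CustomerLetter>
<CustomerLetter letterId="19334" dateRequested="2016-02-04">
<LoanStructure loanNumber="455020941"/>
</CustomerLetter>
</CustomerBatch>
EOF

awk '
NR <= 2 { header = header $0 ORS; next }   # keep the two header lines
/^<\/CustomerBatch>/ { next }              # footer is re-added per file
{ body = body $0 ORS }                     # collect the current letter
/<\/CustomerLetter>/ {
    out = (++n) ".xml"
    printf "%s%s%s\n", header, body, "</CustomerBatch>" > out
    close(out)                             # do not keep every file open
    body = ""
}
' Input_file

ls "$workdir"
```

With this sample it should leave 1.xml and 2.xml next to Input_file, each with the two header lines, one letter and the footer.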

Thanks,
R. Singh

Last edited by RavinderSingh13; 02-05-2016 at 08:55 AM.. Reason: Added a non-one-liner form of the same solution. Added code to nullify a variable so that files will not contain the previous files' content.
# 3  
Old 02-05-2016
Thanks, the awk worked, but it split for each letter; we need to split for 700 number of letterid.
# 4  
Old 02-05-2016
There's an unescaped double quote in header2 as posted; the quote that closes the xmlns:xsi value needs a backslash too.
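
To illustrate the quoting rule behind this remark: inside a double-quoted shell string, an unescaped " ends the string, and a > that follows it is then read by the shell as a redirection. A small sketch, with variable names chosen only for this example:

```shell
#!/bin/sh
# Inside a double-quoted shell string every literal " needs a backslash.
header2="<CustomerBatch batchId=\"423433\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"

# Single quotes are an alternative: nothing inside them needs escaping.
header2_sq='<CustomerBatch batchId="423433" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'

# Both spellings give the same string; awk receives it through -v.
result=$(awk -v h="$header2" 'BEGIN { print h }')
printf '%s\n' "$result"
```

The single-quoted form is usually easier to read when the value contains many double quotes and no single quotes.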
# 5  
Old 02-05-2016
Hello vx04,

If I understood your requirement correctly, you need to create XML files named after the CustomerLetter IDs, as follows. Let me know if you have any queries on this.
Code:
awk -v header1="<?xml version=\"1.0\" encoding=\"ISO-8859-1\" standalone=\"no\"?>" \
    -v header2="<CustomerBatch batchId=\"423433\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">" \
    -v footer="</CustomerBatch>" '
BEGIN {
    file = 1
}
# End of a letter: write it to <letterId>.xml.
/<\/CustomerLetter>/ {
    out = file ".xml"
    print header1 ORS header2 ORS line ORS $0 ORS footer > out
    close(out)
    line = ""
    next
}
# Start of a letter: take the ID from a copy of $2 so that $0 stays untouched.
/CustomerLetter letterId/ {
    id = $2
    gsub(/letterId=|"/, "", id)
    file = id
}
!/<?xml version="1.0"/ && !/<CustomerBatch batchId/ {
    line = line ? line ORS $0 : $0
}
' Input_file

Output files named 19334.xml and 19333.xml are as follows.
Code:
 cat 19333.xml
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<CustomerBatch batchId="423433" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<CustomerLetter letterId="19333" dateRequested="2016-02-04" letterCountryCd="US" letterStateCd="FL">
<Recipient dFlag="N">
<RecipientName fullName="Customer"/>
<RecipientDetails emailAddress="aFRG@gmail.com"/>
<CoRecipientDetails dFlag="N"/>
</Recipient>
<ReturnAddress brandCD="3"/>
<LoanStructure loanNumber="19334"/>
</CustomerLetter>
</CustomerBatch>

AND

cat 19334.xml
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<CustomerBatch batchId="423433" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<CustomerLetter letterId="19334" dateRequested="2016-02-04" langRgnCd="EN-US" letterDate="2016-02-04" letterStateCd="CA">
<Recipient dFlag="N">
<RecipientName fullName="Customer"/>
<RecipientDetails emailAddress="ABC4@yahoo.com"/>
<CoRecipientDetails dFlag="N"/>
</Recipient>
<ReturnAddress brandCD="3"/>
<LoanStructure loanNumber="455020941"/>
</CustomerLetter>
</CustomerBatch>

Please let me know if you have any queries.
EDIT: Adding a one-liner form of the same solution.
Code:
awk -v header1="<?xml version=\"1.0\" encoding=\"ISO-8859-1\" standalone=\"no\"?>" -v header2="<CustomerBatch batchId=\"423433\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">" -v footer="</CustomerBatch>" 'BEGIN{file=1} /<\/CustomerLetter>/{print header1 ORS header2 ORS line ORS $0 ORS footer > (file ".xml"); close(file ".xml"); line=""; next} /CustomerLetter letterId/{B=$2; gsub(/letterId=|"/,"",B); file=B} !/<?xml version="1.0"/ && !/<CustomerBatch batchId/{line=line?line ORS $0:$0}' Input_file

EDIT2: Sorry, my copy-paste has issues, so I am attaching the script in case you are not able to copy it properly.
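
Taking the ID from $2 relies on letterId being the first attribute of the tag. If that is not guaranteed, the attribute can be located by name instead. This is only a sketch using the standard awk functions match() and substr(); the reordered sample line is an assumption made for the demonstration and does not come from the real data.

```shell
#!/bin/sh
# Hypothetical input line with letterId NOT in second position.
line='<CustomerLetter dateRequested="2016-02-04" letterId="19334" letterStateCd="CA">'

id=$(printf '%s\n' "$line" | awk '
match($0, /letterId="[^"]*"/) {
    # RSTART/RLENGTH delimit the whole attribute. Skip the 10 characters
    # of letterId=" at the front and the closing quote at the end.
    print substr($0, RSTART + 10, RLENGTH - 11)
}')
printf '%s\n' "$id"
```

Inside the splitting script the same match() call could set the output file name without modifying $0.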

Thanks,
R. Singh

Last edited by RavinderSingh13; 02-05-2016 at 10:08 AM.. Reason: Added one liner solution for same now. Added script as an attachment now.
# 6  
Old 02-05-2016
What does "we need to split for 700 number of letterid" mean? Do you want groups of 700 letters written to a single file (i.e. 3500 letters would become five files)? Or do you want letters whose ID contains "700" written to separate files?

---------- Post updated at 15:17 ---------- Previous update was at 15:00 ----------

Would this do?
Code:
awk '
NR == 1                 {FT = $0                # footer, supplied first by tail
                         next
                        }
NR < 4                  {HD = HD DL $0          # the two header lines
                         DL = RS
                         next
                        }
/letterId/              {if (!(LCNT%LN))        {if (FN) {print FT > FN   # finish the previous file
                                                          close(FN)
                                                         }
                                                 FN = "file" ++FCNT ".xml"
                                                 print HD > FN
                                                }
                         LCNT++
                        }
                        {print > FN
                        }
' LN=700 <(tail -n 1 file1) file1
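
The `<( ... )` process substitution needs bash, ksh or zsh. A minimal sketch of the same grouping idea for a plain POSIX sh follows, assuming the footer can be handed to awk with -v instead. The generated five-letter sample and LN=2 are made up so that the grouping is easy to check; they are not part of the original request.

```shell
#!/bin/sh
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Build a small sample: 2 header lines, 5 letters, 1 footer line.
{
    printf '%s\n' '<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>'
    printf '%s\n' '<CustomerBatch batchId="423433" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    for id in 1 2 3 4 5; do
        printf '<CustomerLetter letterId="%s">\n<ReturnAddress brandCD="3"/>\n</CustomerLetter>\n' "$id"
    done
    printf '%s\n' '</CustomerBatch>'
} > file1

awk -v LN=2 -v FT="$(tail -n 1 file1)" '
NR <= 2    { HD = HD $0 ORS; next }        # the two header lines
$0 == FT   { next }                        # footer is written explicitly
/letterId/ {
    if (LCNT++ % LN == 0) {                # every LN-th letter opens a new file
        if (FN) { print FT > FN; close(FN) }
        FN = "file" (++FCNT) ".xml"
        printf "%s", HD > FN
    }
}
{ print > FN }
END { if (FN) print FT > FN }              # footer for the last file
' file1

# Letters per output file.
grep -c '<CustomerLetter ' file?.xml
```

With five letters and LN=2 this should give three files holding 2, 2 and 1 letters. For the real data LN would be set to 700 (or 500).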


Last edited by RudiC; 02-05-2016 at 10:36 AM..