Help with Splitting a Large XML file based on size AND tags
Post 302907870 by Aviktheory11 on Wednesday 2nd of July 2014 07:16:40 AM

Hi All,

This is my first post here. I'm hoping to share and gain knowledge from this great forum!

I've scanned this forum before posting my problem here, but I'm afraid I couldn't find any thread that addresses this exact problem.

I'm trying to split a large XML file (containing many tag sets) into smaller files, each no larger than a size limit, without the split ever falling inside a tag set, i.e. every smaller file should contain only complete tag sets. The size limit for the smaller files is specified in a parameter file. For example, if the size limit is 100 KB and the large file is 440 KB, I should get five smaller files of about 100 KB, 100 KB, 100 KB, 100 KB and 40 KB.

My initial approach was to create the large file with each complete tag set on a single line, and then to use split with the size limit. However, the complete tag sets do not fit on single lines because the XML records are themselves huge. So I am thinking of splitting the large file on tag boundaries while still staying within the size limit.
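To make the one-record-per-line idea concrete, here is a minimal sketch. It assumes GNU coreutils split (for the -C / --line-bytes option) and that every <URL>...</URL> record really does fit on one line; demo.xml, the 60-byte limit and the part_ prefix are made-up names for illustration only.

```shell
#!/bin/bash
# Sketch only: assumes GNU split and one complete <URL>...</URL> record per line.
# demo.xml, the 60-byte limit and the part_ prefix are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three records of 25 bytes each (24 characters plus the newline)
printf '<URL>%s</URL>\n' aaaaaaaaaaaaa bbbbbbbbbbbbb ccccccccccccc > demo.xml

sizelimit=60   # bytes

# -C (--line-bytes) packs as many whole lines as fit within the limit,
# so a piece never ends in the middle of a line
split -C "$sizelimit" demo.xml part_

wc -c part_*
```

Unlike split -b, which cuts at an exact byte offset, this keeps whole lines together. It only helps as long as each record is a single line that is shorter than the limit.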

Below is what I have tried so far:

Code:
#!/bin/bash
export ORACLE_HOME=.........
export ORACLE_SID=...........
export PATH=........
. ./params       # contains the parameter sizelimit
FILE="datafile.txt"
sqlplus -s userid/password@DB <<EOF
SET HEADING OFF
SET PAGESIZE 0
SET LINESIZE 32000
SET LONG 32000
SET NEWPAGE NONE
SET FEEDBACK OFF
SET TRIMSPOOL ON
SET DEFINE ON     
SET VERIFY OFF
SET SERVEROUTPUT OFF
SPOOL $FILE
[....query to create the master file...]
SPOOL OFF
EXIT
EOF
# File size in bytes; assumes sizelimit in ./params is also a plain byte count
filesize=$(wc -c < "$FILE")
#echo $filesize
#echo $sizelimit
if [ "$filesize" -gt "$sizelimit" ]
then split -b "$sizelimit" "$FILE" part
else echo "less than the limit"
fi

This was my first attempt at using split. However, I don't think it can be used given my criteria, because split -b cuts at a byte offset regardless of tags. Assuming the tag sets look like <URL>...</URL>, can anyone suggest another way to do this?
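For illustration, one possible direction is to let awk collect lines until it sees the closing tag, and start a new output file whenever the next complete record would push the current file over the limit. This is only a sketch under stated assumptions: every record ends with </URL> at the end of a line, no single record is larger than the limit, and sizes are counted in bytes (hence LC_ALL=C). The names demo.xml, chunk_ and the 70-byte limit are made up for the example.

```shell
#!/bin/bash
# Sketch only: assumes each record ends with </URL> at the end of a line
# and that no single record is larger than the limit.
# demo.xml, chunk_ and the 70-byte limit are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three multi-line records of 33 bytes each
for id in 1 2 3; do
    printf '<URL>\n<id>%s</id>\n<x>y</x>\n</URL>\n' "$id"
done > demo.xml

sizelimit=70   # bytes

# LC_ALL=C makes length() count bytes rather than multibyte characters
LC_ALL=C awk -v limit="$sizelimit" -v prefix="chunk_" '
BEGIN { n = 1; out = sprintf("%s%03d.xml", prefix, n) }
{
    rec = rec $0 "\n"                         # collect the lines of the current record
    if (index($0, "</URL>")) {                # closing tag seen: record is complete
        len = length(rec)
        if (used > 0 && used + len > limit) { # record would overflow the current file
            close(out)
            n++
            out = sprintf("%s%03d.xml", prefix, n)
            used = 0
        }
        printf "%s", rec > out
        used += len
        rec = ""
    }
}
END { if (rec != "") printf "%s", rec > out } # any lines left after the last closing tag
' demo.xml

wc -c chunk_*.xml
```

With the 70-byte limit, the first two 33-byte records go to chunk_001.xml and the third to chunk_002.xml. The sketch does not handle an XML declaration or an enclosing root element; if the master file has those, each piece would need them added separately.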

Thanks a lot,

- Avik

Last edited by Scrutinizer; 07-02-2014 at 08:53 AM. Reason: Changed ICODE to CODE tags
 
