Help with Splitting a Large XML file based on size AND tags
Post 302907973 by Chubler_XL on Wednesday 2nd of July 2014 08:17:12 PM
How about this:

Code:
#!/bin/bash
export ORACLE_HOME=.........
export ORACLE_SID=...........
export PATH=........
. ./params        # contains the parameter sizelimit
...

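if [ "$(stat -c%s "$FILE")" -gt "$sizelimit" ]
then
    # gawk is needed: a multi-character RS and the RT variable are gawk extensions.
    # LC_ALL=C makes length() count bytes rather than characters.
    LC_ALL=C gawk -v limit="$sizelimit" '
        BEGIN { num = 1 }
        {
          rec = $0 RT              # RT holds the "</URL>" that ended this record ("" at EOF)
          # start a new chunk if this record would push a non-empty chunk past the limit
          if (bytes > 0 && bytes + length(rec) > limit) {
             close(FILENAME "." num)   # keep the number of open files low
             num++
             bytes = 0
          }
          bytes += length(rec)
          printf "%s", rec > (FILENAME "." num)
        } ' RS="</URL>" "$FILE"
else
   echo "$FILE: already within the limit of $sizelimit bytes"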
fi
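Once the split has run, a quick sanity check can confirm that nothing was lost. This is a sketch, not part of the original post: verify_split is a hypothetical helper that assumes the chunks are named $FILE.1, $FILE.2, ... as in the script above. It concatenates them in numeric order (a plain $FILE.* glob would sort .10 before .2), warns about any chunk over the limit, and compares the result byte for byte with the original using cmp:

```shell
# Hypothetical helper: verify_split FILE LIMIT
# Returns 0 if FILE.1, FILE.2, ... concatenated in numeric order equal FILE.
verify_split() {
    {
        n=1
        while [ -e "$1.$n" ]; do
            size=$(wc -c < "$1.$n" | tr -d ' ')
            # a chunk can legitimately exceed the limit when a single
            # record is larger than the limit, so only warn here
            if [ "$size" -gt "$2" ]; then
                echo "warning: $1.$n is $size bytes (limit $2)" >&2
            fi
            cat "$1.$n"
            n=$((n + 1))
        done
    } | cmp -s - "$1"    # exit status of the function is the status of cmp
}
```

For example, verify_split "$FILE" "$sizelimit" && echo "chunks OK".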

Just be careful: awk and many other Unix utilities limit the length of a single line (or record), so you may be better off putting a newline character after each </URL>.
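Following that advice, here is a hedged two-step sketch that is not from the original post; add_newlines and split_on_url are hypothetical helper names. The first step uses sed with a backslash-newline in the replacement, which is the POSIX way to insert a newline. Note that sed is itself line-based, so this step needs a sed without a small line-length limit (GNU sed, for example, has no fixed limit). Once every </URL> ends a line, the split can be done line by line with plain POSIX awk, without relying on a multi-character RS:

```shell
# Step 1 (hypothetical helper): put a newline after every </URL>.
# The second line of the sed script must start in column 0, otherwise
# the leading spaces would become part of the replacement text.
add_newlines() {
    sed 's#</URL>#&\
#g' "$1"
}

# Step 2 (hypothetical helper): split a newline-delimited file into chunks
# FILE.1, FILE.2, ... of at most LIMIT bytes, breaking only after a line
# that ends in </URL>. Uses only POSIX awk features.
split_on_url() {
    LC_ALL=C awk -v limit="$2" '
        function flush() {
            # start a new chunk if this record would push a non-empty chunk past the limit
            if (bytes > 0 && bytes + length(buf) > limit) {
                close(FILENAME "." num)
                num++
                bytes = 0
            }
            printf "%s", buf > (FILENAME "." num)
            bytes += length(buf)
            buf = ""
        }
        BEGIN { num = 1 }
        {
            buf = buf $0 "\n"                    # re-add the newline awk stripped
            if ($0 ~ /<\/URL>[ \t]*$/) flush()    # a complete <URL> record ends here
        }
        END {
            # trailing text after the last </URL> (such as the closing root tag)
            # stays in the current chunk
            if (buf != "") printf "%s", buf > (FILENAME "." num)
        }
    ' "$1"
}
```

For example, add_newlines "$FILE" > "$FILE.nl" followed by split_on_url "$FILE.nl" "$sizelimit" would write $FILE.nl.1, $FILE.nl.2, and so on. As with any record-based split, a single record larger than the limit still ends up as one oversized chunk.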

---------- Post updated at 10:17 AM ---------- Previous update was at 10:06 AM ----------


Depending on your OS, the stat command I used above may not be available (stat -c%s is the GNU coreutils syntax; BSD and macOS use stat -f%z). A much more portable (but possibly less efficient) version would be:

Code:
# $(...) is left unquoted on purpose: it drops the leading spaces some wc versions print
if [ $(wc -c < "$FILE") -gt "$sizelimit" ]
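A middle ground is to try the fast stat variants first and fall back to wc. This is a sketch, not from the original post; file_size is a hypothetical helper name, and it assumes the stat on the system is either GNU coreutils (-c%s) or BSD/macOS (-f%z), with the unrecognized form simply failing and falling through:

```shell
# Hypothetical helper: print the size of a file in bytes.
file_size() {
    stat -c%s "$1" 2>/dev/null ||    # GNU coreutils
    stat -f%z "$1" 2>/dev/null ||    # BSD / macOS
    wc -c < "$1" | tr -d ' '         # POSIX fallback (reads the whole file)
}
```

The test would then read if [ "$(file_size "$FILE")" -gt "$sizelimit" ].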


Last edited by Chubler_XL; 07-02-2014 at 09:19 PM. Reason: close previous file to ensure the awk open-file limit is not exceeded
 

XML::Writer::Simple(3pm)      User Contributed Perl Documentation      XML::Writer::Simple(3pm)

NAME
    XML::Writer::Simple - Create XML files easily!

SYNOPSIS
    use XML::Writer::Simple dtd => "file.dtd";

    print xml_header(encoding => 'iso-8859-1');
    print para("foo",b("bar"),"zbr");

    # if you want CGI but you do not want CGI :)
    use XML::Writer::Simple ':html';

USAGE
    This module takes some ideas from CGI to make life easier for those who need to generate XML code. You can use the module in three flavours (or combine them):

    tags
        When importing the module you can specify the tags you will be using:

            use XML::Writer::Simple tags => [qw/p b i tt/];
            print p("Hey, ",b("you"),"! ", i("Yes ", b("you")));

        that will generate

            <p>Hey, <b>you</b>! <i>Yes <b>you</b></i></p>

    dtd
        You can supply a DTD, which will be analyzed, and its tags used:

            use XML::Writer::Simple dtd => "tmx.dtd";
            print tu(seg("foo"),seg("bar"));

    xml
        You can supply an XML file (or a reference to a list of XML files). They will be parsed, and their tags used:

            use XML::Writer::Simple xml => "foo.xml";
            print foo("bar");

    partial
        You can supply a 'partial' key to generate prototypes for partial tag construction. For instance:

            use XML::Writer::Simple tags => [qw/foo bar/], partial => 1;
            print start_foo;
            print ...
            print end_foo;

    You can also use tagsets, where sets of tags from a well-known format are imported. For example, to use HTML:

        use XML::Writer::Simple ':html';

EXPORT
    This module exports one function for each element in the DTD or XML file you are using. See below for details.

FUNCTIONS
    import
        Used when you 'use' the module; it should not be called directly.

    xml_header
        This function returns the XML header string, without an encoding declaration, followed by a newline. The default XML encoding should be UTF-8, by the way. You can force an encoding by passing it as an argument:

            print xml_header(encoding=>'iso-8859-1');

    powertag
        Used to specify a powertag. For instance:

            powertag("ul","li");
            ul_li([qw/foo bar zbr ugh/]);

        will generate

            <ul>
            <li>foo</li>
            <li>bar</li>
            <li>zbr</li>
            <li>ugh</li>
            </ul>

        You can also supply this information when loading the module, with

            use XML::Writer::Simple powertags=>["ul_li","ol_li"];

        Powertags support three-level tags as well:

            use XML::Writer::Simple powertags=>["table_tr_td"];
            print table_tr_td(['a','b','c'],['d','e','f']);

AUTHOR
    Alberto Simões, "<ambs@cpan.org>"

BUGS
    Please report any bugs or feature requests to "bug-xml-writer-simple@rt.cpan.org", or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=XML-Writer-Simple. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

COPYRIGHT AND LICENSE
    Copyright 1999-2012 Project Natura.

    This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

perl v5.14.2                    2012-06-05                    XML::Writer::Simple(3pm)