Help with Splitting a Large XML file based on size AND tags
Post 302907992 by Chubler_XL on Thursday 3rd of July 2014 12:43:27 AM
Sorry, I should have tried my code on more than one large URL: I had forgotten to reset the bytes variable. Please accept this updated version:

Code:
#!/bin/bash
export ORACLE_HOME=.........
export ORACLE_SID=...........
export PATH=........
. ./params        # contains the parameter sizelimit
...

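# Split only if the file is bigger than the limit (stat -c%s is GNU stat)
if [ "$(stat -c%s "$FILE")" -gt "$sizelimit" ]
then
    # Each record ends at </URL>, so no <URL> element is cut in half.
    # A new chunk (FILE.1, FILE.2, ...) starts when the next record would
    # push the current chunk past the limit; bytes is reset for each chunk.
    # Needs gawk: RT holds the text that matched RS and is empty for the
    # trailing part after the last </URL>, so no extra </URL> is appended.
    # LC_ALL=C makes length() count bytes rather than characters.
    LC_ALL=C gawk -v limit="$sizelimit" '
        BEGIN { num=1 }
        {
          rec = $0 RT
          if (bytes > 0 && bytes + length(rec) > limit) {
             close(FILENAME "." num)
             bytes = 0
             num++
          }
          bytes += length(rec)
          printf "%s", rec > (FILENAME "." num)
        } ' RS="</URL>" "$FILE"
else
   echo "$FILE: already within the limit of $sizelimit bytes"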
fi
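
A quick way to check a split is to reassemble the pieces and compare them with the original. The helper below is a sketch, not part of the original post: the function name verify_split is hypothetical, and it assumes the pieces are named FILE.1, FILE.2, ... as the script writes them, and that bash, cmp, mktemp, tail -c and grep -o are available. It also catches a stray </URL> appended to the last piece, which happens when the final record (the text after the last </URL>) is written with RS added.

```shell
#!/bin/bash
# verify_split FILE LIMIT  (hypothetical helper, not from the original post)
# Checks the FILE.1, FILE.2, ... pieces written by the splitter:
#   1. concatenated in numeric order, they are byte-identical to FILE
#   2. every piece except the last ends with </URL>
#   3. no piece is over LIMIT bytes unless it holds a single oversized
#      record (it contains at most one </URL>)
verify_split() {
    local file=$1 limit=$2 n=1 piece size tmp
    tmp=$(mktemp) || return 1
    # Glob order would put FILE.10 before FILE.2, so count up explicitly
    while [ -e "$file.$n" ]; do
        cat "$file.$n" >> "$tmp"
        n=$((n + 1))
    done
    local last=$((n - 1))
    if [ "$last" -eq 0 ]; then
        echo "no pieces found for $file" >&2; rm -f "$tmp"; return 1
    fi
    if ! cmp -s "$tmp" "$file"; then
        echo "pieces do not reassemble to $file" >&2; rm -f "$tmp"; return 1
    fi
    rm -f "$tmp"
    for ((n = 1; n <= last; n++)); do
        piece="$file.$n"
        size=$(wc -c < "$piece")
        if [ "$n" -lt "$last" ] && [ "$(tail -c 6 "$piece")" != "</URL>" ]; then
            echo "$piece does not end with </URL>" >&2; return 1
        fi
        if [ "$size" -gt "$limit" ] && [ "$(grep -o '</URL>' "$piece" | wc -l)" -gt 1 ]; then
            echo "$piece is $size bytes, over the $limit-byte limit" >&2; return 1
        fi
    done
    echo "$file: $last pieces OK"
}
```

After the split has run, call it as verify_split "$FILE" "$sizelimit"; it returns non-zero and prints the reason on stderr if any check fails.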

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.