Just be careful: awk and many other Unix utilities have limits on the length of a single line. You may be better off putting a newline character after each </URL>.
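As a rough sketch of that suggestion (not the code from earlier in the thread): the helper name `break_after_url` and the sample input are made up for illustration, and this assumes the closing tag is spelled exactly `</URL>` and that your sed, GNU sed for example, copes with the original long line.

```shell
# Hypothetical helper: copy stdin to stdout, adding a newline after every </URL>.
break_after_url() {
    # "&" stands for the matched text; a backslash followed by a real newline
    # in the replacement is the portable way to insert a line break with sed.
    sed 's#</URL>#&\
#g'
}

# Made-up one-line input with two records on it:
printf '%s\n' '<URL>a</URL><URL>b</URL>' | break_after_url
```

Each record then ends up on its own line, so later awk processing sees short lines. A tag that already sits at the end of a line will be followed by an empty line.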
---------- Post updated at 10:17 AM ---------- Previous update was at 10:06 AM ----------
Depending on your OS, the stat command I used above may not be available. A much more portable (but possibly less efficient) version would be:
Last edited by Chubler_XL; 07-02-2014 at 09:19 PM..
Reason: close previous file to ensure awk openfile limit is not exceeded
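The edit reason above refers to closing each output file before moving on to the next one, so awk never holds more files open than its limit allows. A minimal sketch of that pattern follows. It is not the script from the post: the helper name `split_closing`, the fixed line count per file and the output prefix are all assumptions made for illustration.

```shell
# Hypothetical helper: split stdin into files of at most $1 lines each,
# named <prefix>1, <prefix>2, ... where $2 is the prefix.
split_closing() {
    awk -v max="$1" -v prefix="$2" '
        (NR - 1) % max == 0 {
            if (out != "") close(out)   # release the previous file handle
            n++
            out = prefix n              # name of the next output file
        }
        { print > out }
    '
}
```

For example, `split_closing 500000 part_ < bigfile` would write `part_1`, `part_2` and so on, with only one output file open at any time.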
Hello gurus,
I am new to "awk" and am trying to break a large file of 4 million records into several output files of half a million records each. At the same time, I want to keep records with similar keys together in the same output file, not spread across files.
e.g. my data is like:
Row_Num,... (6 Replies)
I need to write a shell script for the scenario below.
My input file has data in this format:
qwerty0101TWE 12345 01022005 01022005 datainala alanfernanded 26
qwerty0101mXZ 12349 01022005 06022008 datainalb johngalilo 28
qwerty0101TWE 12342 01022005 07022009 datainalc hitalbert 43
qwerty0101CFG 12345... (19 Replies)
Hi Experts,
I have to split a huge file into smaller files based on a pattern. The pattern expected in the file is:
Master.....
First...
second....
second...
third..
third...
Master...
First..
second...
third...
Master...
First...
second..
second..
second..... (2 Replies)
I am trying to update an older program on a small cluster. It uses individual files to send jobs to each node. However the newer database comes as one large file, containing over 10,000 records. I therefore need to split this file. It looks like this:
HMMER3/b
NAME 1-cysPrx_C
ACC ... (2 Replies)
Hi All,
I have to split an XML file into multiple XML files and append them to another .xml file. For example, below is a sample XML file; using a shell script, I have to split it into three XML files and append all three to a .xml file. Can someone help, please?
eg:
<?xml version="1.0"?>... (4 Replies)
I will simplify the explanation a bit: I need to parse through an 87m file.
I have a single text file in the form of:
<NAME>house........
SOMETEXT
SOMETEXT
SOMETEXT
.
.
.
.
</script>
MORETEXT
MORETEXT
.
.
. (6 Replies)
Hello All,
Please help me with the requirement below.
I want to split an XML file based on tags. Here is the file format:
<data-set>
some-information
</data-set>
<data-set1>
some-information
</data-set1>
<data-set2>
some-information
</data-set2>
I want to split the above file into 3... (5 Replies)
Hi Everyone,
I'm new here and I was checking this old post:
/shell-programming-and-scripting/180669-splitting-file-into-several-smaller-files-using-perl.html
(cannot paste link because of lack of points)
I need to do something like this but understand very little of perl.
I also check... (4 Replies)
Hi,
I have an XML file with multiple XML headers, so I want to split it into multiple files.
Sample.xml contains multiple headers. How can we split the file at these headers into multiple files in Unix?
e.g.:
<?xml version="1.0" encoding="UTF-8"?>
<ml:individual... (3 Replies)
MKDoc::XML::Stripper(3pm) User Contributed Perl Documentation MKDoc::XML::Stripper(3pm)
NAME
MKDoc::XML::Stripper - Remove unwanted XML / XHTML tags and attributes
SYNOPSIS
use MKDoc::XML::Stripper;
my $stripper = new MKDoc::XML::Stripper;
$stripper->allow (qw /p class id/);
my $ugly = '<p class="para" style="color:red">Hello, <strong>World</strong>!</p>';
my $neat = $stripper->process_data ($ugly);
print $neat;
Should print:
<p class="para">Hello, World!</p>
SUMMARY
MKDoc::XML::Stripper is a class which lets you specify a set of tags and attributes which you want to allow, and then cheekily strip any
XML of unwanted tags and attributes.
In MKDoc, this is used so that editors use structural XHTML rather than presentational tags, i.e. strip anything which looks like a <font>
tag, a 'style' attribute or other tags which would break separation of structure from content.
DISCLAIMER
This module does low level XML manipulation. It will somehow parse even broken XML and try to do something with it. Do not use it unless
you know what you're doing.
API
my $stripper = MKDoc::XML::Stripper->new()
Instantiates a new MKDoc::XML::Stripper object.
$stripper->load_def ($def_name);
Loads a definition located somewhere in @INC under MKDoc/XML/Stripper.
Available definitions are:
xhtml10frameset
xhtml10strict
xhtml10transitional
mkdoc16 - MKDoc 1.6. XHTML structural markup
You can also load your own definition file, for instance:
$stripper->load_def ('my_def.txt');
Definitions are simple text files as follows:
# allow p with 'class' and id
p class
p id
# allow more stuff
td class
td id
td style
# etc...
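To illustrate the file format only, here is a small shell sketch that summarizes such a definition file as one line per tag. It is not part of MKDoc::XML::Stripper: the helper name `list_allowed` is hypothetical, and it assumes that every non-comment line is a tag name optionally followed by one attribute name, as in the example above.

```shell
# Hypothetical helper: summarize a definition file as "tag: attr attr ...".
list_allowed() {
    awk '
        /^[ \t]*#/ || NF == 0 { next }                  # skip comments and blank lines
        !($1 in attrs) { order[++n] = $1; attrs[$1] = "" }   # remember first-seen order
        NF > 1 { attrs[$1] = attrs[$1] " " $2 }         # collect the attribute name
        END { for (i = 1; i <= n; i++) print order[i] ":" attrs[order[i]] }
    ' "$@"
}
```

Run against the example definition above, this lists `p` with `class` and `id`, and `td` with `class`, `id` and `style`.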
$stripper->allow ($tag, @attributes)
Allows "<$tag>" to appear in the stripped XML. Additionally, allows @attributes to appear as attributes of <$tag>, so for instance:
$stripper->allow ('p', 'class', 'id');
Will allow the following:
<p>
<p class="foo">
<p id="bar">
<p class="foo" id="bar">
However any extra attributes will be stripped, i.e.
<p class="foo" id="bar" style="font-color: red">
Will be rewritten as
<p class="foo" id="bar">
$stripper->disallow ($tag)
Explicitly disallows a tag and all its associated attributes. By default everything is disallowed.
$stripper->process_data ($some_xml);
Strips $some_xml according to the rules that were given with the allow() and disallow() methods and returns the result. Does not modify
$some_xml in place.
$stripper->process_file ('/an/xml/file.xml');
Strips '/an/xml/file.xml' according to the rules that were given with the allow() and disallow() methods and returns the result. Does not
modify '/an/xml/file.xml' in place.
NOTES
MKDoc::XML::Stripper does not really parse the XML file you're giving to it nor does it care if the XML is well-formed or not. It uses
MKDoc::XML::Tokenizer to turn the XML / XHTML file into a series of MKDoc::XML::Token objects and strictly operates on a list of tokens.
For this same reason MKDoc::XML::Stripper does not support namespaces.
AUTHOR
Copyright 2003 - MKDoc Holdings Ltd.
Author: Jean-Michel Hiver
This module is free software and is distributed under the same license as Perl itself. Use it at your own risk.
SEE ALSO
MKDoc::XML::Tokenizer MKDoc::XML::Token
perl v5.10.1 2004-10-06 MKDoc::XML::Stripper(3pm)