06-30-2008
Need some help with parsing
I have a big XML file with very little formatting in it. It contains over 600 messages, and I need to break each message out into its own separate file.
The middle of the XML file looks something like this:
</Title></Msg><Msg><Opener> Hello how
are you?<Title> Some says hello</Title><Body>
This is a test to see how everything is
going. I need your help.</Body></Msg><Msg>
<Open1> An opening.</Open1><Title> Trying
something new.</Title><Report>124555ABC
</Report><Body> Another test for me.</Body>
<PS> I need to figure this out.</PS></Msg>
<Msg> etc........ etc... etc..
.......etc. .......
Some caveats:
1. The messages always start with <Msg>
2. The messages always end with </Msg>
3. The <Msg> could be at the beginning, middle or end of a line.
4. The </Msg> could be at the beginning, middle or end of a line.
5. A line can contain any number of tags, e.g. <Title><Body>, etc.
6. A message can be anywhere from one to 100+ lines long.
Any suggestions on how to break each message from this XML file out into its own file? Any sed/awk/nawk shell functions or statements would be appreciated.
In the end, there are 600+ messages, so there should be 600+ files.
Thank you.
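One possible approach, given the caveats listed above, is to scan each line for <Msg> and </Msg> with awk's index() function and switch output files whenever a message ends. The sketch below is only an illustration: the function name split_msgs, the output naming scheme (prefix plus a four-digit counter plus .xml), and the assumptions that messages are never nested and that the opening tag is exactly <Msg> with no attributes are mine, not part of the original question. Anything outside a <Msg>...</Msg> pair is discarded.

```shell
# split_msgs INPUT [PREFIX]
# Writes each <Msg>...</Msg> block found in INPUT to its own file:
# PREFIX0001.xml, PREFIX0002.xml, ... (PREFIX defaults to "msg_").
# Assumes messages are not nested and the tags carry no attributes.
split_msgs() {
    awk -v prefix="${2:-msg_}" '
    {
        line = $0
        # Keep blank lines that fall inside a message.
        if (inmsg && line == "") { print "" > out; next }
        while (line != "") {
            if (!inmsg) {
                s = index(line, "<Msg>")
                if (s == 0) break              # no message starts on the rest of this line
                n++
                out = sprintf("%s%04d.xml", prefix, n)
                inmsg = 1
                line = substr(line, s)         # drop any text before <Msg>
            }
            e = index(line, "</Msg>")
            if (e == 0) {                      # message continues on the next line
                print line > out
                line = ""
            } else {
                print substr(line, 1, e + 5) > out   # up to and including </Msg>
                close(out)                     # avoid running out of open files
                inmsg = 0
                line = substr(line, e + 6)     # keep scanning the rest of the line
            }
        }
    }
    END { printf "%d message(s) written\n", n }
    ' "$1"
}
```

Called as `split_msgs big.xml msg_` (big.xml being a placeholder name), this would write msg_0001.xml, msg_0002.xml and so on into the current directory. Because it only uses index(), substr() and close(), it should not depend on a multi-character record separator, which not every awk supports. Note that awk interprets backslash escapes in the -v value, so a prefix containing backslashes would need extra care.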
XML::Validate::LibXML(3pm) User Contributed Perl Documentation XML::Validate::LibXML(3pm)
NAME
XML::Validate::LibXML - Interface to LibXML validator
SYNOPSIS
my $validator = new XML::Validate::LibXML(%options);
if ($doc = $validator->validate($xml)) {
    # ... Do stuff with $doc ...
} else {
    print "Document is invalid\n";
}
DESCRIPTION
XML::Validate::LibXML is an interface to the LibXML validating parser which can be used with the XML::Validate module.
METHODS
new(%options)
Returns a new XML::Validate::LibXML instance using the specified options. (See OPTIONS below.)
validate($xml)
Returns a true value (an XML::LibXML::Document) if $xml could be successfully parsed, undef otherwise.
last_dom()
Returns the DOM (XML::LibXML::Document) of the document last validated.
last_error()
Returns a hash ref containing the error from the last validate call. This backend currently only fills in the message field of the hash.
Note that the error gets cleared at the beginning of each "validate" call.
version()
Returns the version of the XML::LibXML module that is installed.
OPTIONS
XML::Validate::LibXML takes the following options:
strict_validation
If this boolean value is true, the document will be validated during parsing. Otherwise it will only be checked for well-formedness.
Defaults to true.
base_uri
Since the XML document is supplied as a string, the validator doesn't know the document's URI. If the document contains any components
referenced using relative URIs, you'll need to set this option to the document's URI so that the validator can retrieve them
correctly.
ERROR REPORTING
When a call to validate fails to parse the document, the error may be retrieved using last_error.
On errors not related to the XML parsing, these methods will throw exceptions. Wrap calls with eval to catch them.
DEPENDENCIES
XML::LibXML
BUGS
last_error currently returns a hash ref with only the message field filled. It would be nice to also fill the line and column fields.
VERSION
$Revision: 1.20 $ on $Date: 2005/09/06 11:05:08 $ by $Author: johna $
AUTHOR
Nathan Carr, Colin Robertson
<cpan _at_ bbc _dot_ co _dot_ uk>
COPYRIGHT
(c) BBC 2005. This program is free software; you can redistribute it and/or modify it under the GNU GPL. See the file COPYING in this
distribution, or http://www.gnu.org/licenses/gpl.txt
perl v5.10.1 2006-04-19 XML::Validate::LibXML(3pm)