Remove untagged and junk data from an XML
Post 302913373 by Corona688 on Friday 15th of August 2014 05:10:10 PM
A modification of the smallest XML parser I have:

Code:
BEGIN {
        FS=">"
        OFS=">"
        RS="<"
}

NR==1 { next } # The first "record" is whatever precedes the first "<": blank, or leading junk

/^[!?]/ { printf("%s", RS $0 ); next    }   # print XML specification junk

# Handle open-tags: push the upper-cased name onto TAGS,
# a "%"-separated stack with the innermost tag first
match($0, /^[^\/ \r\n\t>]+/) {
        TAG=substr(toupper($0), RSTART, RLENGTH)
        TAGS=TAG "%" TAGS
}

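# Handle close-tags: pop the stack down to the matching open-tag
/^[\/]/ {
        NAME=toupper(substr($1, 2))
        sub(/[ \r\n\t]+$/, "", NAME)    # tolerate "</tag >"
        # Literal whole-name search.  A regex such as "^.*" NAME "%" would
        # also match inside longer names (closing D would pop SPD as well).
        POS=index("%" TAGS, "%" NAME "%")
        if(POS) TAGS=substr(TAGS, POS + length(NAME) + 1)
        if(length(TAGS) == 0) # Strip out out-of-xml cdata
        {
                NF=2
                $2="\n"
        }
}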

{       printf("<%s", $0);      } # Print everything

Code:
$ awk -f xmlclean.awk data

<?xml version="1.0" encoding="IBM037"?><spd><timestamp>07-04-2014 00:15:04</timestamp></spd>
<?xml version="1.0" encoding="IBM037"?><spd><timestamp>07-04-2014 00:15:04</timestamp></spd>
<?xml version="1.0" encoding="IBM037"?><spd><timestamp>07-04-2014 00:15:04</timestamp></spd>

$
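To see why the script can treat each tag as one record, here is a minimal sketch of just the record splitting, separate from the cleaning logic. The sample string and the split_records function name are made up for illustration; only the RS/FS settings and the NR==1 skip come from the script above.

```shell
# Hypothetical sample: stray text before and after a one-element document
sample='junk<a>text</a>more'

# With RS="<" every record starts just after a "<"; with FS=">" the tag
# lands in $1 and the text that follows it in $2.
split_records() {
    awk 'BEGIN { RS="<"; FS=">" }
         NR==1 { next }   # text before the first "<" is not a tag
         { printf("tag=[%s] text=[%s]\n", $1, $2) }'
}

printf '%s' "$sample" | split_records
# tag=[a] text=[text]
# tag=[/a] text=[more]
```

The "more" after the last close-tag shows up in $2 of the final record, which is the field the script overwrites with a newline once the tag stack is empty.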
