html parsing using unix Post: 302351128

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

HTML-CGI on Unix

AAAHHH!! I've made a Perl program that you can run in a web browser. This program needs to be run every day, and I don't want to have to run it every day. The problem is when I try running the program from my terminal, all it does is print stuff to the terminal page (the program involves a lot of...

2. Shell Programming and Scripting

HTML parsing by PERL

I have an HTML report file, attached (part of the whole report is attached as "input html.doc"). Its source is also attached in "report source code.txt". I just want to separate the data so that the first line reads NHTEST-3848498958-NHTEST-10.2-no-baloo a, and so on for whole...

3. Shell Programming and Scripting

Parsing: How to go from HTML to CSV?

Dear all, I have to parse a large amount of html files, which I would like to transform into comma separated values. The html-files have the following structure: <tag1> CATEGORY_1 <tag2><tag3> HEADER_1 <tag4> <tag5> paragraph_1 <tag6> <tag5> paragraph_2 <tag6> <tag3>HEADER_2...

4. Shell Programming and Scripting

Html parsing - get line after specific string till a point

Hi all :) It sounds complex. For example, I want to search the whole html file (there are 5 entries of this string and I need to get all of them) for the string "<td class="contentheading" width="100%">", get the next line after it only up to the point that says "</td>", plus remove \t (tabs) ...

5. Shell Programming and Scripting

BASH parsing for html tags

Hello, can anyone help me parse this line? <tr><td>United States of America</td><td>Dollar</td><td>43.309</td></tr><tr><td>Japan</td><td>Yen</td><td>0.5579</td></tr> The line above did not break, so I would like to have a result like this: United States of America Dollar 43.309 Japan...

6. Shell Programming and Scripting

Parsing HTML, get text between 2 HTML tags

Hi there, I'm quite new to the forum and shell scripting. I want to filter out the "166.0 points". The results that I found on Google and in the forum search didn't help me :( <a href="/user/test" class="headitem menu" style="color:rgb(83,186,224);">test</a><a href="/points" class="headitem...

7. Shell Programming and Scripting

Perl syntax and html ole parsing

Hi gurus, I am trying to understand some advanced (for me) Perl constructions (syntax). Following this tutorial, I am trying to parse HTML: Using Mojo::DOM | Joel Berger say "div days:"; say $_->text for $dom->find('div.days')->each; say "\nspan hours:"; say $_->text for...

8. UNIX for Dummies Questions & Answers

HTML parsing with UNIX shell script

Hi there, Infra/LEXUS0157/lexus0157.html-<tr><td>Minimum password age</td><td>3 days</td><td>Win2k8 Server</td></tr> How do I extract from this html with unix? I just need the parameters 1. 'Minimum password age' and 2. '3 days'. Tried doing so with python, would like to have a better...

9. Linux

Parsing - export html table data as .csv file?

Hi all, does anyone out there have a brilliant idea on how to export HTML table data as .csv, or write it to a txt file with comma-separated values, and also get the filename of the link from every table and put one line per row for each table? Please see the attached html and PNG of what it looks like. ...

10. UNIX for Beginners Questions & Answers

Create html <ui> <li> by parsing text file

Hi you all, this is my first post in this forum. I'm Italian (please forgive me) :-) so my English will fail to be correct... Anyway, let's get straight to the point! I have a text file like this: ,,,, Disney: 00961-002,,,, ,Pippo: 00531-002,,, ,,Pluto: 00238-002,, ...

MKDoc::XML(3pm) 					User Contributed Perl Documentation					   MKDoc::XML(3pm)

NAME

       MKDoc::XML - The MKDoc XML Toolkit

SYNOPSIS

       This is an article, not a module.

SUMMARY

       MKDoc is a web content management system written in Perl which focuses on standards compliance, accessibility and usability issues, and
       multi-lingual websites.

       At MKDoc Ltd we have decided to gradually break up our existing commercial software into a collection of completely independent, well-
       documented, well-tested open-source CPAN modules.

       Ultimately we want MKDoc code to be a coherent collection of module distributions, yet each distribution should be usable and useful in
       itself.

       MKDoc::XML is part of this effort.

       You could help us and turn some of MKDoc's code into a CPAN module.  You can take a look at the existing code at
       http://download.mkdoc.org/.

       If you are interested in some functionality which you would like to see as a standalone CPAN module, send an email to
       <mkdoc-modules@lists.webarch.co.uk>.

DISCLAIMER

       MKDoc::XML is a low level XML library.
       MKDoc::XML::* modules do not make sure your XML is well-formed.
       MKDoc::XML::* modules can be used to work with XML that is broken in some way.
       MKDoc::XML::* modules should not be used as high-level parsers with general purpose XML unless you know what you're doing.

WHAT'S IN THE BOX
   XML tokenizer
       MKDoc::XML::Tokenizer splits your XML / XHTML files into a list of MKDoc::XML::Token objects using a single regex.
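
       The single-regex idea can be sketched in a few lines. The following is a minimal illustration in Python, not the module's Perl code: the pattern and the tokenize() name are assumptions made for this example, and the sketch ignores comments, CDATA sections and attribute values that contain ">".

```python
import re

# One alternation: a markup token (<...>) or a run of text up to the next "<".
# Hypothetical pattern for illustration; not the regex MKDoc::XML::Tokenizer uses.
TOKEN_RE = re.compile(r"<[^>]*>|[^<]+")


def tokenize(xml):
    """Split a string into a flat list of tag tokens and text tokens."""
    return TOKEN_RE.findall(xml)


print(tokenize("<p>Hello <b>world</b></p>"))
# ['<p>', 'Hello ', '<b>', 'world', '</b>', '</p>']
```

       Because nothing checks that tags are balanced, a tokenizer of this kind also accepts input that is not well-formed, which matches the low-level role described in the DISCLAIMER.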

   XML tree builder
       MKDoc::XML::TreeBuilder sits on top of MKDoc::XML::Tokenizer and builds parsed trees out of your XML / XHTML data.
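
       Layering a tree builder over a flat token list usually comes down to a stack of open elements. The sketch below is written in Python under stated assumptions: the node layout (a dict with "tag" and "children") and the function names are invented for this example and are not the data structure MKDoc::XML::TreeBuilder returns. Comments and processing instructions are not treated specially.

```python
import re

# Same illustrative token pattern: a tag or a run of text.
TOKEN_RE = re.compile(r"<[^>]*>|[^<]+")


def tag_name(token):
    """Return the element name of a tag token, e.g. "<b class='x'>" gives "b"."""
    parts = token.strip("</>").split()
    return parts[0] if parts else ""


def build_tree(xml):
    """Build a nested list of nodes from tokens using a stack of open elements."""
    root = {"tag": None, "children": []}
    stack = [root]
    for token in TOKEN_RE.findall(xml):
        if token.startswith("</"):
            # Closing tag: leave the current element (stray closers are ignored).
            if len(stack) > 1:
                stack.pop()
        elif token.startswith("<") and token.endswith("/>"):
            # Self-closing tag: a leaf element.
            stack[-1]["children"].append({"tag": tag_name(token), "children": []})
        elif token.startswith("<"):
            # Opening tag: new element becomes the current parent.
            node = {"tag": tag_name(token), "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        else:
            # Text token: attach to the current parent.
            stack[-1]["children"].append(token)
    return root["children"]


tree = build_tree("<p>Hello <b>world</b><br/></p>")
```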

   XML stripper
       MKDoc::XML::Stripper objects remove unwanted markup from your XML / HTML data. Useful for removing all those nasty presentational tags or
       'style' attributes from your XHTML data, for example.
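
       As a rough illustration of what stripping involves, the Python sketch below drops every tag that is not on an allow list (keeping the enclosed text) and removes 'style' attributes from the tags that remain. The strip_markup() name and the default allow list are assumptions for this example; this is not the MKDoc::XML::Stripper interface, and attribute values containing ">" are not handled.

```python
import re

# A tag: optional "/", an element name, then everything up to ">".
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w:-]*)([^>]*)>")
# A quoted style attribute preceded by whitespace.
STYLE_ATTR_RE = re.compile(r"""\s+style\s*=\s*("[^"]*"|'[^']*')""", re.IGNORECASE)


def strip_markup(xml, allowed=("p", "a", "em", "strong")):
    """Remove tags outside `allowed` and strip style attributes from the rest."""

    def replace_tag(match):
        slash, name, rest = match.groups()
        if name.lower() not in allowed:
            return ""  # drop the tag itself, keep the text around it
        return "<" + slash + name + STYLE_ATTR_RE.sub("", rest) + ">"

    return TAG_RE.sub(replace_tag, xml)


print(strip_markup('<p style="color:red"><font size="2">Hi</font> <em>there</em></p>'))
# <p>Hi <em>there</em></p>
```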

   XML tagger
       The MKDoc::XML::Tagger module matches expressions in XML / XHTML documents and tags them appropriately. For example, you could automatically
       hyperlink certain glossary words or add <abbr> tags based on a dictionary of abbreviations and acronyms.
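
       The <abbr> case can be sketched as follows. This Python example is an illustration only: tag_abbreviations() and the dictionary format are assumptions, not the MKDoc::XML::Tagger API. The key point it shows is that matching is applied to text tokens only, so attribute values and tag names are left alone. It does not check whether a word already sits inside an <abbr> or <a> element.

```python
import re
from html import escape

# A tag or a run of text; only the text runs are searched.
TOKEN_RE = re.compile(r"<[^>]*>|[^<]+")


def tag_abbreviations(xml, abbreviations):
    """Wrap known abbreviations found in text nodes in <abbr title="..."> tags."""
    if not abbreviations:
        return xml
    # Longest keys first so that longer abbreviations win over their prefixes.
    keys = sorted(abbreviations, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")

    def wrap(match):
        word = match.group(1)
        title = escape(abbreviations[word], quote=True)
        return '<abbr title="%s">%s</abbr>' % (title, word)

    out = []
    for token in TOKEN_RE.findall(xml):
        out.append(token if token.startswith("<") else pattern.sub(wrap, token))
    return "".join(out)


print(tag_abbreviations('<p title="XML">XML and HTML</p>',
                        {"XML": "Extensible Markup Language"}))
# <p title="XML"><abbr title="Extensible Markup Language">XML</abbr> and HTML</p>
```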

   XML entity decoder
       MKDoc::XML::Decode is a pluggable, configurable entity expander module which currently supports HTML entities, numeric entities and basic
       XML entities.

   XML entity encoder
       MKDoc::XML::Encode performs the exact reverse operation of MKDoc::XML::Decode.
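
       One way to picture a pluggable decoder and its inverse is a chain of small lookup functions, one per entity family. The Python sketch below is an assumption-laden illustration: make_decoder(), the three plugin functions and encode() are names invented here and do not reflect how MKDoc::XML::Decode or MKDoc::XML::Encode are implemented or called.

```python
import re
from html.entities import name2codepoint

XML_ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}
ENTITY_RE = re.compile(r"&(#?\w+);")
NUMERIC_RE = re.compile(r"#(?:x([0-9A-Fa-f]+)|([0-9]+))")


def decode_xml(name):
    """Basic XML entities: amp, lt, gt, quot, apos."""
    return XML_ENTITIES.get(name)


def decode_html(name):
    """Named HTML entities such as eacute."""
    codepoint = name2codepoint.get(name)
    return chr(codepoint) if codepoint is not None else None


def decode_numeric(name):
    """Numeric entities in decimal (#233) or hexadecimal (#xE9) form."""
    match = NUMERIC_RE.fullmatch(name)
    if not match:
        return None
    codepoint = int(match.group(1), 16) if match.group(1) else int(match.group(2))
    return chr(codepoint) if codepoint <= 0x10FFFF else None


def make_decoder(*plugins):
    """Return a decoder that tries each plugin in order for every entity."""
    def decode(text):
        def expand(match):
            for plugin in plugins:
                value = plugin(match.group(1))
                if value is not None:
                    return value
            return match.group(0)  # unknown entity: leave untouched
        return ENTITY_RE.sub(expand, text)
    return decode


def encode(text):
    """Reverse of decode_xml: escape the five basic XML characters."""
    text = text.replace("&", "&amp;")  # must come first
    for name, char in XML_ENTITIES.items():
        if name != "amp":
            text = text.replace(char, "&" + name + ";")
    return text


decode = make_decoder(decode_xml, decode_html, decode_numeric)
assert decode(encode("Tom & 'Jerry' <b>")) == "Tom & 'Jerry' <b>"
```

       Choosing which plugins to pass to make_decoder() is what makes the sketch configurable: a decoder built from decode_xml alone leaves &eacute; as it is.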

   XML Dumper
       MKDoc::XML::Dumper serializes arbitrarily complex Perl structures into XML strings.  It is also capable of the reverse operation, i.e.
       deserializing an XML string into a Perl structure.
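
       The round trip can be illustrated with a small Python sketch built on the standard library. The element names used here (hash, array, scalar, undef) and the dump()/load() functions are hypothetical choices for this example; they are not the XML format or the method names of MKDoc::XML::Dumper. As in Perl, scalars come back as strings, and dictionary keys are stored as strings.

```python
import xml.etree.ElementTree as ET


def to_element(value):
    """Convert nested dicts, lists, None and scalars into an element tree."""
    if isinstance(value, dict):
        element = ET.Element("hash")
        for key, item in value.items():
            child = to_element(item)
            child.set("key", str(key))
            element.append(child)
    elif isinstance(value, (list, tuple)):
        element = ET.Element("array")
        for item in value:
            element.append(to_element(item))
    elif value is None:
        element = ET.Element("undef")
    else:
        element = ET.Element("scalar")
        element.text = str(value)
    return element


def from_element(element):
    """Rebuild the structure; scalars are returned as strings."""
    if element.tag == "hash":
        return {child.get("key"): from_element(child) for child in element}
    if element.tag == "array":
        return [from_element(child) for child in element]
    if element.tag == "undef":
        return None
    return element.text or ""


def dump(value):
    return ET.tostring(to_element(value), encoding="unicode")


def load(xml):
    return from_element(ET.fromstring(xml))


data = {"name": "Pluto", "tags": ["a", "b"], "owner": None}
assert load(dump(data)) == data
```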

AUTHOR

       Copyright 2003 - MKDoc Holdings Ltd.

       Author: Jean-Michel Hiver

       This module is free software and is distributed under the same license as Perl itself. Use it at your own risk.

SEE ALSO

	 Petal: http://search.cpan.org/dist/Petal/
	 MKDoc: http://www.mkdoc.com/

       Help us open-source MKDoc. Join the mkdoc-modules mailing list:

	 mkdoc-modules@lists.webarch.co.uk

perl v5.10.1							    2005-03-10							   MKDoc::XML(3pm)
