10-26-2011
Hey Click,
Thanks for your reply.
I am not trying to convert an HTML page into text, but trying to eliminate the HTML tags in a text column.
Regards
Chetan
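One way to approach this is sketched below, under assumptions the post does not state: the data is a delimited text file (pipe-delimited here) and one field is the text column carrying the stray tags. The function name strip_tags_in_column, the delimiter, the column number, and the sample rows are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: remove anything that looks like <...> from one
# column of delimited text read on stdin, leaving other columns untouched.
# Usage: strip_tags_in_column COLUMN [DELIMITER]   (delimiter defaults to |)
strip_tags_in_column() {
    awk -F "${2:-|}" -v col="$1" '
        BEGIN { OFS = FS }   # write fields back with the same delimiter
        {
            # [^>]* stops at the first ">", so each tag is removed on its own
            gsub(/<[^>]*>/, "", $col)
            print
        }'
}

# Sample input, made up for illustration: clean column 3 only
printf '%s\n' '1|alice|<p>Hello <b>world</b></p>' \
              '2|bob|plain text' | strip_tags_in_column 3
```

This is a pattern match, not an HTML parser: it will also delete literal text such as "a < b > c" and will not handle a tag split across lines, so it is best suited to columns with simple inline markup.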
9 More Discussions You Might Find Interesting
1. Shell Programming and Scripting
Hi all-
I have a variable that contains a web page:
echo $STUFF
<html> <head> <title>my page</title></head> <body> blah blah etc..
Can I use the shell's parameter expansion abilities to remove just the tags?
I thought that FIXHTML=${STUFF//<*>/} might do it, but it didn't seem to... (2 Replies)
Discussion started by: rev66
2. Shell Programming and Scripting
I generally save a lot of web pages for reading offline which works out great for school. Now I have to spend a lot of time on the bus and I am looking for the best way to read some of these webpages using my Nokia 7610.
I have uploaded the files to my phone, but they are deadly deadly slow to... (2 Replies)
Discussion started by: naphelge
3. Shell Programming and Scripting
Hello, I am trying to remove the HTML formatting from a file using sed, for example to remove <p> </p>.
I tried this: sed -e 's/<*>//g' test > test.t
but some HTML formatting still remains. Please help if you have any suggestions.
Let's say this is the HTML file:
1... (11 Replies)
Discussion started by: koricha
4. Shell Programming and Scripting
Hi there, I'm quite new to the forum and shell scripting.
I want to filter out the "166.0 points". The results that I found via Google and the forum search didn't help me :(
<a href="/user/test" class="headitem menu" style="color:rgb(83,186,224);">test</a><a href="/points" class="headitem... (1 Reply)
Discussion started by: Mysthik
5. Shell Programming and Scripting
Could someone please provide a solution to the following:
I would like to remove some tags from the "head" of multiple html documents across the web site. They look like
<link rel="alternate" type="application/rss+xml"
title="Business and Investment in the Philippines"... (2 Replies)
Discussion started by: georgi58
6. Shell Programming and Scripting
I store different variants of the below in an XML file, and apparently XML has an issue loading data like this because it contains HTML tags. I would like to preserve this data as it is, but unfortunately XML says I can't.
So I have to strip out all the HTML tags.
The examples I found... (9 Replies)
Discussion started by: SkySmart
7. Shell Programming and Scripting
I tried to find an elegant (or at least simple) way to remove all but a couple of HTML tags from an HTML file, but all the examples I found dealt with removing all the tags.
The logic of the script would be:
- if there is <li> or <ul> on the line, do nothing (=write same line to output)
- if there is:... (0 Replies)
Discussion started by: juubuntu
8. Homework & Coursework Questions
Use and complete the template provided. The entire template must be completed. If you don't, your post may be deleted!
1. The problem statement, all variables and given/known data:
You will write a script that will remove all HTML tags from an HTML document and remove any consecutive... (3 Replies)
Discussion started by: tburns517
9. UNIX for Beginners Questions & Answers
I want to use a tooltip in HTML; however, the transparency is creating a problem for detailed tooltips, as the text behind it interferes with the readability of the tooltip text.
I have made the following changes; however, the normal tooltip is still transparent.
I call it using
<a... (3 Replies)
Discussion started by: kristinu
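The attempts quoted in threads 1 and 3 above fail for related reasons. In a shell pattern, the * in ${STUFF//<*>/} matches any string and the longest match wins, so everything from the first < to the last > is removed in one go. In sed, <* means "zero or more < characters", so s/<*>//g mostly just deletes the > characters. A minimal sketch of the usual workaround follows, reusing the STUFF and FIXHTML names from thread 1 with a made-up sample value; it is a regex approximation, not a real HTML parser.

```shell
#!/bin/sh
# Sample value standing in for $STUFF from thread 1 (shortened for illustration)
STUFF='<html> <head> <title>my page</title></head> <body> blah blah'

# "<[^>]*>" matches "<", then any run of characters that are not ">",
# then the closing ">", so each tag is matched and removed separately.
FIXHTML=$(printf '%s\n' "$STUFF" | sed 's/<[^>]*>//g')

printf '%s\n' "$FIXHTML"
```

The same expression works on a file, as in thread 3: sed 's/<[^>]*>//g' test > test.t. Tags that span several lines are not handled, because sed processes one line at a time.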
HTML::Filter(3) User Contributed Perl Documentation HTML::Filter(3)
NAME
HTML::Filter - Filter HTML text through the parser
NOTE
This module is deprecated. "HTML::Parser" now provides the functionality of "HTML::Filter" much more efficiently with the "default"
handler.
SYNOPSIS
    require HTML::Filter;
    $p = HTML::Filter->new->parse_file("index.html");
DESCRIPTION
"HTML::Filter" is an HTML parser that by default prints the original text of each HTML element (basically a slow version of cat(1)). The
callback methods may be overridden to modify the filtering for some HTML elements, and you can override the output() method, which is called to
print the HTML text.
"HTML::Filter" is a subclass of "HTML::Parser". This means that the document should be given to the parser by calling the $p->parse() or
$p->parse_file() methods.
EXAMPLES
The first example is a filter that will remove all comments from an HTML file. This is achieved by simply overriding the comment method to
do nothing.
    package CommentStripper;
    require HTML::Filter;
    @ISA=qw(HTML::Filter);
    sub comment { }   # ignore comments
The second example shows a filter that will remove any <TABLE>s found in the HTML file. We specialize the start() and end() methods to
count table tags and then make output not happen when inside a table.
    package TableStripper;
    require HTML::Filter;
    @ISA=qw(HTML::Filter);
    sub start
    {
        my $self = shift;
        $self->{table_seen}++ if $_[0] eq "table";
        $self->SUPER::start(@_);
    }
    sub end
    {
        my $self = shift;
        $self->SUPER::end(@_);
        $self->{table_seen}-- if $_[0] eq "table";
    }
    sub output
    {
        my $self = shift;
        unless ($self->{table_seen}) {
            $self->SUPER::output(@_);
        }
    }
If you want to collect the parsed text internally you might want to do something like this:
    package FilterIntoString;
    require HTML::Filter;
    @ISA=qw(HTML::Filter);
    sub output { push(@{$_[0]->{fhtml}}, $_[1]) }
    sub filtered_html { join("", @{$_[0]->{fhtml}}) }
SEE ALSO
HTML::Parser
COPYRIGHT
Copyright 1997-1999 Gisle Aas.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
perl v5.16.2 2008-04-04 HTML::Filter(3)