09-15-2016
We'll need to see the HTML, not just the bit you want.
This User Gave Thanks to Corona688 For This Post:
10 More Discussions You Might Find Interesting
1. UNIX for Dummies Questions & Answers
I have an HTML file called myfile. If I simply run "cat myfile.html" in UNIX, it shows all the HTML tags, like <a href=r/26><img src="http://www>. But I want to extract only the text part.
The same problem happens with the "type" command in MS-DOS.
I know you can do it by opening it in Internet Explorer,... (4 Replies)
Discussion started by: los111
4 Replies
2. Shell Programming and Scripting
Hi,
I have a text file, say file1, with data like:
ABC c:/hm/new1 Dir
DEF d:/ner/d sd
......
So I want to make a table from this text file. Is it possible to do this using Perl?
Thanks in advance
Sarbjit (1 Reply)
Discussion started by: sarbjit
1 Reply
3. Shell Programming and Scripting
I am attempting to extract weather data from the following website, but for the Victoria area only:
Text Forecasts - Environment Canada
I use this:
sed -n "/Greater Victoria./,/Fraser Valley./p"
But that phrasing sometimes does not get it all, and I think perhaps the website has more... (2 Replies)
Discussion started by: lagagnon
2 Replies
4. Shell Programming and Scripting
Hello everyone, I'm new to this forum and new to shell scripting.
My problem: I have HTML files in a directory, and I would like to extract from them some data that lies between two different lines.
Here's my situation:
<td align="default"> oxidizability (mg / l):
data_to_extract... (6 Replies)
Discussion started by: sbobotex
6 Replies
5. Shell Programming and Scripting
Hi, I'm trying to get some data from an html file, but the problem is before it can extract the information I have multiple patterns that need to be passed through.
https://www.unix.com/shell-programming-scripting/150711-extract-data-awk-html-files.html
Is a similar problem. The only... (5 Replies)
Discussion started by: counfhou
5 Replies
6. Shell Programming and Scripting
Hi Folks,
Could you please share your ideas on extracting text from image files (JPG, PNG and GIF formats)?
Regards,
J (1 Reply)
Discussion started by: scriptscript
1 Reply
7. Shell Programming and Scripting
awk/sed newbie here. I have an HTML file, and from that file I would like to retrieve a word of text.
<font face=arial size=-1><li><a href=/value_for_clients/Tokyo/abc_process.txt>abc</a> NDK Version: 4.0 </li>
<font face=arial size=-1><li><a... (6 Replies)
Discussion started by: sk2code
6 Replies
8. Shell Programming and Scripting
Hello All,
I am using awk with html options to format and send output to another file.
Below command works fine, no issues.
awk 'BEGIN{print "<table border="1" width="1000" >"} {print "<tr>";for(i=1;i<=NF;i++)print "<td>" $i"</td>";print "</tr>"} END
{print "</table>"}' ${TMPLOGFILE1} >>... (0 Replies)
Discussion started by: jvmani_1
0 Replies
9. Shell Programming and Scripting
I'm extracting text between table tags in HTML
<th><a href="/wiki/Buick_LeSabre" title="Buick LeSabre">Buick LeSabre</a></th>
using this:
awk -F "</*th>" '/<\/*th>/ {print $2}' auto2 > auto3
then this (text between a href):
sed -e 's/\(<*>\)//g' auto3 > auto4
How to shorten this into one... (8 Replies)
Discussion started by: p1ne
8 Replies
10. UNIX for Beginners Questions & Answers
Using awk to extract the value after a keyword in an HTML file and store it in ts. The awk command does execute, but ts is empty. I use the tag as a delimiter and the keyword as a pattern, but there is probably a better way. Thank you :).
file
<html><head><title>xxxxxx xxxxx</title><style type="text/css">
... (4 Replies)
Discussion started by: cmccabe
4 Replies
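For the table-header question in item 9 above, the awk step and the sed step can probably be collapsed into a single sed call. This is only a minimal sketch, assuming each <th> element sits on one line as in the sample shown there; the function name th_text is made up for illustration and does not come from the thread.

```shell
#!/bin/sh
# th_text is a hypothetical helper name, not part of the original thread.
# Assumption: every <th>...</th> element sits on a single line.
th_text() {
    # -n suppresses sed's default output. On lines containing <th>,
    # delete every <...> tag; the p flag prints the line only when
    # at least one substitution was made.
    sed -n '/<th>/s/<[^>]*>//gp' "$@"
}

# Example: feed the sample line from the question on stdin.
printf '%s\n' '<th><a href="/wiki/Buick_LeSabre" title="Buick LeSabre">Buick LeSabre</a></th>' | th_text
```

Called with a file name (for example `th_text auto2 > auto4`) it reads that file instead of stdin. A line-based approach like this breaks as soon as a tag spans several lines, which is why a real HTML parser is usually the safer choice.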
HTML::Filter(3) User Contributed Perl Documentation HTML::Filter(3)
NAME
HTML::Filter - Filter HTML text through the parser
NOTE
This module is deprecated. The "HTML::Parser" now provides the functionality of "HTML::Filter" much more efficiently with the "default"
handler.
SYNOPSIS
    require HTML::Filter;
    $p = HTML::Filter->new->parse_file("index.html");
DESCRIPTION
"HTML::Filter" is an HTML parser that by default prints the original text of each HTML element (basically a slow version of cat(1)). The
callback methods may be overridden to modify the filtering for some HTML elements, and you can override the output() method, which is called to
print the HTML text.
"HTML::Filter" is a subclass of "HTML::Parser". This means that the document should be given to the parser by calling the $p->parse() or
$p->parse_file() methods.
EXAMPLES
The first example is a filter that will remove all comments from an HTML file. This is achieved by simply overriding the comment method to
do nothing.
    package CommentStripper;
    require HTML::Filter;
    @ISA=qw(HTML::Filter);
    sub comment { }  # ignore comments
The second example shows a filter that will remove any <TABLE>s found in the HTML file. We specialize the start() and end() methods to
count table tags and then make output not happen when inside a table.
    package TableStripper;
    require HTML::Filter;
    @ISA=qw(HTML::Filter);
    sub start
    {
        my $self = shift;
        $self->{table_seen}++ if $_[0] eq "table";
        $self->SUPER::start(@_);
    }

    sub end
    {
        my $self = shift;
        $self->SUPER::end(@_);
        $self->{table_seen}-- if $_[0] eq "table";
    }

    sub output
    {
        my $self = shift;
        unless ($self->{table_seen}) {
            $self->SUPER::output(@_);
        }
    }
If you want to collect the parsed text internally you might want to do something like this:
    package FilterIntoString;
    require HTML::Filter;
    @ISA=qw(HTML::Filter);
    sub output { push(@{$_[0]->{fhtml}}, $_[1]) }
    sub filtered_html { join("", @{$_[0]->{fhtml}}) }
SEE ALSO
HTML::Parser
COPYRIGHT
Copyright 1997-1999 Gisle Aas.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
perl v5.18.2 2013-03-25 HTML::Filter(3)