10-18-2007
Hi.
Instead of using curl or wget, you might consider using lynx, which removes the tags:

Quote:
-dump   dumps the formatted output of the default document or one
        specified on the command line to standard output. This can be
        used in the following way:

            lynx -dump <url>

-- excerpt from man lynx

or the html2text utility at mbayer.de.
Best wishes ... cheers, drl
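If neither lynx nor html2text is installed, a crude regex-based strip can serve as a last resort. A minimal sketch (far less robust than either tool -- it mishandles multi-line tags, comments, scripts, and entities):

```shell
# Delete everything between "<" and ">" on each line.
# Rough fallback only; lynx -dump or html2text handle real HTML far better.
printf '<p>Hello <b>world</b></p>\n' | sed 's/<[^>]*>//g'
# prints: Hello world
```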
10 More Discussions You Might Find Interesting
1. UNIX Desktop Questions & Answers
Hello,
I want to convert an image to .xbm. The problem is that when I convert it, the result is only a 2-color (black & white) image. Does anyone know a tool or another solution to get the complete image into .xbm with colors, sizes, etc.? :confused:
Thanks in advance
EJ =) (2 Replies)
Discussion started by: EJ =)
2. UNIX for Dummies Questions & Answers
I have an HTML file called myfile.html. If I simply run "cat myfile.html" in UNIX, it shows all the HTML tags, like <a href=r/26><img src="http://www>. But I want to extract only the text part.
The same problem happens with the "type" command in MS-DOS.
I know you can do it by opening it in Internet Explorer,... (4 Replies)
Discussion started by: los111
3. UNIX for Dummies Questions & Answers
Hi,
I need to use UNIX to extract data from several rows of a table coded in HTML. I know that rows within a table have the tags <tr> </tr>, and so I thought that my first step should be to delete all of the other HTML code which is not contained within these tags. I could then use this method... (8 Replies)
Discussion started by: Streetrcr
4. Shell Programming and Scripting
I'm working with the output of an HTML form and trying to get it into CSV. The HTML is a table with many entries like the following.
<tr><td nowrap><b><font size=3>NAME</font></b></td><td nowrap><b>License # : </b> LICENSE</td></tr>
<tr><td><b>City : </b> CITY<td nowrap><b>Type :... (1 Reply)
Discussion started by: phip
5. Shell Programming and Scripting
I have a file I've already partially pruned with grep that has data like:
<a href="MasterDetailResults.asp?textfield=a&Application=3D Home Architect 4">3D Home Architect 4</a> </td>
Approved </td>
--
<a href="MasterDetailResults.asp?textfield=a&Application=3d Home... (6 Replies)
Discussion started by: macxcool
6. Shell Programming and Scripting
I am attempting to extract weather data from the following website, but for the Victoria area only:
Text Forecasts - Environment Canada
I use this:
sed -n "/Greater Victoria./,/Fraser Valley./p"
But that phrasing sometimes does not get it all, and I think perhaps the website has more... (2 Replies)
Discussion started by: lagagnon
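The sed range-address technique quoted in that thread prints every line from the first match of one pattern through the next match of another, inclusive. A minimal sketch with made-up marker lines and input:

```shell
# sed -n '/start/,/end/p' prints lines from the first /start/ match
# through the following /end/ match, inclusive.
printf 'header\nGreater Victoria\nforecast body\nFraser Valley\nfooter\n' |
  sed -n '/Greater Victoria/,/Fraser Valley/p'
```

Note that if the end pattern never matches, sed keeps printing to end of input, which may explain why such extractions sometimes grab too much or too little.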
7. Shell Programming and Scripting
Hi,
I need to create a bash shell script that inserts a text data file into an HTML table; this table output then has to be mailed. I am new to shell scripting and have only a very basic idea of it.
Please help. (9 Replies)
Discussion started by: intern123
8. Shell Programming and Scripting
Hi All,
There is a link from where I usually search for something and fetch the data.
Is there any way to automate this through a script if I put the search criteria in a notepad file?
I mean, the script should search for the content from the notepad file, and the results should be placed into another file.
... (2 Replies)
Discussion started by: indradev
9. Shell Programming and Scripting
Hi there, I'm quite new to the forum and shell scripting.
I want to filter out the "166.0 points". The results that I found via Google and the forum search didn't help me :(
<a href="/user/test" class="headitem menu" style="color:rgb(83,186,224);">test</a><a href="/points" class="headitem... (1 Reply)
Discussion started by: Mysthik
10. Shell Programming and Scripting
Hi everyone, I found this forum through a Google search and I'm hoping someone can help me. I am clueless about coding, so bear with me.
I need to write a script/program to convert the snowfall data to a .CSV file. But I guess it doesn't end there.
I'm looking to grab snowfall totals and... (7 Replies)
Discussion started by: Cambium27
LEARN ABOUT DEBIAN
urlwatch
URLWATCH(1) User Commands URLWATCH(1)
NAME
urlwatch - Watch web pages and arbitrary URLs for changes
SYNOPSIS
urlwatch [options]
DESCRIPTION
urlwatch watches a list of URLs for changes and prints out unified diffs of the changes. You can filter always-changing parts of websites
by providing a "hooks.py" script.
OPTIONS
--version
show program's version number and exit
-h, --help
show the help message and exit
-v, --verbose
Show debug/log output
--urls=FILE
Read URLs from the specified file
--hooks=FILE
Use specified file as hooks.py module
-e, --display-errors
Include HTTP errors (404, etc.) in the output
ADVANCED FEATURES
urlwatch includes some advanced features that you have to activate by creating a hooks.py file that specifies for which URLs to use a specific feature. You can also use the hooks.py file to filter trivially-varying elements of a web page.
ICALENDAR FILE PARSING
This module allows you to parse .ics files that are in iCalendar format and provide a very simplified text-based format for the diffs. Use
it like this in your hooks.py file:
    from urlwatch import ical2txt

    def filter(url, data):
        if url.endswith('.ics'):
            return ical2txt.ical2text(data).encode('utf-8') + data
        # ...you can add more hooks here...
HTML TO TEXT CONVERSION
There are three methods of converting HTML to text in the current version of urlwatch: "lynx" (default), "html2text" and "re". The former
two use command-line utilities of the same name to convert HTML to text, and the last one uses a simple regex-based tag stripping method
(needs no extra tools). Here is an example of using it in your hooks.py file:
    from urlwatch import html2txt

    def filter(url, data):
        if url.endswith('.html') or url.endswith('.htm'):
            return html2txt.html2text(data, method='lynx')
        # ...you can add more hooks here...
FILES
~/.urlwatch/urls.txt
A list of HTTP/FTP URLs to watch (one URL per line)
~/.urlwatch/lib/hooks.py
A Python module that can be used to filter contents
~/.urlwatch/cache/
The state of web pages is saved in this folder
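For illustration, a hypothetical ~/.urlwatch/urls.txt (the URLs here are made up; the real file simply lists one HTTP/FTP URL per line, as described above):

    http://www.example.com/news.html
    http://www.example.com/calendar.ics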
AUTHOR
Thomas Perl <thp@thpinfo.com>
WEBSITE
http://thpinfo.com/2008/urlwatch/
urlwatch 1.11 July 2010 URLWATCH(1)