debian man page for urlwatch

URLWATCH(1)							   User Commands						       URLWATCH(1)

NAME
urlwatch - Watch web pages and arbitrary URLs for changes
SYNOPSIS
urlwatch [options]
DESCRIPTION
urlwatch watches a list of URLs for changes and prints out unified diffs of the changes. You can filter always-changing parts of websites by providing a "hooks.py" script.
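The hooks.py below is a minimal sketch of such a filter. It removes two kinds of always-changing fragments before urlwatch computes its diff. The URL prefix http://example.com/ and both regular expressions are hypothetical placeholders. The code also assumes that data arrives as a text string; urlwatch 1.x targeted Python 2, where it was a byte string.

```python
import re

# Hypothetical patterns for fragments that change on every page load;
# replace them with whatever varies on the pages you actually watch.
VOLATILE_PATTERNS = [
    re.compile(r'Last updated: [^\n]*'),
    re.compile(r'<!-- generated in [0-9.]+ ms -->'),
]


def filter(url, data):
    """Remove always-changing fragments so they do not appear in the diff.

    Assumes `data` is a text string; the URL prefix is a placeholder.
    """
    if url.startswith('http://example.com/'):
        for pattern in VOLATILE_PATTERNS:
            data = pattern.sub('', data)
    # Pass every other page through unchanged.
    return data
```

With this hook, filter('http://example.com/news', 'Headline\nLast updated: 12:03\n') returns 'Headline\n\n'. The timestamp line no longer causes a reported change.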
OPTIONS
--version
       Show the program's version number and exit.
-h, --help
       Show the help message and exit.
-v, --verbose
       Show debug/log output.
--urls=FILE
       Read URLs from the specified file.
--hooks=FILE
       Use the specified file as the hooks.py module.
-e, --display-errors
       Include HTTP errors (404, etc.) in the output.
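To show how the long options take their FILE argument (--urls=FILE rather than a separate word), the sketch below assembles a urlwatch command line from the options listed above. The helper build_urlwatch_command and the path /tmp/my-urls.txt are hypothetical and are not part of urlwatch.

```python
import shlex


def build_urlwatch_command(urls_file=None, hooks_file=None,
                           verbose=False, display_errors=False):
    """Assemble a urlwatch argument list using only the documented options."""
    argv = ['urlwatch']
    if verbose:
        argv.append('--verbose')
    if display_errors:
        argv.append('--display-errors')
    if urls_file is not None:
        argv.append('--urls=' + urls_file)    # FILE is joined with '='
    if hooks_file is not None:
        argv.append('--hooks=' + hooks_file)
    return argv


cmd = build_urlwatch_command(urls_file='/tmp/my-urls.txt', display_errors=True)
print(shlex.join(cmd))
# prints: urlwatch --display-errors --urls=/tmp/my-urls.txt
```

You could then pass the resulting list to subprocess.run(cmd), for example from a cron wrapper. urlwatch must be installed and on the PATH for that call to work.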
ADVANCED FEATURES
urlwatch includes some advanced features that you activate by creating a hooks.py file that specifies which URLs should use a specific feature. You can also use the hooks.py file to filter trivially-varying elements of a web page.

ICALENDAR FILE PARSING
This module parses .ics files in iCalendar format and presents their contents in a much simpler text-based format for the diffs. Use it like this in your hooks.py file:

    from urlwatch import ical2txt

    def filter(url, data):
        if url.endswith('.ics'):
            return ical2txt.ical2text(data).encode('utf-8') + data
        # ...you can add more hooks here...
        return data  # leave all other pages unchanged

HTML TO TEXT CONVERSION
The current version of urlwatch provides three methods of converting HTML to text: "lynx" (the default), "html2text" and "re". The first two call the command-line utilities of the same name. The last one strips tags with a simple regex-based method that needs no extra tools. Here is an example of using it in your hooks.py file:

    from urlwatch import html2txt

    def filter(url, data):
        if url.endswith('.html') or url.endswith('.htm'):
            return html2txt.html2text(data, method='lynx')
        # ...you can add more hooks here...
        return data  # leave all other pages unchanged
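The sketch below illustrates the general idea behind regex-based tag stripping of the kind the "re" method describes. It is a hypothetical stand-in and not urlwatch's own html2txt code. The function html_to_text_re and its patterns are assumptions of this example. Regular expressions cannot parse HTML reliably; for example, a '>' inside an attribute value breaks the tag pattern. The output is therefore only an approximation, which is usually good enough for spotting changes.

```python
import html
import re

# Hypothetical patterns; this is not urlwatch's implementation.
_SCRIPT_STYLE = re.compile(r'<(script|style)\b.*?</\1\s*>', re.IGNORECASE | re.DOTALL)
_BLOCK_BREAK = re.compile(r'<\s*(br|/p|/div|/li|/h[1-6]|/tr)\b[^>]*>', re.IGNORECASE)
_TAG = re.compile(r'<[^>]+>')


def html_to_text_re(data):
    """Approximate HTML-to-text conversion using regular expressions only."""
    text = _SCRIPT_STYLE.sub('', data)   # drop script/style bodies entirely
    text = _BLOCK_BREAK.sub('\n', text)  # keep block boundaries as line breaks
    text = _TAG.sub('', text)            # remove all remaining tags
    text = html.unescape(text)           # decode entities such as &amp;
    lines = (line.strip() for line in text.splitlines())
    return '\n'.join(line for line in lines if line)


def filter(url, data):
    if url.endswith('.html') or url.endswith('.htm'):
        return html_to_text_re(data)
    return data  # leave all other pages unchanged
```

For example, html_to_text_re('<h1>News</h1><p>Tom &amp; Jerry</p><br>Done') returns 'News\nTom & Jerry\nDone'. The diff then shows one line per block of text instead of raw markup.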
FILES
~/.urlwatch/urls.txt
       A list of HTTP/FTP URLs to watch (one URL per line).
~/.urlwatch/lib/hooks.py
       A Python module that can be used to filter contents.
~/.urlwatch/cache/
       The state of web pages is saved in this folder.
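The sketch below shows one way the three files can work together: read a one-URL-per-line list, keep the last seen state of each page in a cache directory, and report changes as a unified diff. Several details are assumptions of this example and are not confirmed by this page: skipping blank lines and lines starting with '#', naming cache files after the SHA-1 of the URL, and reporting nothing on the first visit. The function names are hypothetical.

```python
import difflib
import hashlib
import os


def load_urls(path):
    """Read a urls.txt-style file with one URL per line.

    Skipping blank lines and '#' comments is an assumption of this sketch.
    """
    with open(path, encoding='utf-8') as f:
        return [line.strip() for line in f
                if line.strip() and not line.lstrip().startswith('#')]


def cache_path(cache_dir, url):
    """Hypothetical naming scheme: one cache file per URL, named by its SHA-1."""
    return os.path.join(cache_dir, hashlib.sha1(url.encode('utf-8')).hexdigest())


def diff_and_store(cache_dir, url, new_text):
    """Return a unified diff against the cached copy and save new_text.

    Returns None the first time a URL is seen (nothing to compare against)
    and an empty string when the page is unchanged.
    """
    path = cache_path(cache_dir, url)
    old_text = None
    if os.path.exists(path):
        with open(path, encoding='utf-8') as f:
            old_text = f.read()
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, 'w', encoding='utf-8') as f:
        f.write(new_text)
    if old_text is None:
        return None
    return ''.join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=url + ' (old)', tofile=url + ' (new)'))
```

Storing one file per URL keeps each page's state independent, so a failed fetch for one URL does not affect the others.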
AUTHOR
Thomas Perl <thp@thpinfo.com>
WEBSITE
http://thpinfo.com/2008/urlwatch/

urlwatch 1.11                          July 2010                          URLWATCH(1)