11-18-2012
How to log in depends on the website. You probably can't feed the list of files to wget directly, since the site likely needs a POST request (such as a login form submission) first, but the file can probably at least be used to generate arguments for wget in a script.
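Something along these lines might work as a starting point. It is only a sketch: the login URL, the form field names and the cookie file name are made up, and the real site may need different POST data. It only builds the wget command lines, using wget's cookie and POST options, so you can inspect them before running anything.

```python
import shlex


def login_command(login_url, post_data, cookie_file="cookies.txt"):
    """Build a wget call that submits the login form and saves the session cookies."""
    return [
        "wget",
        "--save-cookies", cookie_file,
        "--keep-session-cookies",
        "--post-data", post_data,
        "-O", "/dev/null",          # discard the login response page
        login_url,
    ]


def download_commands(url_lines, cookie_file="cookies.txt"):
    """Build one wget call per URL line, reusing the saved cookies."""
    commands = []
    for line in url_lines:
        url = line.strip()
        if not url:
            continue                # skip blank lines in the list
        commands.append(["wget", "--load-cookies", cookie_file, url])
    return commands


if __name__ == "__main__":
    # Hypothetical site and form fields, for illustration only.
    print(shlex.join(login_command("https://example.com/login",
                                   "user=me&password=secret")))
    for cmd in download_commands(["https://example.com/files/a.pdf\n",
                                  "https://example.com/files/b.pdf\n"]):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the quoting; once they look right, the same lists can be passed to `subprocess.run`.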
LEARN ABOUT DEBIAN
urlwatch
URLWATCH(1) User Commands URLWATCH(1)
NAME
urlwatch - Watch web pages and arbitrary URLs for changes
SYNOPSIS
urlwatch [options]
DESCRIPTION
urlwatch watches a list of URLs for changes and prints out unified diffs of the changes. You can filter always-changing parts of websites
by providing a "hooks.py" script.
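
The idea of reporting changes as unified diffs can be illustrated with Python's standard difflib module. This is a minimal sketch, not urlwatch's own code; the function name and the "(old)"/"(new)" labels are assumptions made for the example.

```python
import difflib


def page_changes(old_text, new_text, url):
    """Return a unified diff between two snapshots of a page, or '' if unchanged."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=url + " (old)",
        tofile=url + " (new)",
        lineterm="",                # lines carry no newline, so add none
    )
    return "\n".join(diff)


if __name__ == "__main__":
    print(page_changes("title\nversion 1.0", "title\nversion 1.1",
                       "http://example.com/"))
```

An empty return value means the two snapshots are identical, so nothing needs to be reported for that URL.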
OPTIONS
--version
show program's version number and exit
-h, --help
show the help message and exit
-v, --verbose
Show debug/log output
--urls=FILE
Read URLs from the specified file
--hooks=FILE
Use specified file as hooks.py module
-e, --display-errors
Include HTTP errors (404, etc.) in the output
ADVANCED FEATURES
urlwatch includes some advanced features that you have to activate by creating a hooks.py file that specifies for which URLs to use a
specific feature. You can also use the hooks.py file to filter trivially-varying elements of a web page.
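
A filter for a trivially-varying element might look like the sketch below. It follows the filter(url, data) hook signature used in the examples in this page; the host name and the clock-time pattern are hypothetical, the data is treated as text, and returning the data unchanged for other URLs is an assumption about what the caller expects.

```python
import re

# Hypothetical varying element: a clock time such as 12:34:56 embedded in the page.
CLOCK_TIME = re.compile(r"\d{2}:\d{2}:\d{2}")


def filter(url, data):
    """Replace clock times on one site so they do not show up as changes."""
    if "example.com" in url:
        return CLOCK_TIME.sub("HH:MM:SS", data)
    # Assumption: every other URL should pass through unchanged.
    return data
```

Replacing the varying text with a fixed placeholder, rather than deleting it, keeps the surrounding line readable in the resulting diff.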
ICALENDAR FILE PARSING
This module allows you to parse .ics files that are in iCalendar format and provide a very simplified text-based format for the diffs. Use
it like this in your hooks.py file:
from urlwatch import ical2txt

def filter(url, data):
    if url.endswith('.ics'):
        return ical2txt.ical2text(data).encode('utf-8') + data
    # ...you can add more hooks here...
HTML TO TEXT CONVERSION
There are three methods of converting HTML to text in the current version of urlwatch: "lynx" (default), "html2text" and "re". The first
two use command-line utilities of the same name to convert HTML to text, and the last one uses a simple regex-based tag stripping method
(needs no extra tools). Here is an example of using the "lynx" method in your hooks.py file:
from urlwatch import html2txt

def filter(url, data):
    if url.endswith('.html') or url.endswith('.htm'):
        return html2txt.html2text(data, method='lynx')
    # ...you can add more hooks here...
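
To give a rough idea of what regex-based tag stripping means, here is a minimal sketch using only the Python standard library. It is not the implementation behind the "re" method; the function name is hypothetical, and a pattern this simple will mishandle comments, scripts and a ">" inside attribute values.

```python
import html
import re

# Anything between "<" and the next ">" is treated as a tag.
TAG = re.compile(r"<[^>]*>")


def strip_tags(markup):
    """Remove tags with a regular expression, then decode HTML entities."""
    text = TAG.sub("", markup)
    return html.unescape(text)


if __name__ == "__main__":
    print(strip_tags("<p>Hello &amp; <b>world</b></p>"))
```

The appeal of this approach is that it needs no external tools, at the cost of less faithful output than a real HTML renderer would give.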
FILES
~/.urlwatch/urls.txt
A list of HTTP/FTP URLs to watch (one URL per line)
~/.urlwatch/lib/hooks.py
A Python module that can be used to filter contents
~/.urlwatch/cache/
The state of web pages is saved in this folder
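
The file layout above can be pictured with the following sketch. It reads a list with one URL per line and maps each URL to a file in a cache folder. Both function names are hypothetical, and two details are assumptions made for illustration rather than documented behavior: lines starting with "#" are skipped, and the cache file is named after the SHA-1 hex digest of the URL.

```python
import hashlib
import os


def read_urls(lines):
    """Yield the URLs from urls.txt-style lines (one URL per line)."""
    for line in lines:
        url = line.strip()
        # Assumption: blank lines and "#" comment lines are ignored.
        if url and not url.startswith("#"):
            yield url


def cache_path(url, cache_dir="~/.urlwatch/cache"):
    """Map a URL to the file holding its last saved state (assumed naming scheme)."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return os.path.join(os.path.expanduser(cache_dir), digest)


if __name__ == "__main__":
    sample = ["http://example.com/\n", "# a comment\n", "ftp://example.org/file\n"]
    for url in read_urls(sample):
        print(url, "->", cache_path(url))
```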
AUTHOR
Thomas Perl <thp@thpinfo.com>
WEBSITE
http://thpinfo.com/2008/urlwatch/
urlwatch 1.11 July 2010 URLWATCH(1)