Sponsored Content
Full Discussion: Wget -i URLs.txt problem
Top Forums UNIX for Dummies Questions & Answers Wget -i URLs.txt problem Post 302732625 by Corona688 on Sunday 18th of November 2012 07:42:46 PM
Old 11-18-2012
How to login depends on the website. You probably can't plug the list of files in directly, given there's likely some POST things needed before, but the file can probably at least be used to generate arguments for wget in a script.
 

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Sorting problem "sort -k 16,29 sample.txt > output.txt"

Hi all, Iam trying to sort the contents of the file based on the position of the file. Example: $cat sample.txt 0101020060731 ## Header record 1c1 Berger Awc ANP20070201301 4000.50 1c2 Bose W G ANP20070201609 6000.70 1c2 Andy CK ANP20070201230 28000.00... (3 Replies)
Discussion started by: ganapati
3 Replies

2. UNIX for Advanced & Expert Users

Wget FTP problem!

Hi, I've tried to download from ftp sites by wget but it failed and says "Service unavailable" but when I use sftp in binary mode and use "get" command it works perfectly. What's the problem? BTW: I tried both passive and active mode in wget. thnx for ur help (9 Replies)
Discussion started by: mjdousti
9 Replies

3. Shell Programming and Scripting

Problem with wget

Hi, I want to download some patches from SUN by using a script and I am using "wget" as the utillity for this. The website for downloading has a "https:" in its name as below https://sunsolve.sun.com/private-cgi/pdownload.pl?target=${line}&method=h and on running wget as below wget... (1 Reply)
Discussion started by: max29583
1 Replies

4. Shell Programming and Scripting

Extract urls from index.html downloaded using wget

Hi, I need to basically get a list of all the tarballs located at uri I am currently doing a wget on urito get the index.html page Now this index page contains the list of uris that I want to use in my bash script. can someone please guide me ,. I am new to Linux and shell scripting. ... (5 Replies)
Discussion started by: mnanavati
5 Replies

5. UNIX for Dummies Questions & Answers

Problem with wget no check certificate.

Hi, I'm trying to install some libraries, when running the makefile I get an error from the "wget --no check certificate option". I had a look help and the option wasn't listed. Anyone know what I'm missing. (0 Replies)
Discussion started by: davcra
0 Replies

6. UNIX for Dummies Questions & Answers

find lines in file1.txt not found in file2.txt memory problem

I have a diff command that does what I want but when comparing large text/log files, it uses up all the memory I have (sometimes over 8gig of memory) diff file1.txt file2.txt | grep '^<'| awk '{$1="";print $0}' | sed 's/^ *//' Is there a better more efficient way to find the lines in one file... (5 Replies)
Discussion started by: raptor25
5 Replies

7. Shell Programming and Scripting

Problem with wget and cookie

Dear people, I got a problem with an scrip using wget to download pdf-files from an website which uses session-cookies. Background: for university its quite nasty to look up weekly which new homeworks, papers etc. are available on the different sites of the universites chairs. So I wanted a... (1 Reply)
Discussion started by: jackomo
1 Replies

8. Shell Programming and Scripting

Download pdf's using wget convert to txt

wget -i genedx.txt The code above will download multiple pdf files from a site, but how can i download and convert these to .txt? I have attached the master list (genedx.txt - which contains the url and file names) as well as the two PDF's that are downloaded. I am trying to have those... (7 Replies)
Discussion started by: cmccabe
7 Replies

9. Proxy Server

Problem with wget

I cannot download anything using wget in centos 6.5 and 7. But I can update yum etc. # wget https://wordpress.org/latest.tar.gz --2014-10-23 13:50:23-- https://wordpress.org/latest.tar.gz Resolving wordpress.org... 66.155.40.249, 66.155.40.250 Connecting to wordpress.org|66.155.40.249|:443...... (3 Replies)
Discussion started by: nirosha
3 Replies
URLWATCH(1)							   User Commands						       URLWATCH(1)

NAME
urlwatch - Watch web pages and arbitrary URLs for changes SYNOPSIS
urlwatch [options] DESCRIPTION
urlwatch watches a list of URLs for changes and prints out unified diffs of the changes. You can filter always-changing parts of websites by providing a "hooks.py" script. OPTIONS
--version show program's version number and exit -h, --help show the help message and exit -v, --verbose Show debug/log output --urls=FILE Read URLs from the specified file --hooks=FILE Use specified file as hooks.py module -e, --display-errors Include HTTP errors (404, etc..) in the output ADVANCED FEATURES
urlwatch includes some advanced features that you have to activate by creating a hooks.py file that specifies for which URLs to use a spe- cific feature. You can also use the hooks.py file to filter trivially-varying elements of a web page. ICALENDAR FILE PARSING This module allows you to parse .ics files that are in iCalendar format and provide a very simplified text-based format for the diffs. Use it like this in your hooks.py file: from urlwatch import ical2txt def filter(url, data): if url.endswith('.ics'): return ical2txt.ical2text(data).encode('utf-8') + data # ...you can add more hooks here... HTML TO TEXT CONVERSION There are three methods of converting HTML to text in the current version of urlwatch: "lynx" (default), "html2text" and "re". The former two use command-line utilities of the same name to convert HTML to text, and the last one uses a simple regex-based tag stripping method (needs no extra tools). Here is an example of using it in your hooks.py file: from urlwatch import html2txt def filter(url, data): if url.endswith('.html') or url.endswith('.htm'): return html2txt.html2text(data, method='lynx') # ...you can add more hooks here... FILES
~/.urlwatch/urls.txt A list of HTTP/FTP URLs to watch (one URL per line) ~/.urlwatch/lib/hooks.py A Python module that can be used to filter contents ~/.urlwatch/cache/ The state of web pages is saved in this folder AUTHOR
Thomas Perl <thp@thpinfo.com> WEBSITE
http://thpinfo.com/2008/urlwatch/ urlwatch 1.11 July 2010 URLWATCH(1)
All times are GMT -4. The time now is 04:31 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy