Full Discussion: Wget -i URLs.txt problem
Top Forums UNIX for Dummies Questions & Answers Wget -i URLs.txt problem Post 302732629 by Keith londrie on Sunday 18th of November 2012 08:58:31 PM
RE: wget -i URLs.txt

Hi Corona688,

Thanks for your post. The membership site I belong to is resell-rights-weekly.com; normally I just log in and click the links to download to my home computer. I want to bypass my home computer and copy each week's downloads straight to my server, since a server-to-server transfer is much faster than pulling the files down over DSL and uploading them again. The input file is necessary because new downloads are posted on the site each week: I will put the URLs in URLs.txt before running wget, set the job up as a cron that runs every Monday, and bring the files over in a fraction of the time. I had it working partially but could not remember the switches I had set.

Here is my next try:

wget -i URLs.txt --post-data 'user=klondrie&password=XXXX' -o wgetlogfile.txt -c

What do you think? What would you change? This should be a piece of cake; there does not seem to be much security, since I can just log in and click the links to download to my computer. I need the files on my server, though.
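Here is the sketch I am thinking of for the cron job. One worry: passing --post-data together with -i makes wget POST the credentials to every URL in the list, which most sites will reject, so this logs in once, saves the session cookie, and then fetches the list with the cookie. The login URL and form field names below are guesses on my part and would need checking against the real login form on resell-rights-weekly.com.

```shell
#!/bin/sh
# Weekly fetch sketch. Assumptions to verify: the site sets a session
# cookie on login, the login form lives at /login, and the fields are
# named 'user' and 'password' as in my --post-data attempt.

SITE='https://resell-rights-weekly.com'
URLS="$HOME/URLs.txt"
LOG="$HOME/wgetlogfile.txt"
JAR="$HOME/cookies.txt"

# Log in once and keep the session cookie, instead of POSTing the
# credentials to every download URL in URLs.txt.
login() {
    wget --save-cookies "$JAR" --keep-session-cookies \
         --post-data 'user=klondrie&password=XXXX' \
         -O /dev/null "$SITE/login"
}

# Fetch the week's list with the saved cookie; -c resumes partial
# files and -o keeps the log out of cron mail.
CMD="wget --load-cookies $JAR -i $URLS -c -o $LOG"

if [ "${RUN:-0}" = 1 ]; then
    login && $CMD
else
    # Default to a dry run so the command can be inspected first.
    echo "$CMD"
fi
```

With RUN=1 it does the real transfer; a crontab entry along the lines of `0 6 * * 1 RUN=1 /home/klondrie/weekly.sh` would then run it every Monday morning.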

Any more help available?
 

wwwoffle(1)						      General Commands Manual						       wwwoffle(1)

NAME
       wwwoffle - A program to control the World Wide Web Offline Explorer.

SYNOPSIS
       wwwoffle -h | --help
       wwwoffle --version
       wwwoffle -online [-p <host>[:<port>] | -c <config-file>]
       wwwoffle -autodial [-p <host>[:<port>] | -c <config-file>]
       wwwoffle -offline [-p <host>[:<port>] | -c <config-file>]
       wwwoffle -fetch [-p <host>[:<port>] | -c <config-file>]
       wwwoffle -config [-p <host>[:<port>] | -c <config-file>]
       wwwoffle -dump [-p <host>[:<port>] | -c <config-file>]
       wwwoffle -cyclelog [-p <host>[:<port>] | -c <config-file>]
       wwwoffle -purge [-p <host>[:<port>] | -c <config-file>]
       wwwoffle -status [-p <host>[:<port>] | -c <config-file>]
       wwwoffle -kill [-p <host>[:<port>] | -c <config-file>]
       wwwoffle [-o|-O] [-p <host>[:<port>] | -c <config-file>] URL
       wwwoffle [-d[<depth>] | -r[<depth>] | -R[<depth>]] [-g[Sisfo]] [-F] [-p <host>[:<port>] | -c <config-file>] URL1 [URL2 [... URL]]
       wwwoffle [-d[<depth>] | -r[<depth>] | -R[<depth>]] [-g[Sisfo]] [-F] [-p <host>[:<port>] | -c <config-file>] file1 [file2 [... file]]
       wwwoffle -post [-p <host>[:<port>] | -c <config-file>] URL
       wwwoffle -put [-p <host>[:<port>] | -c <config-file>] URL

DESCRIPTION
       wwwoffle controls the World Wide Web Offline Explorer proxy HTTP server. The wwwoffle program is used to control the wwwoffled program, to tell it when the computer is online or offline, and which URLs to get and when to fetch them. The control options are also available from the server on an interactive control web page at http://localhost:8080/control/.

OPTIONS
       The command line options available for the program are described below.

       -h | --help
              A help message is printed giving a brief description of the usage of the program.

       --version
              The version number of the program is printed.

       -online
              Tell the wwwoffled proxy server that the computer is online to the internet and that requests are to be fetched immediately.

       -autodial
              Tell the wwwoffled proxy server that the computer can become online to the internet if required for requests that are not already cached, but that pages that are in the cache do not require any network access. This is intended for use with dial-on-demand systems (using diald for example).

       -offline
              Tell the wwwoffled proxy server that the computer is not online to the internet and that requests are to be cached until they are fetched later.

       -fetch
              Tell the wwwoffled proxy server to fetch all of the requests that have been cached. (The proxy server must be online for this to work.) The program will wait until all of the requests have been met before exiting.

       -config
              Tell the wwwoffled proxy server to re-read the configuration file.

       -dump
              Tell the wwwoffled proxy server to dump out the current program configuration. This is equivalent to the most recently read configuration file and the built-in default options.

       -cyclelog
              Tell the wwwoffled proxy server to close and then re-open the log file.

       -purge
              Tell the wwwoffled proxy server that the cache is to be purged. The configuration file wwwoffle.conf(5) specifies the maximum age of the pages to keep. If a maximum cache size is specified then the oldest pages are deleted until the size is not exceeded.

       -status
              Request from the wwwoffled proxy server the current status of the program. The online or offline mode, the fetch and purge statuses, the number of current processes and their PIDs are displayed.

       -kill
              Tell the wwwoffled proxy server to exit cleanly at the next convenient point.

       URL    The URL of a web page that is to be fetched.
              This is the same as using a browser and entering the URL if not already in the cache, or pressing the refresh button in the index if it is in the cache.

       file   The name of an HTML file that is to be parsed and the links in it are to be fetched as if the URLs had been specified on the command line.

       -o     Fetch the specified URL (from the cache, or request it if not already cached when offline, or get it when online) and output it on standard output. This is an easy way of getting an image out of the cache to be used in other programs. The contents of the ModifyHTML section of the configuration file are ignored and the unmodified data is output.

       -O     Fetch the specified URL (from the cache, or request it if not already cached when offline, or get it when online) and output it on standard output including the HTTP header. The contents of the ModifyHTML section of the configuration file are ignored and the unmodified data is output.

       -F     Force the specified URLs to be refreshed. Without this option, the page will not be fetched unless newer than the version on the server.

       -r[<depth>]
              Causes the pages linked to by the specified URLs also to be fetched if they are on the same host. This recursion works for a number of links specified by the depth parameter; a depth of 0 means only the specified page, a depth of 2 means all linked pages and all links from them.

       -R[<depth>]
              The same as -r above, but it also works for links that are not on the same host.

       -d[<depth>]
              The same as -r above, but is limited to links in the same directory or a sub-directory.

       -gS    Also fetches the stylesheets that are included in the specified URLs.

       -gi    Also fetches the images that are included in the specified URLs.

       -gf    Also fetches the frames that are included in the specified URLs.

       -gs    Also fetches the scripts that are included in the specified URLs.

       -go    Also fetches the objects that are included in the specified URLs.
       -post  Create a request using the POST method; the data is read from stdin and appended to the request. The user must ensure that the format of the data is valid for a POST request. Any of the characters '&', '=' or ';' that are not being used for their reserved purpose must be URL-encoded in the input; other characters will be URL-encoded.

       -put   Create a request using the PUT method; the data is read from stdin and appended to the request.

       -c <config-file>
              Specifies the name of the configuration file that contains the server host name, port numbers and authorisation password. This is required for the -online, -autodial, -offline, -fetch, -config, -dump, -purge, -status and -kill options if a password is set. The user must have read access to the configuration file to be able to use the command if a password is set. (See the StartUp and LocalHost sections of wwwoffle.conf(5) for more information on setting the server host name, ports and password.)

       -p <host>[:<port>]
              Sets the hostname and port number that is to be used for the connection to the proxy server. For the -online, -autodial, -offline, -fetch, -config, -dump, -purge, -status and -kill options this must be the WWWOFFLE control port; for the URL options it must be the WWWOFFLE HTTP proxy server port. If no -p option is specified then the compile-time defaults are used.

       When the -F, -R[<depth>], -r[<depth>], -d[<depth>] or -g[Sisfo] options are given they will override the options that are set in the FetchOptions section of the configuration and not fetch any other contents of the specified URL. For example, if the fetch options normally include images and frames then using the -gi option will only fetch images and not frames. All page contents to be fetched must be specified as command line options. Specifying -g without any options will fetch only the specified URL without any of the options.

ENVIRONMENT VARIABLE
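As an aside on the URL-encoding requirement for -post above, a literal '&', '=' or ';' inside a field value has to be percent-encoded before the data reaches the request. The helper below is only an illustration of that rule, not something wwwoffle ships, and it does not handle a literal '%' in the input.

```shell
#!/bin/sh
# Percent-encode the three reserved form characters in one field value,
# per the -post note above. Helper name is an invention for this sketch;
# a literal '%' in the input would need encoding first and is not handled.
encode_field() {
    printf '%s' "$1" | sed -e 's/&/%26/g' -e 's/=/%3D/g' -e 's/;/%3B/g'
}

# Example: a password containing '&' and '=' must not reach the request raw.
ENCODED=$(encode_field 'p&ss=word')
echo "user=klondrie&password=$ENCODED"
# A line like the above could then be piped to:  wwwoffle -post <URL>
```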
       The WWWOFFLE_PROXY environment variable can be used instead of the -c or -p options. There are three ways that the variable can be used.

       1.     When the variable is set to the absolute pathname of a file, then that file is used as the configuration file like the -c option (for example /etc/wwwoffle/wwwoffle.conf).

       2.     The WWWOFFLE_PROXY variable can also be set to the hostname and the port number that would be used with the -p option (for example localhost:8080).

       3.     The third possibility is to set the variable to the hostname and the two port numbers for the WWWOFFLE HTTP proxy port and the WWWOFFLE control port (for example localhost:8080:8081). This way it will work with both types of commands (proxy access and control).

SEE ALSO
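The third form above packs the host and both ports into one variable. A small sketch of how the fields split apart, using plain shell parameter expansion only to make the host:proxy-port:control-port format concrete (the parsing is not part of wwwoffle itself):

```shell
#!/bin/sh
# Illustration of the host:proxy-port:control-port form of WWWOFFLE_PROXY.
WWWOFFLE_PROXY='localhost:8080:8081'
export WWWOFFLE_PROXY

HOST=${WWWOFFLE_PROXY%%:*}       # localhost
REST=${WWWOFFLE_PROXY#*:}        # 8080:8081
PROXY_PORT=${REST%%:*}           # 8080 -> WWWOFFLE HTTP proxy port
CONTROL_PORT=${REST#*:}          # 8081 -> WWWOFFLE control port

echo "proxy=$HOST:$PROXY_PORT control=$HOST:$CONTROL_PORT"
```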
       wwwoffled(8), wwwoffle.conf(5), diald(8).

AUTHOR
       Andrew M. Bishop 1996-2009 (amb@gedanken.demon.co.uk)

                                 March 13, 2009                     wwwoffle(1)
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.