10-18-2011
You can just pass several URLs at once: wget url1 url2 url3 url4. robots.txt doesn't apply unless you're telling it to go recursively...
Last edited by Corona688; 10-18-2011 at 06:09 PM..
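A minimal sketch of the two modes described above (the example.com URLs are hypothetical placeholders): a plain multi-URL call fetches each named file directly and never reads robots.txt, while recursive retrieval is what consults it.

```shell
# Plain multi-URL fetch: each file is requested directly, so
# robots.txt is never consulted:
#   wget http://example.com/a.txt http://example.com/b.txt
# Recursive retrieval (-r) is what consults robots.txt; site owner
# permitting, the check can be disabled:
#   wget -r -np -e robots=off http://example.com/dir/
# Building the multi-URL command line from a list of URLs:
urls="http://example.com/a.txt http://example.com/b.txt"
cmd="wget $urls"
echo "$cmd"
```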
10 More Discussions You Might Find Interesting
1. OS X (Apple)
I need a simple command line executable that allows me to join many wmv files into one output wmv file, preferably in a simple way like this:
wmvjoin file1.wmv file2.wmv .... > outputfile.wmv
So what I want is the wmv-equivalent of mpgtx
I cannot find it on the internet.
Thanks. (2 Replies)
Discussion started by: karman
2. Solaris
hello all. I use wget and fetch to do downloads, but on Solaris I cannot download from the command line because it has neither wget nor fetch. Please help me with this?
Tnx (2 Replies)
Discussion started by: moslemovic
3. Shell Programming and Scripting
My requirement is:
consider a file named output
cat output
blah sdjfhjkd jsdfhjksdh
sdfs 23423 sdfsdf sdf"sdfsdf"sdfsdf"""""dsf
hellow there
this doesnt look good
et cetc etc
etcetera
I want to replace line number 4 ("this doesnt look good") with some other line
... (3 Replies)
Discussion started by: vivek d r
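For the line-replacement question above, sed can address a line by number; a small sketch (the replacement text is an assumed example, and the sample file is recreated from the post):

```shell
# Recreate a file whose 4th line is the one to replace, then use
# sed's "Ns/.*/text/" form, which rewrites only line N:
f=$(mktemp)
printf '%s\n' 'blah sdjfhjkd jsdfhjksdh' 'sdfs 23423' 'hellow there' \
    'this doesnt look good' 'et cetc etc' > "$f"
result=$(sed '4s/.*/this looks better now/' "$f")
echo "$result"
rm -f "$f"
```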
4. Shell Programming and Scripting
Need assistance. Using wget, how can I download multiple files from an HTTP site? HTTP doesn't have wildcards (*) but FTP does. Any ideas will be appreciated.
wget --timeout=120 --append-output=output.txt --no-directories --cut-dirs=1 -np -m --accept=grib2 -r http://sample.com/... (4 Replies)
Discussion started by: ajayram_arya
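Since HTTP has no server-side wildcards, wget emulates them by crawling recursively and keeping only matching names; a sketch of the relevant flags (the host and the grib2 suffix are taken from the post, the exact path is a placeholder):

```shell
# -r crawls recursively, -np stays below the start directory,
# -nd flattens the output tree, and -A keeps only names ending
# in "grib2":
cmd='wget -r -np -nd -A grib2 http://sample.com/dir/'
echo "$cmd"
```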
5. UNIX for Dummies Questions & Answers
Hello,
any way to download a file from a website protected by an image captcha? The download link is not static but session-based, generated on the fly.
I can also do it via a web browser, but I'd rather trust the command line; maybe I'm wrong (1 Reply)
Discussion started by: postcd
6. UNIX for Dummies Questions & Answers
Hi Folks,
I have a file in which I need to multiply a number on each line and replace the original content of that line with the result.
For example if this is my input file file1.txt
2.259314750 xxxxxx
1.962774350 xxxxxx
2.916817290 xxxxxx
1.355026900 ... (4 Replies)
Discussion started by: Madiouma Ndiaye
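A sketch of the multiply-and-replace idea with awk, using two of the sample lines from the post (the factor 2 and the output precision are assumptions):

```shell
# awk splits each line into fields; reprinting the line with the
# first field multiplied rewrites just that column:
f=$(mktemp)
printf '%s\n' '2.259314750 xxxxxx' '1.962774350 xxxxxx' > "$f"
result=$(awk '{ printf "%.9f %s\n", $1 * 2, $2 }' "$f")
echo "$result"
rm -f "$f"
```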
7. Shell Programming and Scripting
I am trying to download all files with a particular extension (.bam) from a password-protected HTTPS site that requires user authentication. The files are ~20GB each and I am not sure if the below is the best way to do it. I am also not sure how to direct the downloaded files to a folder as well as external... (7 Replies)
Discussion started by: cmccabe
8. Shell Programming and Scripting
I am using the below curl command to download a single file from client server and it is working as expected
curl --ftp-ssl -k -u ${USER}:${PASSWD} ftp://${HOST}:${PORT}/path/to/${FILE} --output ${DEST}/${FILE}
let say the client has 3 files hellofile.101, hellofile.102, hellofile.103 and I... (3 Replies)
Discussion started by: r@v!7*7@
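curl itself can expand numeric ranges in the URL, so the three files from the post can be fetched in one invocation; a sketch with placeholder credentials, host, and destination:

```shell
# [101-103] is curl URL globbing, and #1 in --output is replaced by
# whatever the range matched, preserving the original filenames:
cmd='curl --ftp-ssl -k -u USER:PASSWD "ftp://HOST:PORT/path/to/hellofile.[101-103]" --output "DEST/hellofile.#1"'
echo "$cmd"
```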
9. Shell Programming and Scripting
The bash below will download all the files in download to /home/Desktop/folder. That works great, but within /home/Desktop/folder there are several folders (bam, other, and vcf); is there a way to specify by extension in the download file where each file should go?
For example, all .pdf and .zip... (2 Replies)
Discussion started by: cmccabe
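A sketch of sorting downloaded files into the bam/other/vcf subfolders by extension (the bam/other/vcf names come from the post; the sample filenames and the .pdf/.zip-to-other mapping are assumptions):

```shell
# Work in a scratch directory with a few dummy files, then route
# each file to a subfolder based on its extension:
d=$(mktemp -d)
cd "$d"
mkdir -p bam other vcf
touch a.pdf b.zip c.bam e.vcf
for fl in *; do
    case "$fl" in
        *.bam)       mv "$fl" bam/   ;;
        *.vcf)       mv "$fl" vcf/   ;;
        *.pdf|*.zip) mv "$fl" other/ ;;
    esac
done
ls other
```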
10. Shell Programming and Scripting
I am looking for help with the perl options '-n' and '-p'.
I understand what they do and how to use them.
But I would like to use them with more than one file (and without any shell loop, loading perl only once).
I did try it and -n works on 2 files.
Question is:
- is it possible to... (6 Replies)
Discussion started by: alex_5161
LWP::RobotUA(3) User Contributed Perl Documentation LWP::RobotUA(3)
NAME
LWP::RobotUA - a class for well-behaved Web robots
SYNOPSIS
use LWP::RobotUA;
my $ua = LWP::RobotUA->new('my-robot/0.1', 'me@foo.com');
$ua->delay(10); # be very nice -- max one hit every ten minutes!
...
# Then just use it like a normal LWP::UserAgent:
my $response = $ua->get('http://whatever.int/...');
...
DESCRIPTION
This class implements a user agent that is suitable for robot applications. Robots should be nice to the servers they visit. They should
consult the /robots.txt file to ensure that they are welcomed and they should not make requests too frequently.
But before you consider writing a robot, take a look at <URL:http://www.robotstxt.org/>.
When you use an LWP::RobotUA object as your user agent, you do not really have to think about these things yourself; "robots.txt" files
are automatically consulted and obeyed, the server isn't queried too rapidly, and so on. Just send requests as you do when you are using a
normal LWP::UserAgent object (using "$ua->get(...)", "$ua->head(...)", "$ua->request(...)", etc.), and this special agent will make sure
you are nice.
METHODS
The LWP::RobotUA is a sub-class of LWP::UserAgent and implements the same methods. In addition the following methods are provided:
$ua = LWP::RobotUA->new( %options )
$ua = LWP::RobotUA->new( $agent, $from )
$ua = LWP::RobotUA->new( $agent, $from, $rules )
The LWP::UserAgent options "agent" and "from" are mandatory. The options "delay", "use_sleep" and "rules" initialize attributes
private to the RobotUA. If "rules" are not provided, then "WWW::RobotRules" is instantiated providing an internal database of
robots.txt.
It is also possible to just pass the value of "agent", "from" and optionally "rules" as plain positional arguments.
$ua->delay
$ua->delay( $minutes )
Get/set the minimum delay between requests to the same server, in minutes. The default is 1 minute. Note that this number doesn't
have to be an integer; for example, this sets the delay to 10 seconds:
$ua->delay(10/60);
$ua->use_sleep
$ua->use_sleep( $boolean )
Get/set a value indicating whether the UA should sleep() if requests arrive too fast, defined as $ua->delay minutes not passed since
last request to the given server. The default is TRUE. If this value is FALSE then an internal SERVICE_UNAVAILABLE response will be
generated. It will have a Retry-After header that indicates when it is OK to send another request to this server.
$ua->rules
$ua->rules( $rules )
Set/get which WWW::RobotRules object to use.
$ua->no_visits( $netloc )
Returns the number of documents fetched from this server host. Yeah I know, this method should probably have been named num_visits() or
something like that. :-(
$ua->host_wait( $netloc )
Returns the number of seconds (from now) you must wait before you can make a new request to this host.
$ua->as_string
Returns a string that describes the state of the UA. Mainly useful for debugging.
SEE ALSO
LWP::UserAgent, WWW::RobotRules
COPYRIGHT
Copyright 1996-2004 Gisle Aas.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
perl v5.16.2 2012-02-11 LWP::RobotUA(3)