Full Discussion: robots.txt usage
Posted by Neo on 06-26-2009 at 01:22 PM (Post 302329293)
Quote:
Originally Posted by rickhlwong
Dear all,

I want to use robots.txt to control the "spider". Can I specify an IP address so that ONLY that spider is allowed to access the website?
Thank you.

Rick
No -- robots.txt cannot allow or deny access by IP address. It matches crawlers against the User-Agent field of the HTTP request, so the most you can do there is name the spider you want to admit. To control access by IP address, we block unwanted spiders at the firewall using ipchains (or iptables on newer kernels).
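Within robots.txt itself, the closest you can get is admitting a crawler by name. For example, a minimal robots.txt that welcomes one named crawler and keeps everything else out (the crawler name here is just illustrative):

    # An empty Disallow means this crawler may fetch everything
    User-agent: Googlebot
    Disallow:

    # Every other crawler: stay out of the whole site
    User-agent: *
    Disallow: /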

Also keep in mind that robots.txt is purely advisory: many aggressive spiders ignore it entirely and have to be blocked with something like ipchains, iptables, or "insert your favorite" firewall tool.
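For instance, dropping a badly behaved spider by source IP looks like this (203.0.113.45 is a placeholder address; adjust chain names and policy to your own setup):

    # iptables on 2.4+ kernels: silently drop all inbound packets from the offender
    iptables -A INPUT -s 203.0.113.45 -j DROP

    # ipchains equivalent on older 2.2 kernels
    ipchains -A input -s 203.0.113.45 -j DENY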
 

LWP::RobotUA(3pm)					User Contributed Perl Documentation					 LWP::RobotUA(3pm)

NAME
       LWP::RobotUA - a class for well-behaved Web robots

SYNOPSIS
       use LWP::RobotUA;

       my $ua = LWP::RobotUA->new('my-robot/0.1', 'me@foo.com');
       $ua->delay(10);  # be very nice -- max one hit every ten minutes!
       ...

       # Then just use it like a normal LWP::UserAgent:
       my $response = $ua->get('http://whatever.int/...');
       ...

DESCRIPTION
       This class implements a user agent that is suitable for robot
       applications. Robots should be nice to the servers they visit: they
       should consult the /robots.txt file to ensure that they are welcomed,
       and they should not make requests too frequently.

       But before you consider writing a robot, take a look at
       <URL:http://www.robotstxt.org/>.

       When you use an LWP::RobotUA object as your user agent, you do not
       really have to think about these things yourself; "robots.txt" files
       are automatically consulted and obeyed, the server isn't queried too
       rapidly, and so on. Just send requests as you do when you are using a
       normal LWP::UserAgent object (using "$ua->get(...)", "$ua->head(...)",
       "$ua->request(...)", etc.), and this special agent will make sure you
       are nice.

METHODS
       The LWP::RobotUA is a sub-class of LWP::UserAgent and implements the
       same methods. In addition, the following methods are provided:

       $ua = LWP::RobotUA->new( %options )
       $ua = LWP::RobotUA->new( $agent, $from )
       $ua = LWP::RobotUA->new( $agent, $from, $rules )
           The LWP::UserAgent options "agent" and "from" are mandatory. The
           options "delay", "use_sleep" and "rules" initialize attributes
           private to the RobotUA. If "rules" is not provided, then a
           "WWW::RobotRules" object is instantiated, providing an internal
           database of robots.txt rules. It is also possible to just pass the
           values of "agent", "from" and optionally "rules" as plain
           positional arguments.

       $ua->delay
       $ua->delay( $minutes )
           Get/set the minimum delay between requests to the same server, in
           minutes. The default is 1 minute. Note that this number doesn't
           have to be an integer; for example, this sets the delay to 10
           seconds:

               $ua->delay(10/60);

       $ua->use_sleep
       $ua->use_sleep( $boolean )
           Get/set a value indicating whether the UA should sleep() if
           requests arrive too fast, defined as $ua->delay minutes not having
           passed since the last request to the given server. The default is
           TRUE. If this value is FALSE, then an internal SERVICE_UNAVAILABLE
           response will be generated. It will have a Retry-After header that
           indicates when it is OK to send another request to this server.

       $ua->rules
       $ua->rules( $rules )
           Set/get which WWW::RobotRules object to use.

       $ua->no_visits( $netloc )
           Returns the number of documents fetched from this server host.
           Yeah, I know, this method should probably have been named
           num_visits() or something like that. :-(

       $ua->host_wait( $netloc )
           Returns the number of seconds (from now) you must wait before you
           can make a new request to this host.

       $ua->as_string
           Returns a string that describes the state of the UA. Mainly useful
           for debugging.
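The following is a short usage sketch (not part of the man page) that combines the methods documented above; the robot name, contact address, URL, and netloc are all placeholders:

       use strict;
       use warnings;
       use LWP::RobotUA;

       # "agent" and "from" are mandatory; both values are placeholders.
       my $ua = LWP::RobotUA->new(
           agent => 'my-robot/0.1',
           from  => 'me@example.com',
       );

       $ua->delay(30/60);    # at most one request per 30 seconds per server
       $ua->use_sleep(1);    # sleep instead of returning SERVICE_UNAVAILABLE

       # robots.txt is fetched and obeyed automatically on the first request
       my $response = $ua->get('http://www.example.com/');
       print $response->status_line, "\n";

       # Introspection, assuming LWP's usual host:port form for $netloc
       my $netloc = 'www.example.com:80';
       printf "visits=%d, wait=%s second(s)\n",
           $ua->no_visits($netloc), $ua->host_wait($netloc);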
SEE ALSO
       LWP::UserAgent, WWW::RobotRules

COPYRIGHT
       Copyright 1996-2004 Gisle Aas.

       This library is free software; you can redistribute it and/or modify
       it under the same terms as Perl itself.

perl v5.14.2                       2012-02-11                 LWP::RobotUA(3pm)