9 More Discussions You Might Find Interesting
1. UNIX for Beginners Questions & Answers
Hi Experts,
Our DHCP server currently answers the DHCP Discover requests from ServerX. In our dhcpd.conf file there are parameters defined for ServerX.
Now we have introduced some additional servers into the network and want them to be served by the same DHCP server.
Similar configuration... (13 Replies)
Discussion started by: ekorgur
2. UNIX for Advanced & Expert Users
Hello,
I configured RHEL 6 with Active Directory to allow Windows users to connect to the system, and it works.
I have accounts with high privileges (oracle, for example); if such an account is created on the AD server, I would like to block it.
I looked for how to do this; for the moment, all the... (3 Replies)
Discussion started by: vincenzo
3. UNIX for Dummies Questions & Answers
Solaris 10 (korn shell)
I use the -d option of the ls command when I want to suppress listing the contents of subdirectories
while listing all the directories and files in a directory.
This is what the man page says about the -d option of ls:
-d If an argument is a directory,... (3 Replies)
Discussion started by: kraljic
4. UNIX for Dummies Questions & Answers
I've used installp to install packages, but when is it ideal to use make install? I haven't had the opportunity to use it yet. (2 Replies)
Discussion started by: NycUnxer
5. UNIX for Dummies Questions & Answers
Hi,
While installing Apache on Linux, we perform the tasks below:
1) untar
2) configure
3) make
4) make install
I want to understand the difference between configure, make, and make install, and how each step works.
Can anyone help me understand this?
Thanks in advance. (1 Reply)
Discussion started by: praveen_b744
6. Programming
How can I get the current makefile name inside a makefile?
So, if I run make with a specified file, make -f target.mak,
is it possible to obtain the 'target' part inside that 'target.mak' from the file name? (2 Replies)
Discussion started by: alex_5161
7. Solaris
Hello there.
I would like to know how I can make sure the HA servers have exactly the same contents.
For example:
at timestamp 1 (before starting to install the Oracle product),
assume both servers have exactly the same contents;
at timestamp 2, I install the Oracle product on both servers, hoping... (3 Replies)
Discussion started by: qyxiell
8. UNIX for Dummies Questions & Answers
My system is Ubuntu; can I use PMake? (0 Replies)
Discussion started by: meili100
9. UNIX for Dummies Questions & Answers
I have nearly 10 users who log in to the HP server (D series, HP-UX 10.20) with the same UNIX user name, "liveuser", and they start UNIX-based transactions. If I create separate UNIX user IDs for all 10, will system performance improve? (1 Reply)
Discussion started by: augustinep
WWW::RobotRules(3) User Contributed Perl Documentation WWW::RobotRules(3)
NAME
WWW::RobotRules - Parse robots.txt files
SYNOPSIS
    require WWW::RobotRules;
    my $robotsrules = WWW::RobotRules->new('MOMspider/1.0');

    use LWP::Simple qw(get);

    my $url = "http://some.place/robots.txt";
    my $robots_txt = get $url;
    $robotsrules->parse($url, $robots_txt);

    $url = "http://some.other.place/robots.txt";
    $robots_txt = get $url;
    $robotsrules->parse($url, $robots_txt);

    # Now we are able to check if a URL is valid for those servers that
    # we have obtained and parsed "robots.txt" files for.
    if ($robotsrules->allowed($url)) {
        my $c = get $url;
        ...
    }
DESCRIPTION
This module parses /robots.txt files as specified in "A Standard for Robot Exclusion", described at
<http://info.webcrawler.com/mak/projects/robots/norobots.html>. Webmasters can use the /robots.txt file to disallow conforming robots
access to parts of their web site.
The parsed file is kept in the WWW::RobotRules object, and this object provides methods to check if access to a given URL is prohibited.
The same WWW::RobotRules object can parse multiple /robots.txt files.
The following methods are provided:
$rules = WWW::RobotRules->new($robot_name)
This is the constructor for WWW::RobotRules objects. The first argument given to new() is the name of the robot.
$rules->parse($robot_txt_url, $content, $fresh_until)
The parse() method takes as arguments the URL that was used to retrieve the /robots.txt file, and the contents of the file.
$rules->allowed($uri)
Returns TRUE if this robot is allowed to retrieve this URL.
$rules->agent([$name])
Get/set the agent name. NOTE: Changing the agent name will clear the robots.txt rules and expire times out of the cache.
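Taken together, a typical use of these methods might look like the following minimal sketch. It is hypothetical: the robot name 'ExampleBot/1.0' and the www.example.com URLs are made up for illustration, and failures of the network fetch are handled only in the simplest way.

    use strict;
    use warnings;
    use WWW::RobotRules;
    use LWP::Simple qw(get);

    # Hypothetical robot name; a real robot would use its own User-Agent string.
    my $rules = WWW::RobotRules->new('ExampleBot/1.0');

    # Fetch and parse the robots.txt of the site we want to crawl.
    my $robots_url = "http://www.example.com/robots.txt";
    my $robots_txt = get($robots_url);
    $rules->parse($robots_url, $robots_txt) if defined $robots_txt;

    # Consult the parsed rules before retrieving a page on that host.
    my $page = "http://www.example.com/docs/index.html";
    if ($rules->allowed($page)) {
        my $content = get($page);
        # ... process $content ...
    }

    # agent() returns the robot name; setting a new name clears the cached rules.
    print "Crawling as ", $rules->agent, "\n";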
ROBOTS.TXT
The format and semantics of the "/robots.txt" file are as follows (this is an edited abstract of
<http://info.webcrawler.com/mak/projects/robots/norobots.html>):
The file consists of one or more records separated by one or more blank lines. Each record contains lines of the form
    <field-name>: <value>
The field name is case insensitive. Text after the '#' character on a line is ignored during parsing. This is used for comments. The
following <field-names> can be used:
User-Agent
The value of this field is the name of the robot the record is describing the access policy for. If more than one User-Agent field is
present, the record describes an identical access policy for more than one robot. At least one field needs to be present per record. If
the value is '*', the record describes the default access policy for any robot that has not matched any of the other records.
Disallow
The value of this field specifies a partial URL that is not to be visited. This can be a full path or a partial path; any URL that
starts with this value will not be retrieved.
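Because matching is done on URL prefixes, a short sketch (with a made-up robots.txt and host) can make the Disallow behaviour concrete:

    use strict;
    use warnings;
    use WWW::RobotRules;

    my $rules = WWW::RobotRules->new('ExampleBot/1.0');   # hypothetical robot name

    # Hypothetical rules: block everything whose path starts with /tmp
    my $robots_txt = "User-agent: *\nDisallow: /tmp\n";
    $rules->parse("http://www.example.com/robots.txt", $robots_txt);

    print $rules->allowed("http://www.example.com/tmp/scratch.html") ? "ok\n" : "blocked\n";  # blocked
    print $rules->allowed("http://www.example.com/index.html")       ? "ok\n" : "blocked\n";  # ok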
ROBOTS.TXT EXAMPLES
The following example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/" or "/tmp/":
    User-agent: *
    Disallow: /cyberworld/map/ # This is an infinite virtual URL space
    Disallow: /tmp/            # these will soon disappear
This example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/", except the robot called
"cybermapper":
    User-agent: *
    Disallow: /cyberworld/map/ # This is an infinite virtual URL space

    # Cybermapper knows where to go.
    User-agent: cybermapper
    Disallow:
This example indicates that no robots should visit this site further:
    # go away
    User-agent: *
    Disallow: /
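As a rough illustration of how the User-Agent records interact, the sketch below feeds the second example file above to two WWW::RobotRules objects with different, hypothetical robot names: the generic robot falls under the "*" record and is kept out of /cyberworld/map/, while "cybermapper" matches its own record and is not.

    use strict;
    use warnings;
    use WWW::RobotRules;

    # The second example file above; the blank line separates the two records.
    my $robots_txt = join "\n",
        'User-agent: *',
        'Disallow: /cyberworld/map/',
        '',
        'User-agent: cybermapper',
        'Disallow:',
        '';

    my $url = "http://www.example.com/robots.txt";   # hypothetical host

    # A robot with an unrelated name falls under the "*" record and is kept out.
    my $generic = WWW::RobotRules->new('SomeBot/1.0');
    $generic->parse($url, $robots_txt);
    print "SomeBot: ",
        $generic->allowed("http://www.example.com/cyberworld/map/index.html")
            ? "allowed\n" : "blocked\n";             # blocked

    # cybermapper matches its own record, whose empty Disallow permits everything.
    my $mapper = WWW::RobotRules->new('cybermapper/1.0');
    $mapper->parse($url, $robots_txt);
    print "cybermapper: ",
        $mapper->allowed("http://www.example.com/cyberworld/map/index.html")
            ? "allowed\n" : "blocked\n";             # allowed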
SEE ALSO
LWP::RobotUA, WWW::RobotRules::AnyDBM_File
libwww-perl-5.65 2001-04-20 WWW::RobotRules(3)