Top Cybersecurity Threats Earth Year 2019 | You Have Been Warned!
Post 303036330 by wisecracker on Sunday 23rd of June 2019, 09:12:21 AM
Hi Neo...

OK, I will, but expect criticism of any background-music choice if I think you have chosen wrongly.
I have had serious experience in the music scene for decades: classically trained on clarinet and cello, and a self-taught "rock" and pseudo-classical guitarist.

(Addendum: there are four tracks of me and my band from 1976 (IIRC) on the WWW.)
WWW::RobotRules(3)					User Contributed Perl Documentation					WWW::RobotRules(3)

NAME
       WWW::RobotRules - Parse /robots.txt files

SYNOPSIS
       require WWW::RobotRules;
       my $robotsrules = WWW::RobotRules->new('MOMspider/1.0');

       use LWP::Simple qw(get);

       $url = "http://some.place/robots.txt";
       my $robots_txt = get $url;
       $robotsrules->parse($url, $robots_txt);

       $url = "http://some.other.place/robots.txt";
       $robots_txt = get $url;
       $robotsrules->parse($url, $robots_txt);

       # Now we are able to check if a URL is valid for those servers that
       # we have obtained and parsed "robots.txt" files for.
       if ($robotsrules->allowed($url)) {
           $c = get $url;
           ...
       }

DESCRIPTION
       This module parses /robots.txt files as specified in "A Standard for
       Robot Exclusion", described in
       <http://info.webcrawler.com/mak/projects/robots/norobots.html>.
       Webmasters can use the /robots.txt file to deny conforming robots
       access to parts of their web site.

       The parsed file is kept in the WWW::RobotRules object, and this
       object provides methods to check if access to a given URL is
       prohibited. The same WWW::RobotRules object can parse multiple
       /robots.txt files.

       The following methods are provided:

       $rules = WWW::RobotRules->new($robot_name)
           This is the constructor for WWW::RobotRules objects. The first
           argument given to new() is the name of the robot.

       $rules->parse($robot_txt_url, $content, $fresh_until)
           The parse() method takes as arguments the URL that was used to
           retrieve the /robots.txt file, and the contents of the file.

       $rules->allowed($uri)
           Returns TRUE if this robot is allowed to retrieve this URL.

       $rules->agent([$name])
           Get/set the agent name. NOTE: Changing the agent name will clear
           the robots.txt rules and expire times out of the cache.

ROBOTS.TXT
       The format and semantics of the "/robots.txt" file are as follows
       (this is an edited abstract of
       <http://info.webcrawler.com/mak/projects/robots/norobots.html>):

       The file consists of one or more records separated by one or more
       blank lines. Each record contains lines of the form

         <field-name>: <value>

       The field name is case insensitive. Text after the '#' character on
       a line is ignored during parsing; this is used for comments. The
       following <field-names> can be used:

       User-Agent
           The value of this field is the name of the robot the record is
           describing access policy for.

           If more than one User-Agent field is present, the record
           describes an identical access policy for more than one robot.
           At least one field needs to be present per record.

           If the value is '*', the record describes the default access
           policy for any robot that has not matched any of the other
           records.

       Disallow
           The value of this field specifies a partial URL that is not to
           be visited. This can be a full path or a partial path; any URL
           that starts with this value will not be retrieved.

ROBOTS.TXT EXAMPLES
       The following example "/robots.txt" file specifies that no robots
       should visit any URL starting with "/cyberworld/map/" or "/tmp/":

         User-agent: *
         Disallow: /cyberworld/map/ # This is an infinite virtual URL space
         Disallow: /tmp/            # these will soon disappear

       This example "/robots.txt" file specifies that no robots should
       visit any URL starting with "/cyberworld/map/", except the robot
       called "cybermapper":

         User-agent: *
         Disallow: /cyberworld/map/ # This is an infinite virtual URL space

         # Cybermapper knows where to go.
         User-agent: cybermapper
         Disallow:

       This example indicates that no robots should visit this site
       further:

         # go away
         User-agent: *
         Disallow: /
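       A short sketch of these record-selection rules in action, not part
       of the original manual; the host example.org and the robot name
       SomeBot are hypothetical:

         #!/usr/bin/perl
         use strict;
         use warnings;
         use WWW::RobotRules;

         # The second example file above: "/cyberworld/map/" is off limits
         # to every robot except "cybermapper". (<<~ needs Perl 5.26+.)
         my $robots_txt = <<~'EOT';
             User-agent: *
             Disallow: /cyberworld/map/

             User-agent: cybermapper
             Disallow:
             EOT

         # An ordinary robot is bound by the default '*' record ...
         my $rules = WWW::RobotRules->new('SomeBot/1.0');
         $rules->parse('http://example.org/robots.txt', $robots_txt);
         print $rules->allowed('http://example.org/cyberworld/map/x.html')
             ? "allowed\n" : "denied\n";    # denied
         print $rules->allowed('http://example.org/welcome.html')
             ? "allowed\n" : "denied\n";    # allowed

         # ... while cybermapper matches its own record, whose empty
         # Disallow field grants access everywhere.
         my $mapper = WWW::RobotRules->new('cybermapper');
         $mapper->parse('http://example.org/robots.txt', $robots_txt);
         print $mapper->allowed('http://example.org/cyberworld/map/x.html')
             ? "allowed\n" : "denied\n";    # allowed

       With libwww-perl installed this prints denied, allowed, allowed: the
       default record bars SomeBot from the map, while cybermapper's own
       record overrides it.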
SEE ALSO
       LWP::RobotUA, WWW::RobotRules::AnyDBM_File

libwww-perl-5.65                  2001-04-20                WWW::RobotRules(3)
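
       For a real crawler, the LWP::RobotUA class listed under SEE ALSO is
       usually more convenient than the raw rules object: it fetches and
       caches /robots.txt itself and enforces the rules on every request.
       A minimal sketch, again with a hypothetical robot name, contact
       address, and URL:

         #!/usr/bin/perl
         use strict;
         use warnings;
         use LWP::RobotUA;

         # A polite robot: LWP::RobotUA honours /robots.txt on its own and
         # rate-limits requests (delay is measured in minutes).
         my $ua = LWP::RobotUA->new('SomeBot/1.0', 'somebot@example.org');
         $ua->delay(1/60);    # at most one request per second

         my $response = $ua->get('http://example.org/welcome.html');
         if ($response->is_success) {
             print $response->decoded_content;
         }
         else {
             # Disallowed URLs come back as "403 Forbidden by robots.txt".
             print $response->status_line, "\n";
         }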