Hello, I am new to Unix. Please help me out.
My scenario:
I am first collecting all the file names present in the directory with the structure myinfo/yourinfo/supplierinfo.
I have four files with the names myCollector.java, yourCollector.java, someCollector.java, and everyCollector.java in the directory.... (1 Reply)
I want to run an awk split on a value that has been pushed through an array, and I was wondering what the syntax should be.
e.g., running time strings through an array and trying to examine just the minutes:
12:25:30
10:15:13
08:55:23
awk '
NR==FNR{
... (2 Replies)
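A minimal sketch of the split() syntax, assuming the times sit one per line in a file (times.txt is a made-up name here): split() breaks each string on ":" into the parts array, and parts[2] is the minutes field.

    awk '{ t[NR] = $1 }
    END {
        for (i = 1; i <= NR; i++) {
            n = split(t[i], parts, ":")
            print "minutes of " t[i] ": " parts[2]
        }
    }' times.txt

If the array is not actually needed, awk -F: '{ print $2 }' times.txt extracts the same field in one line.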
Hi Forum,
I am struggling with a for loop in a shell script.
Let me explain what is needed in the script.
I have a file which will contain some strings like
file1
place1
place2
place3
checkpoint
some other text
some more text
Now what my requirement is
the words ... (2 Replies)
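The full requirement is cut off above, but as a hedged starting point, here is one way a shell loop can walk that file and stop at the checkpoint marker (strings.txt is an assumed file name):

    while IFS= read -r word; do
        [ "$word" = "checkpoint" ] && break
        echo "processing: $word"
    done < strings.txt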
I am writing a shell script using the Korn shell. It seems that I am only able to use local variables within a while loop that is reading a file. (I can't access a variable outside a previously used while loop.) It's been a while since I wrote shell scripts. Here is a sample:
cat file.txt... (4 Replies)
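This is usually the pipe-into-while trap: with cat file.txt | while read ..., some shells run the loop in a subshell, so anything set inside it disappears afterwards. A sketch of the standard fix, redirecting the file into the loop instead of piping into it:

    #!/bin/ksh
    count=0
    while read -r line; do
        count=$((count + 1))        # runs in the current shell
    done < file.txt
    echo "lines read: $count"       # count is still visible here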
NEWBIE ALERT!
Hi,
I'm 1 month into learning Perl and done reading "Minimal Perl" by Tim Maher (which I enjoyed enormously). I'm not a programmer by profession but want to use Perl to automate various tasks at my job. I have a problem (obviously) and am looking for your much appreciated help.... (0 Replies)
Below is a test script I was trying to use so that I could understand why the logic was not working in a larger script. While accessing and printing array data inside the while loop, everything is fine. Outside the loop, I guess everything is null? The for loop that is meant to cycle... (4 Replies)
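If the array is being filled inside a loop fed by a pipe, this is the same subshell issue as above: in bash, cmd | while ... runs the loop body in a child process, so the array is empty once the pipe ends. A sketch using process substitution to keep the loop in the current shell (the printf is a stand-in for the real command):

    #!/bin/bash
    arr=()
    while IFS= read -r line; do
        arr+=("$line")
    done < <(printf '%s\n' one two three)   # stand-in for the real command
    for item in "${arr[@]}"; do             # array is still populated here
        echo "$item"
    done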
Hello All,
Maybe I'm missing something here, but I have NOOO idea what the heck is going on with this....?
I have a variable that contains a PATTERN of what I'm considering "Illegal Characters". So what I'm doing is looping
through a string containing some of these "Illegal Characters". Now... (5 Replies)
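A hedged sketch of one way to test each character of a string against such a pattern (the class below is a made-up set of "illegal" characters; note that ! must not come first inside the brackets, or it negates the class):

    #!/bin/bash
    ILLEGAL='[@#$%^&*!]'            # assumed set of illegal characters
    string='abc$def!g'
    for (( i = 0; i < ${#string}; i++ )); do
        ch=${string:i:1}
        case $ch in
            $ILLEGAL) echo "illegal character at position $i: $ch" ;;
        esac
    done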
There are two parts to this. In the first part I need to read a list of files from a directory and split it into 4 arrays. I have done that with the following code:
# collect list of file names (globbing directly avoids parsing ls output)
STATS_INPUT_FILENAMES=( "./$SET/$FOLD/"*in.txt )
# get number of files... (8 Replies)
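For the splitting itself, a sketch of one way to deal the collected names into four arrays round robin, assuming STATS_INPUT_FILENAMES is the array filled above (chunking into consecutive quarters works similarly):

    a1=() a2=() a3=() a4=()
    for i in "${!STATS_INPUT_FILENAMES[@]}"; do
        case $(( i % 4 )) in
            0) a1+=("${STATS_INPUT_FILENAMES[i]}") ;;
            1) a2+=("${STATS_INPUT_FILENAMES[i]}") ;;
            2) a3+=("${STATS_INPUT_FILENAMES[i]}") ;;
            3) a4+=("${STATS_INPUT_FILENAMES[i]}") ;;
        esac
    done
    echo "sizes: ${#a1[@]} ${#a2[@]} ${#a3[@]} ${#a4[@]}"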
OS : RHEL 6.7
Shell : bash
I have a text file with 5.97 million lines.
I want to split this big file into 12 different files (in sequential order) so that each file will contain roughly 500K lines. I tried the following awk command after googling, but it just created 2 files... (5 Replies)
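Two hedged sketches, assuming the input is bigfile.txt (a made-up name) and roughly 500000 lines per piece (5.97M / 12). split is the simplest tool; the awk version opens a new output file every 500000 lines:

    split -l 500000 -d bigfile.txt part_     # part_00 ... part_11 (GNU split)

    awk 'NR % 500000 == 1 { if (out) close(out); out = sprintf("part_%02d", ++n) }
         { print > out }' bigfile.txt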
WWW::RobotRules(3)            User Contributed Perl Documentation           WWW::RobotRules(3)

NAME
WWW::RobotRules - Parse robots.txt files
SYNOPSIS
    require WWW::RobotRules;
    my $robotsrules = WWW::RobotRules->new('MOMspider/1.0');

    use LWP::Simple qw(get);

    $url = "http://some.place/robots.txt";
    my $robots_txt = get $url;
    $robotsrules->parse($url, $robots_txt);

    $url = "http://some.other.place/robots.txt";
    $robots_txt = get $url;
    $robotsrules->parse($url, $robots_txt);

    # Now we are able to check if a URL is valid for those servers
    # whose "robots.txt" files we have obtained and parsed.
    if ($robotsrules->allowed($url)) {
        $c = get $url;
        ...
    }
DESCRIPTION
This module parses a /robots.txt file as specified in "A Standard for Robot Exclusion", described in
<http://info.webcrawler.com/mak/projects/robots/norobots.html>. Webmasters can use the /robots.txt file to disallow conforming robots access
to parts of their web site.
The parsed file is kept in the WWW::RobotRules object, and this object provides methods to check if access to a given URL is prohibited.
The same WWW::RobotRules object can parse multiple /robots.txt files.
The following methods are provided:
$rules = WWW::RobotRules->new($robot_name)
This is the constructor for WWW::RobotRules objects. The first argument given to new() is the name of the robot.
$rules->parse($robot_txt_url, $content, $fresh_until)
The parse() method takes as arguments the URL that was used to retrieve the /robots.txt file, and the contents of the file.
$rules->allowed($uri)
Returns TRUE if this robot is allowed to retrieve this URL.
$rules->agent([$name])
Get/set the agent name. NOTE: Changing the agent name will clear the robots.txt rules and expire times out of the cache.
ROBOTS.TXT
The format and semantics of the "/robots.txt" file are as follows (this is an edited abstract of
<http://info.webcrawler.com/mak/projects/robots/norobots.html>):
The file consists of one or more records separated by one or more blank lines. Each record contains lines of the form
<field-name>: <value>
The field name is case insensitive. Text after the '#' character on a line is ignored during parsing. This is used for comments. The
following <field-names> can be used:
User-Agent
The value of this field is the name of the robot the record is describing the access policy for. If more than one User-Agent field is
present, the record describes an identical access policy for more than one robot. At least one field needs to be present per record. If
the value is '*', the record describes the default access policy for any robot that has not matched any of the other records.
Disallow
The value of this field specifies a partial URL that is not to be visited. This can be a full path or a partial path; any URL that
starts with this value will not be retrieved.
ROBOTS.TXT EXAMPLES
The following example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/" or "/tmp/":
User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
This example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/", except the robot called
"cybermapper":
User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
# Cybermapper knows where to go.
User-agent: cybermapper
Disallow:
This example indicates that no robots should visit this site further:
# go away
User-agent: *
Disallow: /
SEE ALSO
LWP::RobotUA, WWW::RobotRules::AnyDBM_File
libwww-perl-5.65 2001-04-20 WWW::RobotRules(3)