The field separator is missing from radoulov's script.
Your description of the output you require is not clear to me. This will print the lines in gprs_calls2.txt whose first field matches a value in (the first field of) gprs2.txt.
Last edited by era; 10-09-2008 at 07:25 AM.
Reason: I mean the earlier version of radoulov's script -- we posted basically at the same time
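For the record, a Perl sketch of that idea (the field separator is assumed to be a pipe here, since the thread never shows it; adjust the split pattern to the real one):

  use strict;
  use warnings;

  # collect the first field of every gprs2.txt line
  my %keys;
  open my $fh, '<', 'gprs2.txt' or die "gprs2.txt: $!";
  while (<$fh>) {
      my ($first) = split /\|/;    # assumed separator: |
      $keys{$first} = 1;
  }
  close $fh;

  # print gprs_calls2.txt lines whose first field was seen above
  open my $calls, '<', 'gprs_calls2.txt' or die "gprs_calls2.txt: $!";
  while (<$calls>) {
      my ($first) = split /\|/;
      print if exists $keys{$first};
  }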
I have to compare two text files; only a few lines will differ, and only in some column.
The files are gigabytes in size.
Sample lines are as below:
11111122222222333333aaaaaaaaaabbbbbbbbbccccccccdddddd
11111122222222333333aaaaaaaaaabbbbbbbbbccccccccddeddd
So assuming these... (19 Replies)
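A minimal Perl sketch for this kind of comparison (file names are placeholders; both files are read in parallel, so memory stays flat even for GB-sized inputs):

  use strict;
  use warnings;

  open my $fh1, '<', 'file1.txt' or die $!;    # placeholder names
  open my $fh2, '<', 'file2.txt' or die $!;
  while (defined(my $l1 = <$fh1>) and defined(my $l2 = <$fh2>)) {
      next if $l1 eq $l2;
      # lines differ: locate the first differing character position
      my $i = 0;
      $i++ while substr($l1, $i, 1) eq substr($l2, $i, 1);
      printf "line %d differs at column %d\n", $., $i + 1;
  }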
Hello all,
Can anyone help me with this?
There are two files, and I have to match the second file's records against the first; if they match, print the output into two files, one containing the matched records and the other containing the rest.
Here is the example.
File1
"111",erter,"00000", ... (4 Replies)
I have two text files with records of a thousand rows each. Each row has around 40 columns; columns are tab-delimited and rows are newline-delimited.
My requirement is, for each row, to find whether any column differs between the two files. For each row i... (8 Replies)
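A Perl sketch along those lines (placeholder file names), walking both tab-delimited files in parallel and reporting which columns differ in each row:

  use strict;
  use warnings;

  open my $fh1, '<', 'file1.txt' or die $!;
  open my $fh2, '<', 'file2.txt' or die $!;
  while (defined(my $r1 = <$fh1>) and defined(my $r2 = <$fh2>)) {
      chomp($r1, $r2);
      my @c1 = split /\t/, $r1, -1;
      my @c2 = split /\t/, $r2, -1;
      for my $i (0 .. $#c1) {
          printf "row %d, column %d: '%s' vs '%s'\n",
              $., $i + 1, $c1[$i], $c2[$i] // ''
              if $c1[$i] ne ($c2[$i] // '');
      }
  }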
Hi, I have 2 csv files, a.csv and b.csv, with the same number of columns and a list of values in both. Each individual value in the two files needs to be compared; if it matches, print Correct in a new csv file, otherwise print Incorrect.
eg
a.csv
1,12/27/2007,Reward,$10.00... (5 Replies)
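A sketch of the cell-by-cell version in Perl (naive comma split, so no embedded commas in quoted fields; Text::CSV would handle those):

  use strict;
  use warnings;

  open my $fh1, '<', 'a.csv'      or die $!;
  open my $fh2, '<', 'b.csv'      or die $!;
  open my $out, '>', 'result.csv' or die $!;
  while (defined(my $r1 = <$fh1>) and defined(my $r2 = <$fh2>)) {
      chomp($r1, $r2);
      my @c1 = split /,/, $r1, -1;
      my @c2 = split /,/, $r2, -1;
      # one Correct/Incorrect verdict per column of the row
      print $out join(',', map {
          ($c1[$_] // '') eq ($c2[$_] // '') ? 'Correct' : 'Incorrect'
      } 0 .. $#c1), "\n";
  }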
Now I have a different file, zoo.txt, with content
123|zoo
234|natan
456|don
and file rick.txt with contents
123|dog|pie|pep
123|tail|see|newt
456|som|sin|sim
234|pay|rat|cat
I want to look for lines in file zoo.txt whose column 1 has the same corresponding lines in column 1 of... (6 Replies)
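A Perl sketch of a hash-lookup join on the first pipe-delimited field, using the file names from the post:

  use strict;
  use warnings;

  # remember every column-1 value from zoo.txt
  my %zoo;
  open my $z, '<', 'zoo.txt' or die $!;
  while (<$z>) {
      my ($id) = split /\|/;
      $zoo{$id} = 1;
  }

  # print rick.txt lines whose column 1 was seen in zoo.txt
  open my $r, '<', 'rick.txt' or die $!;
  while (<$r>) {
      my ($id) = split /\|/;
      print if exists $zoo{$id};
  }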
Hi all,
I have two .csv files. I need to compare those two files, and if there is any difference it should be moved into a third .csv file.
For example,
org.csv and dup.csv
When we compare those two files, org.csv and dup.csv, if there is any change in dup.csv it should be captured in a third... (7 Replies)
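A sketch treating the whole line as the unit of comparison; any dup.csv line not present in org.csv is captured in a third file (diff.csv is a placeholder name):

  use strict;
  use warnings;

  my %org;
  open my $o, '<', 'org.csv' or die $!;
  while (my $line = <$o>) { $org{$line} = 1 }

  open my $d,   '<', 'dup.csv'  or die $!;
  open my $out, '>', 'diff.csv' or die $!;
  while (my $line = <$d>) {
      print $out $line unless exists $org{$line};
  }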
Hello, I am trying to compare 2 files and get only the new lines as output. Note that new lines can be anywhere in the file and not necessarily at the bottom of the file.
I have made the following progress so far.
/home/aa>cat old.txt
0001 732 A
0002 732 C
0005 732 D... (7 Replies)
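Since the new lines can appear anywhere, order-independent set membership is the natural fit; with GNU grep this is `grep -vxF -f old.txt new.txt`, and the same idea in Perl (assuming the second file is named new.txt):

  use strict;
  use warnings;

  # remember every line of old.txt
  my %old;
  open my $o, '<', 'old.txt' or die $!;
  while (my $line = <$o>) { $old{$line} = 1 }

  # print lines of new.txt that never occur in old.txt
  open my $n, '<', 'new.txt' or die $!;
  while (my $line = <$n>) {
      print $line unless exists $old{$line};
  }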
Dear All,
I would really appreciate it if you could help me resolve this file comparison.
I have two files:
file1:
chr start end ID gene_name
chr1 2020 3030 1 test1
chr1 900 5000 2 test1
chr2 5000 8000 3 test2
chr3 6000 12000 4 test3
chr3 6000 15000 5 test3
file2:... (2 Replies)
Hi,
I have two files containing many fields with a | (pipe) delimiter. I want to compare both files and get only the particular fields that do not match. I want to use this in shell scripting.
ex:
first.txt
111 |abc| 230| hbc231 |bbb |210 |bbd405 |ghc |555 |cgv
second.txt
111 |abc |230 |hbc231... (1 Reply)
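A Perl sketch that compares the two files field by field, ignoring the stray whitespace around fields visible in the samples, and prints only the mismatching fields:

  use strict;
  use warnings;

  # split a pipe-delimited line and trim surrounding whitespace
  sub fields {
      my ($line) = @_;
      chomp $line;
      return map { my $f = $_; $f =~ s/^\s+|\s+$//g; $f }
             split /\|/, $line, -1;
  }

  open my $fh1, '<', 'first.txt'  or die $!;
  open my $fh2, '<', 'second.txt' or die $!;
  while (defined(my $r1 = <$fh1>) and defined(my $r2 = <$fh2>)) {
      my @f1 = fields($r1);
      my @f2 = fields($r2);
      for my $i (0 .. $#f1) {
          printf "line %d, field %d: '%s' vs '%s'\n",
              $., $i + 1, $f1[$i], $f2[$i] // ''
              if $f1[$i] ne ($f2[$i] // '');
      }
  }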
WWW::RobotRules(3)    User Contributed Perl Documentation    WWW::RobotRules(3)
NAME
WWW::RobotRules - Parse robots.txt files
SYNOPSIS
require WWW::RobotRules;
my $robotsrules = new WWW::RobotRules 'MOMspider/1.0';
use LWP::Simple qw(get);
$url = "http://some.place/robots.txt";
my $robots_txt = get $url;
$robotsrules->parse($url, $robots_txt);
$url = "http://some.other.place/robots.txt";
my $robots_txt = get $url;
$robotsrules->parse($url, $robots_txt);
# Now we are able to check if a URL is valid for those servers that
# we have obtained and parsed "robots.txt" files for.
if($robotsrules->allowed($url)) {
$c = get $url;
...
}
DESCRIPTION
This module parses a /robots.txt file as specified in "A Standard for Robot Exclusion", described in
<http://info.webcrawler.com/mak/projects/robots/norobots.html>. Webmasters can use the /robots.txt file to disallow conforming robots access
to parts of their web site.
The parsed file is kept in the WWW::RobotRules object, and this object provides methods to check if access to a given URL is prohibited.
The same WWW::RobotRules object can parse multiple /robots.txt files.
The following methods are provided:
$rules = WWW::RobotRules->new($robot_name)
This is the constructor for WWW::RobotRules objects. The first argument given to new() is the name of the robot.
$rules->parse($robot_txt_url, $content, $fresh_until)
The parse() method takes as arguments the URL that was used to retrieve the /robots.txt file, and the contents of the file.
$rules->allowed($uri)
Returns TRUE if this robot is allowed to retrieve this URL.
$rules->agent([$name])
Get/set the agent name. NOTE: Changing the agent name will clear the robots.txt rules and expire times out of the cache.
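As an illustration (not part of the original SYNOPSIS), the methods above can be exercised without any network access by parsing a robots.txt given as a literal string:

  require WWW::RobotRules;

  my $rules = WWW::RobotRules->new('MyBot/1.0');   # example robot name
  $rules->parse('http://example.com/robots.txt', <<'EOT');
  User-agent: *
  Disallow: /private/
  EOT

  print $rules->allowed('http://example.com/')          ? "ok\n" : "denied\n";  # ok
  print $rules->allowed('http://example.com/private/x') ? "ok\n" : "denied\n";  # denied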
ROBOTS.TXT
The format and semantics of the "/robots.txt" file are as follows (this is an edited abstract of
<http://info.webcrawler.com/mak/projects/robots/norobots.html>):
The file consists of one or more records separated by one or more blank lines. Each record contains lines of the form
<field-name>: <value>
The field name is case insensitive. Text after the '#' character on a line is ignored during parsing. This is used for comments. The
following <field-names> can be used:
User-Agent
The value of this field is the name of the robot the record is describing access policy for. If more than one User-Agent field is
present the record describes an identical access policy for more than one robot. At least one field needs to be present per record. If
the value is '*', the record describes the default access policy for any robot that has not matched any of the other records.
Disallow
The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that
starts with this value will not be retrieved.
ROBOTS.TXT EXAMPLES
The following example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/" or "/tmp/":
User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
This example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/", except the robot called
"cybermapper":
User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
# Cybermapper knows where to go.
User-agent: cybermapper
Disallow:
This example indicates that no robots should visit this site further:
# go away
User-agent: *
Disallow: /
SEE ALSO
LWP::RobotUA, WWW::RobotRules::AnyDBM_File
libwww-perl-5.65 2001-04-20 WWW::RobotRules(3)