Hi Guys,
I have a file like this:
aaa b c d e f
fsss g h i k l
qqq r t h n
I want:
aaa b c d e f
fsss g h i k l
qqq r t h , n
ggg p t e d u
qqq i o s , k (2 Replies)
Hi,
I got long list of reference file (column one is refer to the header in input file; column 2 is info of start position in input file; column 3 is info of end position in input file;) shown as below:
read_2 10 15
read_3 5 8
read_1 4 10
.
.
.
Input file (huge file with total... (6 Replies)
i have a reqirement to adjust the data in a file based on a perticular character
the sample data is as below
483PDEAN CORRIGAN 52304037528955WAGES 50000
89BP ABCD MASTER352 5434604223735428 4200
58BP SOUTHERN WA848 ... (1 Reply)
I'm looking for what I hope might be a one liner along these lines:
sed '/a line with more than 3 pipes in it/d'
I know how to get the pipe count in a string and store it in a variable, but I'm greedy enough to hope that it's possible via regex in the /.../d context. Am I asking too much? ... (5 Replies)
Hi,
I have an issue to combine multiple lines of a file. I have records as below.
Fields are delimited by TAB. Each lines are ending with a new line char (\n)
Input
--------
ABC 123456 abcde 987
890456 7890 xyz
ght gtuv
ABC 5tyin 1234 789
ghty kuio
ABC ghty jind 1234
678 ght
... (8 Replies)
Hello,
i need help with awk.
I have this file:
cat number
DirB port 67 er_enc_out 0 er_bad_os 0
DirB port 71 er_enc_out 56 er_bad_os 0
DirB port 74 er_enc_out 0 er_bad_os 0
DirB port 75 ... (4 Replies)
I am trying to add a condition to the below perl that will capture the GTtag and place a specific string in the last field of each line. The problem is that the GT value used is not right after the tag rather it is a few fields away. The values should always be 0/1 or 1/2 and are in bold in the... (12 Replies)
In the perl below, which does execute, I am having trouble with the else in Rule 3. The digit in f{8} is extracted and used to update f accordinly along with the value in f.
There can be either - * or + before the number that is extracted but the same logic applies, that is if the value is greater... (5 Replies)
I have an input file with
A=xyz
B=pqr
I would want the value in Second Field (xyz or pqr) updated with a value present in Shell Variable based on the value passed in the first field. (A or B )
while read line
do
NEW_VALUE = `some functionality done on $line`
If $line=First Field-... (1 Reply)
I will appreciate if you help me here in this script in Solaris Enviroment.
Scenario:
i have 2 files :
1) /tmp/TRANSACTIONS_DAILY_20180730.txt:
201807300000000004
201807300000000005
201807300000000006
201807300000000007
201807300000000008
2)... (10 Replies)
Discussion started by: teokon90
10 Replies
LEARN ABOUT REDHAT
www::robotrules
WWW::RobotRules(3) User Contributed Perl Documentation WWW::RobotRules(3)NAME
WWW::RobotsRules - Parse robots.txt files
SYNOPSIS
require WWW::RobotRules;
my $robotsrules = new WWW::RobotRules 'MOMspider/1.0';
use LWP::Simple qw(get);
$url = "http://some.place/robots.txt";
my $robots_txt = get $url;
$robotsrules->parse($url, $robots_txt);
$url = "http://some.other.place/robots.txt";
my $robots_txt = get $url;
$robotsrules->parse($url, $robots_txt);
# Now we are able to check if a URL is valid for those servers that
# we have obtained and parsed "robots.txt" files for.
if($robotsrules->allowed($url)) {
$c = get $url;
...
}
DESCRIPTION
This module parses a /robots.txt file as specified in "A Standard for Robot Exclusion", described in
<http://info.webcrawler.com/mak/projects/robots/norobots.html> Webmasters can use the /robots.txt file to disallow conforming robots access
to parts of their web site.
The parsed file is kept in the WWW::RobotRules object, and this object provides methods to check if access to a given URL is prohibited.
The same WWW::RobotRules object can parse multiple /robots.txt files.
The following methods are provided:
$rules = WWW::RobotRules->new($robot_name)
This is the constructor for WWW::RobotRules objects. The first argument given to new() is the name of the robot.
$rules->parse($robot_txt_url, $content, $fresh_until)
The parse() method takes as arguments the URL that was used to retrieve the /robots.txt file, and the contents of the file.
$rules->allowed($uri)
Returns TRUE if this robot is allowed to retrieve this URL.
$rules->agent([$name])
Get/set the agent name. NOTE: Changing the agent name will clear the robots.txt rules and expire times out of the cache.
ROBOTS.TXT
The format and semantics of the "/robots.txt" file are as follows (this is an edited abstract of
<http://info.webcrawler.com/mak/projects/robots/norobots.html>):
The file consists of one or more records separated by one or more blank lines. Each record contains lines of the form
<field-name>: <value>
The field name is case insensitive. Text after the '#' character on a line is ignored during parsing. This is used for comments. The
following <field-names> can be used:
User-Agent
The value of this field is the name of the robot the record is describing access policy for. If more than one User-Agent field is
present the record describes an identical access policy for more than one robot. At least one field needs to be present per record. If
the value is '*', the record describes the default access policy for any robot that has not not matched any of the other records.
Disallow
The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that
starts with this value will not be retrieved
ROBOTS.TXT EXAMPLES
The following example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/" or "/tmp/":
User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
This example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/", except the robot called
"cybermapper":
User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
# Cybermapper knows where to go.
User-agent: cybermapper
Disallow:
This example indicates that no robots should visit this site further:
# go away
User-agent: *
Disallow: /
SEE ALSO
LWP::RobotUA, WWW::RobotRules::AnyDBM_File
libwww-perl-5.65 2001-04-20 WWW::RobotRules(3)