For counting the occurrences of a specific character in a file,
I am issuing the command
grep -o 'character' filename | wc -w
It works on other systems but not on HP-UX, as HP-UX grep has no -o option.
What do I do now? (9 Replies)
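A portable alternative is tr, which HP-UX does have — a minimal sketch, assuming the target is a single literal character (here 'c'):
# Delete everything except the target character, then count what is left.
tr -cd 'c' < filename | wc -c
tr -cd deletes the complement of the given set, so only the matching characters reach wc -c.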
I'm trying to count the number of occurrences of 2 specific characters in a very large file. I'd like to avoid using gsub because it's taking too long.
I was thinking something like:
awk '-F' { t += NF - 1 } END {print t}' infile > outfile
which isn't working
Any ideas would be great. (3 Replies)
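The field-separator idea does work once -F actually gets a value — a sketch, assuming the two characters are a and b (substitute your own); with a character class as the separator, NF - 1 is the number of occurrences on each line:
# Count both characters at once; the NF guard skips empty lines,
# which would otherwise contribute -1 to the total.
awk -F'[ab]' 'NF { t += NF - 1 } END { print t }' infile > outfile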
Hi,
I need to add a pipe (|) at the 5th and 18th positions of all records in a file. How can I do this?
I tried to add it at the 5th position using the code below. It didn't work. Please help!
awk '{substr($0,5,1) ~ /|/}{print}' $input_file > $temp_file (1 Reply)
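That awk line only tests the character at position 5; it never inserts anything. A sketch with sed instead, assuming the pipes should land in columns 5 and 18 of the output (the second insertion point allows for the character added by the first):
# Insert '|' as column 5, then insert another as column 18
# of the already-shifted line.
sed -e 's/^\(.\{4\}\)/\1|/' -e 's/^\(.\{17\}\)/\1|/' "$input_file" > "$temp_file"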
I am trying to use sed to replace specific characters at a specific position in the file with a different value... can this be done?
Example:
File:
A0199999123
A0199999124
A0199999125
Need to replace 99999 in positions 3-7 with 88888.
Any help is appreciated. (5 Replies)
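Yes, sed can do this by anchoring on the columns before the target — a sketch, assuming (as in the sample lines, where 99999 follows the three-character A01 prefix) that the five digits to replace sit right after the first three characters:
# Capture the leading three characters, match a literal 99999,
# and substitute 88888 in its place.
sed 's/^\(...\)99999/\188888/' file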
Hi,
I need help counting a specific word or character per line and validating the count against a specific number, e.g. 10. If the number of occurrences equals that number, the line should be part of the output.
Specific number = 6
Specific word or char = ||
Sample data:... (1 Reply)
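A sketch with awk, assuming the values above (count the two-character string || and keep lines with exactly 6 of them): gsub returns the number of replacements it makes, so replacing each match with itself counts occurrences without changing the line.
# Print only lines containing exactly 6 occurrences of "||".
awk 'gsub(/\|\|/, "&") == 6' file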
I'm looking for what I hope might be a one liner along these lines:
sed '/a line with more than 3 pipes in it/d'
I know how to get the pipe count in a string and store it in a variable, but I'm greedy enough to hope that it's possible via regex in the /.../d context. Am I asking too much? ... (5 Replies)
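It is possible in the /.../d context: a line has more than 3 pipes exactly when a prefix containing 4 pipes can be matched. A sketch:
# Delete any line containing 4 or more '|' characters.
sed '/^\([^|]*|\)\{4\}/d' file
(If you ever relax the one-liner requirement, awk -F'|' 'NF <= 4' file keeps the same lines by counting fields instead.)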
Hi, I'm still new to Unix.
I want to ask how to delete characters at a specific position in a line. Let's say I want to remove 5 characters starting at position 1000, so the characters at positions 1000-1004 will be deleted.
I found that this sed command can delete 4 characters starting at position 10, but I don't know if... (7 Replies)
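A sketch generalizing that command, assuming your sed accepts interval counts this large (POSIX only guarantees \{n\} up to 255): capture the first 999 characters, match the next 5, and keep only the captured part.
# Remove the 5 characters at positions 1000-1004 of each line.
sed 's/^\(.\{999\}\).\{5\}/\1/' file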
In a file, we have millions of records, each 1000 characters in length. At a specific position, say 800, there is a space; we need to replace it with the character X if the ID in that row starts with 123.
So far I have used the below, which is replacing the space at that position with X, but it's not checking for... (3 Replies)
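A sketch combining a line address with the positional substitution, assuming the ID sits at the start of each record (and, again, a sed that accepts intervals above 255):
# Only on lines whose ID starts with 123: if column 800 is a space,
# replace it with X, keeping the first 799 characters intact.
sed '/^123/ s/^\(.\{799\}\) /\1X/' file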
LEARN ABOUT REDHAT
WWW::RobotRules
WWW::RobotRules(3) User Contributed Perl Documentation WWW::RobotRules(3)
NAME
WWW::RobotRules - Parse robots.txt files
SYNOPSIS
require WWW::RobotRules;
my $robotsrules = WWW::RobotRules->new('MOMspider/1.0');
use LWP::Simple qw(get);
$url = "http://some.place/robots.txt";
my $robots_txt = get $url;
$robotsrules->parse($url, $robots_txt);
$url = "http://some.other.place/robots.txt";
$robots_txt = get $url;
$robotsrules->parse($url, $robots_txt);
# Now we are able to check if a URL is valid for those servers that
# we have obtained and parsed "robots.txt" files for.
if($robotsrules->allowed($url)) {
$c = get $url;
...
}
DESCRIPTION
This module parses /robots.txt files as specified in "A Standard for Robot Exclusion"
<http://info.webcrawler.com/mak/projects/robots/norobots.html>. Webmasters can use the /robots.txt file to disallow conforming robots access
to parts of their web site.
The parsed file is kept in the WWW::RobotRules object, and this object provides methods to check if access to a given URL is prohibited.
The same WWW::RobotRules object can parse multiple /robots.txt files.
The following methods are provided:
$rules = WWW::RobotRules->new($robot_name)
This is the constructor for WWW::RobotRules objects. The first argument given to new() is the name of the robot.
$rules->parse($robot_txt_url, $content, $fresh_until)
The parse() method takes as arguments the URL that was used to retrieve the /robots.txt file, and the contents of the file.
$rules->allowed($uri)
Returns TRUE if this robot is allowed to retrieve this URL.
$rules->agent([$name])
Get/set the agent name. NOTE: Changing the agent name will clear the robots.txt rules and expire times out of the cache.
ROBOTS.TXT
The format and semantics of the "/robots.txt" file are as follows (this is an edited abstract of
<http://info.webcrawler.com/mak/projects/robots/norobots.html>):
The file consists of one or more records separated by one or more blank lines. Each record contains lines of the form
<field-name>: <value>
The field name is case insensitive. Text after the '#' character on a line is ignored during parsing. This is used for comments. The
following <field-names> can be used:
User-Agent
The value of this field is the name of the robot the record is describing access policy for. If more than one User-Agent field is
present the record describes an identical access policy for more than one robot. At least one field needs to be present per record. If
the value is '*', the record describes the default access policy for any robot that has not matched any of the other records.
Disallow
The value of this field specifies a partial URL that is not to be visited. This can be a full path or a partial path; any URL that
starts with this value will not be retrieved.
ROBOTS.TXT EXAMPLES
The following example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/" or "/tmp/":
User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
This example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/", except the robot called
"cybermapper":
User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
# Cybermapper knows where to go.
User-agent: cybermapper
Disallow:
This example indicates that no robots should visit this site further:
# go away
User-agent: *
Disallow: /
SEE ALSO
LWP::RobotUA, WWW::RobotRules::AnyDBM_File
libwww-perl-5.65 2001-04-20 WWW::RobotRules(3)