02-26-2008
Replacing URL in a file with space
Hi,
I have a file with a URL written in it within double quotes, e.g.
"http://abcd.xyz.com/mno/somefile.dtd"
I want the above text to get replaced by a single space character.
I tried
cat File1.txt | sed -e 's/("http)*(dtd")/ /g' > File2.txt
But it didn't work out. Can someone suggest a sed command that replaces this URL text, including the double quotes, with a space?
Thanks
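A minimal sketch of one way to do this (assuming a POSIX sed, and that the URL always starts with "http and ends with dtd"):

  sed 's/"http[^"]*dtd"/ /g' File1.txt > File2.txt

The bracket expression [^"]* matches everything up to the closing quote, so the whole quoted URL, quotes included, is replaced with a single space. No cat pipeline is needed; sed reads the file directly.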
10 More Discussions You Might Find Interesting
1. Shell Programming and Scripting
I want to write a script which will check the arguments and, if there is a single space (if there are 2 or more spaces in a row, do not touch it), replace it with _ and then gather the arguments.
So the program will be run as
./programname hi hello hi usa now hello hello
so, inside of program,... (7 Replies)
Discussion started by: convenientstore
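A sketch of the core substitution (assuming Perl is available and the text arrives as one quoted argument; the lookarounds leave runs of two or more spaces untouched):

  echo "$1" | perl -pe 's/(?<! ) (?! )/_/g'

(?<! ) and (?! ) assert that the matched space has no space on either side, so only isolated single spaces become underscores.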
2. UNIX for Dummies Questions & Answers
Hello All,
I have a file with thousands of records:
e.g.:
|000222|123456987|||||||AARONSON| JOHN P|||PRIMARY |P
|000111|567894521|||||||ATHENS| WILLIAM k|||AAAA|L
Expected:
|000222|123456987|||||||AARONSON| JOHN |P|||PRIMARY |P
|000111|567894521|||||||ATHENS| WILLIAM |k|||AAAA|L
I... (6 Replies)
Discussion started by: OSD
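Judging from the sample, the goal is to split the trailing single-letter initial of the name field into its own field. A sketch (assuming a sed that supports -E, and that the initial is always one letter directly before a pipe):

  sed -E 's/ ([A-Za-z])\|/ |\1|/' file

Without the g flag, only the first such occurrence on each line is changed, which matches the expected output.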
3. Shell Programming and Scripting
Hi,
I have a file that is space-separated at all columns. Basically, what I want to do is replace all the space separators with column separators.
Thanks
kylle (1 Reply)
Discussion started by: kylle345
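One way to squeeze each run of spaces into a single delimiter (assuming a comma is the desired column separator; the post does not say which delimiter is wanted):

  tr -s ' ' ',' < file > file.csv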
4. UNIX for Advanced & Expert Users
I would like to replace the value of * (which might have one or more whitespace characters before and after the *) using the sed command on AIX.
Eg: Var='Hi I am there *
Desired output: Hi I am there* (1 Reply)
Discussion started by: techmoris
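A sketch that strips whitespace around a literal * (assuming a POSIX sed, as shipped with AIX):

  echo "$Var" | sed 's/[[:space:]]*\*[[:space:]]*/*/g'

The * must be escaped as \* so sed treats it as a literal character rather than a quantifier.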
5. Shell Programming and Scripting
Hi,
I want to replace spaces with commas.
My file is:
ADD 16428 170 160 3 WNPG 204 941 No 204802
ADD 16428 170 160 3 WNPG 204 941 No 204803
ADD 16428 170 160 3 WNPG 204 941 No 204804
ADD... (9 Replies)
Discussion started by: raghavendra.cse
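A sketch that collapses each run of spaces into a single comma (assuming the fields themselves never contain spaces):

  sed 's/  */,/g' file

The pattern is a space followed by zero or more spaces, i.e. one or more spaces.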
6. Shell Programming and Scripting
I'm trying to replace the string "99999999'" with a blank wherever it occurs in the file. Could you please help with the unix scripting?
Thank You. (6 Replies)
Discussion started by: vsairam
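A sketch that deletes every occurrence of that string (assuming the target really is 99999999 followed by a single quote, as posted):

  sed "s/99999999'//g" file > file.out

Double quotes around the sed expression let the single quote appear inside it.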
7. Shell Programming and Scripting
Hi Masters,
I have a file whose header is like
HDRCZECM8CZCM000000881 SVR00120100401160828+020020100401160828+0200CZK
There is a space between 1 and S; my requirement is to change that space to T.
I tried echo `head -1 CDCZECM8CZCM000000881` | sed 's/ /T/'
It works, but how can I modify in... (5 Replies)
Discussion started by: Pratik4891
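A sketch that edits only the header line, in place (assuming GNU sed, whose -i option rewrites the file; on other systems, redirect to a temporary file and move it back):

  sed -i '1s/ /T/' CDCZECM8CZCM000000881

The address 1 restricts the substitution to the first line.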
8. Shell Programming and Scripting
I have a string and want to replace the / with a space.
For example having "SP/FS/RP" I want to get "SP FS RP"
However, I am having problems using gsub:
set phases = `echo $Aphases | awk '{gsub(///," ")}; {print}'` (5 Replies)
Discussion started by: kristinu
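The gsub call fails because the slash has to be escaped inside the regex delimiters. A sketch of the corrected line (keeping the csh syntax from the post):

  set phases = `echo $Aphases | awk '{gsub(/\//," ")} {print}'`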
9. Web Development
Hello,
I have a situation where I am trying to use Apache's RedirectMatch directive to redirect all users to an HTTPS URL except a single (Linux) user accessing their own webspace. I have found a piece of regular expression code that negates the username:
^((?!andy).)*$ but when I try using it... (0 Replies)
Discussion started by: LostInTheWoods
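A sketch of how that negative lookahead might be wired into RedirectMatch (assuming Apache httpd with PCRE; the ~andy webspace path and example.com hostname are illustrative guesses, and this is untested):

  RedirectMatch "^/((?!~andy).*)$" "https://www.example.com/$1"

Any request path that does not begin with /~andy is captured in $1 and redirected to the HTTPS site.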
10. Shell Programming and Scripting
Hello,
I am very new to Perl, please help me here!
I need help reading a URL from the command line using Perl's WWW::Mechanize, and I need all the contents from the URL to go into a file.
Below is the script which I have written so far:
#!/usr/bin/perl
use LWP::UserAgent;
use... (2 Replies)
Discussion started by: scott_cog
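A minimal sketch of the task as described (assuming WWW::Mechanize is installed; the output filename page.html is made up for illustration):

  #!/usr/bin/perl
  use strict;
  use warnings;
  use WWW::Mechanize;

  # Take the URL from the command line.
  my $url = shift @ARGV or die "Usage: $0 URL\n";

  # WWW::Mechanize dies on HTTP errors by default (autocheck is on).
  my $mech = WWW::Mechanize->new();
  $mech->get($url);

  # Write the raw page content to a file.
  open my $fh, '>', 'page.html' or die "Cannot open page.html: $!\n";
  print $fh $mech->content;
  close $fh;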
LEARN ABOUT REDHAT
WWW::RobotRules
WWW::RobotRules(3) User Contributed Perl Documentation WWW::RobotRules(3)
NAME
WWW::RobotRules - Parse robots.txt files
SYNOPSIS
    require WWW::RobotRules;
    my $robotsrules = new WWW::RobotRules 'MOMspider/1.0';

    use LWP::Simple qw(get);

    $url = "http://some.place/robots.txt";
    my $robots_txt = get $url;
    $robotsrules->parse($url, $robots_txt);

    $url = "http://some.other.place/robots.txt";
    my $robots_txt = get $url;
    $robotsrules->parse($url, $robots_txt);

    # Now we are able to check if a URL is valid for those servers that
    # we have obtained and parsed "robots.txt" files for.
    if ($robotsrules->allowed($url)) {
        $c = get $url;
        ...
    }
DESCRIPTION
This module parses a /robots.txt file as specified in "A Standard for Robot Exclusion", described in
<http://info.webcrawler.com/mak/projects/robots/norobots.html>. Webmasters can use the /robots.txt file to disallow conforming robots access
to parts of their web site.
The parsed file is kept in the WWW::RobotRules object, and this object provides methods to check if access to a given URL is prohibited.
The same WWW::RobotRules object can parse multiple /robots.txt files.
The following methods are provided:
$rules = WWW::RobotRules->new($robot_name)
This is the constructor for WWW::RobotRules objects. The first argument given to new() is the name of the robot.
$rules->parse($robot_txt_url, $content, $fresh_until)
The parse() method takes as arguments the URL that was used to retrieve the /robots.txt file, and the contents of the file.
$rules->allowed($uri)
Returns TRUE if this robot is allowed to retrieve this URL.
$rules->agent([$name])
Get/set the agent name. NOTE: Changing the agent name will clear the robots.txt rules and expire times out of the cache.
ROBOTS.TXT
The format and semantics of the "/robots.txt" file are as follows (this is an edited abstract of
<http://info.webcrawler.com/mak/projects/robots/norobots.html>):
The file consists of one or more records separated by one or more blank lines. Each record contains lines of the form
    <field-name>: <value>
The field name is case insensitive. Text after the '#' character on a line is ignored during parsing. This is used for comments. The
following <field-names> can be used:
User-Agent
The value of this field is the name of the robot the record is describing access policy for. If more than one User-Agent field is
present, the record describes an identical access policy for more than one robot. At least one field needs to be present per record. If
the value is '*', the record describes the default access policy for any robot that has not matched any of the other records.
Disallow
The value of this field specifies a partial URL that is not to be visited. This can be a full path or a partial path; any URL that
starts with this value will not be retrieved.
ROBOTS.TXT EXAMPLES
The following example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/" or "/tmp/":
    User-agent: *
    Disallow: /cyberworld/map/ # This is an infinite virtual URL space
    Disallow: /tmp/            # these will soon disappear
This example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/", except the robot called
"cybermapper":
    User-agent: *
    Disallow: /cyberworld/map/ # This is an infinite virtual URL space

    # Cybermapper knows where to go.
    User-agent: cybermapper
    Disallow:
This example indicates that no robots should visit this site further:
    # go away
    User-agent: *
    Disallow: /
SEE ALSO
LWP::RobotUA, WWW::RobotRules::AnyDBM_File
libwww-perl-5.65 2001-04-20 WWW::RobotRules(3)