How to remove string inside html tag <a>


 
# 1  
Old 04-27-2012

Does anybody know how I can remove the string from inside an <a> tag?
There are several hundred posts in a few forums that need to be cleaned up.
The precise situation is:
----------
Code:
<a href="http://mydomain.com/cgi-bin/anyboard.cgi?fvp=/family/sexuality_and_spirituality/&cmd=rA&cG=43">

-------------

My goal is to end up with a clean <a> tag.

Code:
<a href="http://mydomain.com/cgi-bin/anyboard.cgi?fvp

is the part that never changes across the strings.

Your help is greatly appreciated.

Last edited by Scrutinizer; 04-27-2012 at 11:20 AM.. Reason: code tags
# 2  
Old 04-27-2012
What output are you expecting from the sample input?

Just <a> as output every time?
# 3  
Old 04-27-2012
yes

Yes, just <a> every time.
# 4  
Old 04-27-2012
Code:
 
sed 's/\(.*\)<a href.*>\(.*\)/\1\2/' input_file

# 5  
Old 04-27-2012
more details

Thank you for your time.

Here are some more details.

Each line containing the string I want to remove looks like this:

<td bgcolor="#eeeeee"><b><a href="http://mydomain/cgi-bin/anyboard.cgi?fvp=/family/sexuality_and_spirituality/&cmd=rA&cG=44">anchor text</a></b> </td>

Your code deleted the anchor text and the tags around it, and left only

<td bgcolor="#eeeeee"></td>

Maybe it is not possible with sed to strip only the <a> container.

Any other ideas are welcome.
# 6  
Old 04-27-2012
Code:
sed -e 's/<a href[^>]*>//g' -e 's/<\/a>//' input_file
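
A rough sketch of why this version keeps the anchor text, run against the sample line from post #5. In the earlier command, `.*>` is greedy and runs to the last `>` on the line, so the anchor text goes with it. Here `[^>]*` cannot cross a `>`, so only the opening tag is matched. The variable names are made up for illustration, and the closing-tag expression has a `g` flag added (an assumption on my part, so that several links on one line would all be handled).

```shell
# Sample line from post #5; "anchor text" is the poster's own placeholder.
line='<td bgcolor="#eeeeee"><b><a href="http://mydomain/cgi-bin/anyboard.cgi?fvp=/family/sexuality_and_spirituality/&cmd=rA&cG=44">anchor text</a></b> </td>'

# Greedy attempt from post #4: ".*>" extends to the LAST ">" on the line.
greedy=$(printf '%s\n' "$line" | sed 's/\(.*\)<a href.*>\(.*\)/\1\2/')

# Bounded version: "[^>]*" stops at the first ">", i.e. the end of the opening tag.
bounded=$(printf '%s\n' "$line" | sed -e 's/<a href[^>]*>//g' -e 's/<\/a>//g')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
```

The bounded result should keep the `<td>` and `<b>` wrappers and the anchor text, with only the `<a ...>` and `</a>` pieces gone.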

# 7  
Old 04-27-2012
works!

Code:
sed -e 's/<a href[^>]*>//g' -e 's/<\/a>//' input_file

This works like a charm.

Now, would you please tell me how to run your command safely on all few hundred files, so that it actually replaces the string in the files themselves?
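
One possible way to do this, offered as a sketch rather than a tested recipe for this particular forum software: copy each file to a `.bak` backup first, then write the cleaned output back over the original. The function name `strip_anchor_tags` and the `.bak` suffix are hypothetical choices, not something from the thread, and the `g` flag on the closing-tag expression is an addition to the posted command. Try it on a copy of a few files before running it on everything.

```shell
# strip_anchor_tags FILE...
# Hypothetical helper: strips <a href...> and </a> from each file,
# leaving the untouched original next to it as FILE.bak.
strip_anchor_tags() {
    for f in "$@"; do
        [ -f "$f" ] || continue               # skip anything that is not a regular file
        cp -p -- "$f" "$f.bak" || return 1    # back up first; stop if the copy fails
        # Read from the backup and overwrite the original with the cleaned text.
        sed -e 's/<a href[^>]*>//g' -e 's/<\/a>//g' "$f.bak" > "$f" || return 1
    done
}
```

It could then be called as `strip_anchor_tags ./*.html`, assuming the files end in `.html`. If something looks wrong afterwards, each original is still available under its `.bak` name. Running it a second time would overwrite those backups with the already-cleaned files, so move the backups elsewhere first if a rerun is needed.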