04-26-2012
How to remove URLs from HTML files
Does anybody know how to remove all URLs from HTML files?
All the URLs are links with anchor text, in the form:
<a href="http://www.anydomain.com">ANCHOR</a>
The URLs may or may not start with www.
The goal is to delete all the URLs but keep the ANCHOR text and, if possible, change the tags around the anchor text to <strong> or <em>.
Any help is greatly appreciated.
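
One possible approach, as a minimal sketch in Python rather than a definitive answer: a regular expression that matches each <a ... href=...>TEXT</a> pair and replaces it with the text wrapped in a new tag. The function name strip_links and its tag parameter are made up for this illustration. It assumes the links are simple and not nested, as in the example above; for messy or malformed HTML a real HTML parser would be more reliable than a regex.

```python
import re

# Matches an opening <a> tag that carries an href attribute, the link text,
# and the closing </a>. Case-insensitive, and the text may span several lines.
ANCHOR_RE = re.compile(
    r'<a\b[^>]*\bhref\b[^>]*>(.*?)</a\s*>',
    re.IGNORECASE | re.DOTALL,
)


def strip_links(html, tag="strong"):
    """Drop every <a href=...> link but keep its anchor text.

    The text is wrapped in <tag>...</tag>; pass tag="" to keep the bare text.
    (Hypothetical helper written for this example.)
    """
    def replace(match):
        text = match.group(1)
        return f"<{tag}>{text}</{tag}>" if tag else text

    return ANCHOR_RE.sub(replace, html)


sample = '<a href="http://www.anydomain.com">ANCHOR</a>'
print(strip_links(sample))            # <strong>ANCHOR</strong>
print(strip_links(sample, tag="em"))  # <em>ANCHOR</em>
```

To apply this to files, read each file's contents into a string, pass it through strip_links, and write the result back out. Keeping a backup copy of the originals first is a sensible precaution.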