Post 302630873 by georgi58 on Thursday 26th of April 2012 12:02:50 PM
How to remove URLs from HTML files

Does anybody know how to remove all URLs from HTML files?

All the URLs are links with anchor text, in this form:

<a href="http://www.anydomain.com">ANCHOR</a>

The domain may or may not start with www.

The goal is to delete all the URLs but keep the ANCHOR text and, if possible, replace the tags around the anchor text with <strong> or <em>.

Any help is greatly appreciated.
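
One possible approach, sketched with sed, is below. It is only a starting point and rests on some assumptions: each link opens and closes on a single line, the href value is in double quotes, the tag is written in lowercase, and the anchor text contains no nested tags. The function name strip_links is made up for illustration. For HTML that breaks those assumptions, a real HTML parser would be safer than a regular expression.

```shell
#!/bin/sh
# Hypothetical helper: reads HTML on stdin and writes it to stdout with each
# single-line <a ... href="...">TEXT</a> replaced by <strong>TEXT</strong>.
# Assumes double-quoted href values, lowercase tags, and plain-text anchors.
strip_links() {
    # -E selects extended regular expressions (GNU and BSD sed).
    # \1 is the captured anchor text; the URL itself is discarded.
    sed -E 's#<a[[:space:]][^>]*href="[^"]*"[^>]*>([^<]*)</a>#<strong>\1</strong>#g'
}

# Example run on the link format from the question.
printf '%s\n' '<p>See <a href="http://www.anydomain.com">ANCHOR</a> here.</p>' | strip_links
```

To use <em> instead, change both occurrences of strong in the replacement. To process a file, redirect it through the function and write the result to a new file, for example strip_links < page.html > page.clean.html (both file names are placeholders).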
 
