Hello,
Is there a way to go through a file and remove certain HTML tags with bash? If it needs sed or awk, that'll do too.
I want this because I have a monitoring script that generates a logfile in HTML, and every time it generates a logfile, the tags are reproduced. The tags... (4 Replies)
Hello,
I am trying to extract URLs from Google search results, but I have a problem with the sed filtering of the HTML code.
What I want is just a list of the URLs that appear between ........<p><a href=" and the next following " in the HTML code.
Here is my code; I use wget and pipelines for the filtering. wget works, but... (13 Replies)
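One way the extraction described above could be done, as a minimal sketch rather than the poster's actual code: it assumes every link appears literally as <p><a href="..." on a single line, and that grep supports -o (GNU and BSD grep do; it is not in POSIX). The function name extract_urls and the sample input are made up for illustration.

```shell
# Hypothetical helper: print every URL found between <p><a href=" and the next ".
# Reads HTML on stdin, writes one URL per line on stdout.
extract_urls() {
    # grep -o prints each match on its own line; sed then trims the
    # leading <p><a href=" and the trailing quote from each match.
    grep -o '<p><a href="[^"]*"' |
        sed -e 's/^<p><a href="//' -e 's/"$//'
}

# Example with made-up input:
printf '%s\n' 'x<p><a href="http://example.com/a">A</a> y<p><a href="http://example.org/b">B</a>' |
    extract_urls
```

In a pipeline the input would come from wget instead, along the lines of wget -qO- on the search URL piped into extract_urls.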
Hello,
I have one file into which an HTML web page has been inserted intermittently.
I would like to remove all text between the "<html xmlns="http://www.w3.org/1999/xhtml">" and </html> tags.
Can anyone please suggest a sed regular expression for it?
Thanks (3 Replies)
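A sed address range is one possible answer to the question above. The sketch below assumes the opening and closing tags sit on different lines and that nothing worth keeping shares a line with them, because the range deletes whole lines, tag lines included. The function name strip_html_blocks and the sample input are made up for illustration.

```shell
# Hypothetical helper: drop every line from the opening <html xmlns=...> tag
# through the closing </html> tag, inclusive. Reads stdin, writes stdout.
strip_html_blocks() {
    # \|...| makes | the regex delimiter, so the slashes in the URL
    # and in </html> need no escaping.
    sed '\|<html xmlns="http://www\.w3\.org/1999/xhtml">|,\|</html>|d'
}

# Example with made-up input; only "before" and "after" survive.
printf '%s\n' 'before' \
    '<html xmlns="http://www.w3.org/1999/xhtml">' \
    '<body>inserted page</body>' \
    '</html>' \
    'after' |
    strip_html_blocks
```

If both tags can appear on the same line, the range would run on to the next </html>, so that case needs separate handling.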
Hi,
I basically need to get a list of all the tarballs located at uri
I am currently doing a wget on uri to get the index.html page.
Now this index page contains the list of URIs that I want to use in my bash script.
Can someone please guide me?
I am new to Linux and shell scripting.
... (5 Replies)
Hi everyone. I have an html file with lines like so:
link href="localFolder/...">
link href="htp://...">
img src="localFolder/...">
img src="htp://...">
I want to remove the links with http in their href and the imgs with http in their src. I'm having trouble removing them because there... (4 Replies)
Hello,
I just saw that on my VPS (CentOS) my osCommerce with an SEO script
has created millions of tmp files inside the /html/cache/ directory.
I need to remove all those files (millions). I tried via shell, but the VPS
load goes very high and it hangs. Is there some way to do a... (7 Replies)
Does anybody know how I can remove a string from an <a> tag?
There are several hundred posts in a few forums that need to be cleaned up.
The precise situation is
----------
<a href="http://mydomain.com/cgi-bin/anyboard.cgi?fvp=/family/sexuality_and_spirituality/&cmd=rA&cG=43">
-------------
my... (6 Replies)
Hi all,
How might I go about writing a program that will read all input as an HTML file, and subsequently strip all HTML, embedded scripts and style sheets from its input, leaving only text as the output?
I am a beginner, so the simpler, the better.
Thanks for any advice :) (4 Replies)
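For a beginner-level take on the question above, a few sed expressions go a long way. This is only a rough sketch under stated assumptions: no tag is split across lines, a script or style block is either entirely on one line or opens and closes on separate lines, and tag names are lowercase. It does not decode entities such as &amp;, and a real HTML parser is more robust. The function name strip_html and the sample input are made up for illustration.

```shell
# Hypothetical helper: reduce simple HTML on stdin to its visible text.
strip_html() {
    # 1-2: remove script/style elements that fit on a single line
    # 3-4: delete multi-line script/style blocks, tag lines included
    # 5:   remove every remaining tag
    # 6:   drop lines that are now empty or whitespace only
    sed -e 's#<script[^>]*>.*</script>##g' \
        -e 's#<style[^>]*>.*</style>##g' \
        -e '/<script/,/<\/script>/d' \
        -e '/<style/,/<\/style>/d' \
        -e 's/<[^>]*>//g' \
        -e '/^[[:space:]]*$/d'
}

# Example with made-up input; prints "Demo" and "Hello world".
printf '%s\n' '<html><head><title>Demo</title>' \
    '<style>' 'p { color: red; }' '</style>' \
    '<script>' 'var x = 1;' '</script>' \
    '<script src="x.js"></script>' \
    '</head><body>' \
    '<p>Hello <b>world</b></p>' \
    '</body></html>' |
    strip_html
```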
Hi,
I have a txt file which contain this:
<a href="linux">Linux</a>
<a href="unix">Unix</a>
<a href="oracle">Oracle</a>
<a href="perl">Perl</a>
I'm trying to extract the text in between these anchor tags and ignore everything else using grep. I managed to ignore the tags but am unable to... (6 Replies)
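The poster asked about grep, but sed with a capture group is a common alternative for this kind of input, so the sketch below swaps it in. It assumes at most one anchor per line (with several, only the last one is printed) and no nested tags inside the anchor text. The function name anchor_text is made up for illustration.

```shell
# Hypothetical helper: print the text between <a ...> and </a> on each line.
# Lines without an anchor are skipped because of -n together with the p flag.
anchor_text() {
    sed -n 's/.*<a [^>]*>\([^<]*\)<\/a>.*/\1/p'
}

# Example using the lines from the post; prints Linux, Unix, Oracle, Perl,
# one per line.
printf '%s\n' '<a href="linux">Linux</a>' \
    '<a href="unix">Unix</a>' \
    '<a href="oracle">Oracle</a>' \
    '<a href="perl">Perl</a>' |
    anchor_text
```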
Hi,
I am looking for a regex that will validate a URL and files accessed in a browser.
For example:
http://www.google.co.uk
http://www.google.com
https://www.google.co.uk
https://www.google.com
ftp://
file:///somefile/on/a/server/accessed/from/browser/file.txt
So far I have:
... (4 Replies)
Discussion started by: muay_tb
urn-scheme
uri_urn(n) Tcl Uniform Resource Identifier Management uri_urn(n)
NAME
uri_urn - URI utilities, URN scheme
SYNOPSIS
package require Tcl 8.2
package require uri::urn ?1.1.2?
uri::urn::quote url
uri::urn::unquote url
DESCRIPTION
This package provides two commands to quote and unquote the disallowed characters for url using the urn scheme, registers the scheme with
the package uri, and provides internal helpers which will be automatically used by the commands uri::split and uri::join of package uri to
handle urls using the urn scheme.
COMMANDS
uri::urn::quote url
This command quotes the characters disallowed by the urn scheme (per RFC 2141 sec2.2) in the url and returns the modified url as its
result.
uri::urn::unquote url
This command performs the reverse of ::uri::urn::quote. It takes a urn url, removes the quoting from all disallowed characters,
and returns the modified url as its result.
BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category uri of
the Tcllib SF Trackers [http://sourceforge.net/tracker/?group_id=12883]. Please also report any ideas for enhancements you may have for
either package and/or documentation.
KEYWORDS
rfc 2141, uri, url, urn
CATEGORY
Networking
uri 1.1.2 uri_urn(n)