Post 302638385 by Molly.P. on Thursday 10th of May 2012 08:04:23 AM
Remove all HTML, scripts and styles?

Hi all,

How might I go about writing a program that will read all input as an HTML file, and subsequently strip all HTML, embedded scripts and style sheets from its input, leaving only text as the output?

I am a beginner, so the simpler, the better.

Thanks for any advice :)
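
One possible approach, offered as a minimal sketch rather than a definitive answer: Python's standard-library `html.parser` module can walk the markup and keep only the text nodes. The names `TextExtractor` and `html_to_text` are made up for this illustration, and the sketch assumes that collapsing runs of whitespace into single spaces is acceptable for the output.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Tags and comments never reach this method; only text does.
        if self.skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.chunks).split())


sample = """<html><head><style>p { color: red; }</style>
<script>alert("hi");</script></head>
<body><p>Hello <b>world</b>!</p></body></html>"""

print(html_to_text(sample))  # prints: Hello world!
```

To filter standard input instead of a fixed string, add `import sys` and call `print(html_to_text(sys.stdin.read()))`. A real parser is used here instead of a regular expression because script and style bodies can contain `<` and `>` characters that confuse simple pattern matching.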
 

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Access shell scripts from HTML page

Hi, I need (have been asked/ordered/instructed) to migrate the access of a number of ksh scripts into an HTML/web page environment. Currently, users log on to a Unix box and access the scripts that way. The users are not Unix people, so I have restricted the access solely to... (4 Replies)
Discussion started by: nhatch
4 Replies

2. Shell Programming and Scripting

Remove html tags with bash

Hello, is there a way to go through a file and remove certain HTML tags with bash? If it needs sed or awk, that'll do too. The reason I want this is that I have a monitor script which generates a logfile in HTML, and every time it generates a logfile, the tags are reproduced. The tags... (4 Replies)
Discussion started by: dejavu88
4 Replies

3. Shell Programming and Scripting

html within shell scripts, how??

Hi, can anybody guide me in writing HTML programs using a shell script? FYI: I use ksh. Thanks in advance, Divya (6 Replies)
Discussion started by: divzz
6 Replies

4. Shell Programming and Scripting

command to remove attribute of an html tag

Is there any shell command to clean an HTML tag of its attributes? For example, replacing <p align="center"> with <p>. Thanks for your help! (2 Replies)
Discussion started by: parshant_bvcoe
2 Replies

5. Shell Programming and Scripting

HTML code remove

Hello, I have a file into which an HTML web page has been inserted intermittently. I would like to remove all text between the "<html xmlns="http://www.w3.org/1999/xhtml">" and </html> tags. Can anyone please suggest a sed regular expression for it? Thanks (3 Replies)
Discussion started by: nrbhole
3 Replies

6. Shell Programming and Scripting

Remove external urls from .html file

Hi everyone. I have an html file with lines like so: link href="localFolder/..."> link href="htp://..."> img src="localFolder/..."> img src="htp://..."> I want to remove the links with http in the href and imgs with http in its src. I'm having trouble removing them because there... (4 Replies)
Discussion started by: CowCow339
4 Replies

7. Shell Programming and Scripting

How to remove urls from html files

Does anybody know how to remove all URLs from HTML files? All URLs are links with anchor text in the form <a href="http://www.anydomain.com">ANCHOR</a>; they may or may not start with www. The goal is to delete all URLs and keep the ANCHOR text and, if possible, to change the tags around the anchor to... (2 Replies)
Discussion started by: georgi58
2 Replies

8. Shell Programming and Scripting

How to remove string inside html tag <a>

Does anybody know how I can remove a string from an <a> tag? There are several hundred posts in a few forums that need to be cleaned up. The precise situation is ---------- <a href="http://mydomain.com/cgi-bin/anyboard.cgi?fvp=/family/sexuality_and_spirituality/&cmd=rA&cG=43"> ------------- my... (6 Replies)
Discussion started by: georgi58
6 Replies

9. Shell Programming and Scripting

How to remove the values inside the html tags?

Hi, I have a txt file which contains this: <a href="linux">Linux</a> <a href="unix">Unix</a> <a href="oracle">Oracle</a> <a href="perl">Perl</a> I'm trying to extract the text in between these anchor tags and ignore everything else using grep. I managed to ignore the tags but am unable to... (6 Replies)
Discussion started by: KCApple
6 Replies

STRIP_TAGS(3)                              1                              STRIP_TAGS(3)

strip_tags - Strip HTML and PHP tags from a string

SYNOPSIS
       string strip_tags (string $str, [string $allowable_tags])

DESCRIPTION
       This function tries to return a string with all NULL bytes, HTML and PHP tags stripped from a given $str. It uses the same tag stripping state machine as the fgetss(3) function.

PARAMETERS
       o $str - The input string.

       o $allowable_tags - You can use the optional second parameter to specify tags which should not be stripped.

         Note: HTML comments and PHP tags are also stripped. This is hardcoded and can not be changed with $allowable_tags.

         Note: This parameter should not contain whitespace. strip_tags(3) sees a tag as a case-insensitive string between < and the first whitespace or >.

         Note: In PHP 5.3.4 and later, you will also need to include the self-closing XHTML tag to strip these from $str. For example, to strip both <br> and <br/>, you should use:

             <?php
             strip_tags($input, '<br><br/>');
             ?>

RETURN VALUES
       Returns the stripped string.

CHANGELOG
       +--------+---------------------------------------------------+
       |Version |                    Description                    |
       +--------+---------------------------------------------------+
       | 5.3.4  | strip_tags(3) no longer strips self-closing XHTML |
       |        | tags unless the self-closing XHTML tag is also    |
       |        | given in $allowable_tags.                         |
       | 5.0.0  | strip_tags(3) is now binary safe.                 |
       +--------+---------------------------------------------------+

EXAMPLES
       Example #1 strip_tags(3) example

           <?php
           $text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
           echo strip_tags($text);
           echo "\n";

           // Allow <p> and <a>
           echo strip_tags($text, '<p><a>');
           ?>

       The above example will output:

           Test paragraph. Other text
           <p>Test paragraph.</p> <a href="#fragment">Other text</a>

NOTES
       Warning: Because strip_tags(3) does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected.

       Warning: This function does not modify any attributes on the tags that you allow using $allowable_tags, including the style and onmouseover attributes that a mischievous user may abuse when posting text that will be shown to other users.

       Note: Tag names within the input HTML that are greater than 1023 bytes in length will be treated as though they are invalid, regardless of the $allowable_tags parameter.

SEE ALSO
       htmlspecialchars(3).

PHP Documentation Group                                                   STRIP_TAGS(3)