Remove all HTML, scripts and styles?


 
# 1  
Old 05-10-2012

Hi all,

How might I go about writing a program that will read all input as an HTML file, and subsequently strip all HTML, embedded scripts and style sheets from its input, leaving only text as the output?

I am a beginner, so the simpler, the better.

Thanks for any advice.
# 2  
Old 05-10-2012
The subject of your post returns 241,000,000 hits from Google:

Let me google that for you
# 3  
Old 05-10-2012
ThomasMcA, thank you. I did google my question before finding this forum. I have gone through quite a few pages of Google results and have not found quite what I'm after, in terms that I can understand. I'm an utter beginner, and was hoping that somebody here could help me out, not just show me how to type something into a Google search bar.
# 4  
Old 05-10-2012
What Operating System and version do you have and what Shell do you use?

What programming languages are installed on your computer?

Do you know any programming languages?
# 5  
Old 05-10-2012
Your short question that contained absolutely no details might as well have said "gimme the answer, I'm too lazy to look for myself."

If you did search first, that's great. But you didn't tell us that.

Even your last reply doesn't tell us much. Are you looking for a utility that strips out the code from HTML source? Or are you trying to learn a scripting language? If so, which one?

Perhaps you didn't find what you're after because your Google search was too vague. Perhaps you didn't find it because you're not really sure what you're looking for. If that is the case, nobody can help you.

So, please try again. Tell us exactly what you're trying to do.
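
For anyone landing on this thread with the same question, here is one possible sketch of the task described in the first post: take HTML, drop the tags along with everything inside script and style elements, and print the text that is left. It assumes Python 3 is installed (the thread never says which languages are available) and uses only the standard library's html.parser module. The names TextExtractor and strip_html, and the choice to take file names as arguments, are made up for this illustration. It is a starting point under those assumptions, not a complete HTML-to-text converter.

```python
import sys
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}  # elements whose contents are dropped

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain text.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_html(html):
    """Return the visible text of an HTML string, one non-empty line per row."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    text = "".join(parser.parts)
    # Collapse runs of whitespace and drop lines that end up empty.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    # Hypothetical usage: python3 strip_html.py page.html
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            print(strip_html(handle.read()))
```

Saved under a name such as strip_html.py (the file name is only an example), it would be run as "python3 strip_html.py page.html". A real parser is used here rather than a sed or grep pattern because script and style blocks often contain "<" and ">" characters that simple tag-matching patterns tend to mishandle.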
 
