Removing HTML tags


 
# 1  
Old 10-26-2011

Hello Unix Gurus

I am having a problem with one of the files that I generate with a Unix script. The script connects to the MY SQL Server and loads the data into a text file. While generating the text file for one of the tables, the value in one of the columns comes out as follows.
Code:
<p>
          Plan on doing a lot of running today. You are a hunter relentlessly tracking down those that dare steal your team's flag. If your flag gets captured, your only focus is to return it to your base. Nothing else matters. &nbsp;</p>
<div>
          <em><strong>How do I play?</strong></em></div>
<div>
          Enlist in the operation, then go into a public match and play. Your total score against all enlisted players, (Most Kills, Most Bomb Plants, etc.) within the stated period of time determines the winners. When the Operation ends, a leaderboard displays everyone&rsquo;s placement.</div>
<div>
          &nbsp;</div>
<div>
          <em><strong>Enlist now or you&rsquo;ll regret it</strong></em></div>
<div>
          Enlistment Period is over once the event starts. If the event starts at 12:00 AM(PDT), you better enlist by 11:58 PM (PDT) just to be safe.</div>

This format splits the data across several lines rather than keeping it on just one line. Can anyone give me a Unix command that removes all the HTML tags and puts the data on one line?

Let me know if you have any questions.
Thanks for your help on this!!

Regards
Chetan

Last edited by radoulov; 10-26-2011 at 06:36 PM.. Reason: Code tags!
# 2  
Old 10-26-2011
Try html2text (a.k.a. html2txt), "THE ASCIINATOR". If you use Linux it should be in the packages; it is also in the FreeBSD ports. Another option is:

Code:
lynx -dump

# 3  
Old 10-26-2011
Hey Click

Thanks for your reply

I am not trying to convert an HTML page into text, but trying to eliminate the HTML tags in a text column.

Regards
Chetan
# 4  
Old 10-27-2011
Put the HTML codes you want removed in a separate file, one per line:
Code:
$ cat tags_to_remove
<p>
<strong>
&nbsp;
<div>
<em>
</em>
</strong>
</p>
</div>
$

And then run the following line, which deletes each listed code from infile in turn:
Code:
$ while IFS= read -r line; do sed "s,$line,,g" infile > infile1 && mv infile1 infile; done < tags_to_remove
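
The loop above rewrites infile once per tag. As a rough sketch of the same idea in a single pass, the tag list can be turned into a sed script first. The function name strip_listed_tags is made up for illustration, and the sketch assumes that no line in the tag file contains a comma or a regex special character.

```shell
#!/bin/sh
# strip_listed_tags: hypothetical helper, not part of the original reply.
# $1 = file listing one tag or entity per line (like tags_to_remove)
# $2 = file to clean; the result goes to stdout and $2 is not modified.
strip_listed_tags() {
    tags_file=$1
    input_file=$2

    # Turn every non-empty line into a sed command, e.g. "<p>" -> "s,<p>,,g".
    # In the replacement, & stands for the whole matched line.
    script=$(sed -n '/./s/.*/s,&,,g/p' "$tags_file")

    # Apply all generated substitutions in one pass over the input.
    sed -e "$script" "$input_file"
}
```

It would be called as strip_listed_tags tags_to_remove infile > outfile. Like the loop, it deletes &nbsp; outright instead of replacing it with a space, and it leaves the line breaks in place.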

If needed, remove the leading white space as well:
Code:
$ nawk '{sub(/^[ \t]+/, "")};1' infile > outfile
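
Neither step above joins the lines, which was the other half of the question. A minimal sketch that strips anything shaped like a tag and puts the result on one line could look like the following. The name strip_html is hypothetical, only the entities seen in the sample (&nbsp;, &rsquo;) plus &amp; are decoded, and it assumes the text contains no literal < or > characters outside of tags.

```shell
#!/bin/sh
# strip_html: hypothetical helper, not from the thread.
# Reads text on stdin and writes it as one tag-free line on stdout.
strip_html() {
    # Join the lines first so a tag split across two lines is still matched;
    # the echo puts back a single final newline so sed sees a complete line.
    { tr '\n' ' '; echo; } |
    # In order: drop <...> tags, decode the few known entities (&amp; last,
    # so decoded text is not decoded twice), squeeze runs of white space,
    # then trim the leading and trailing blank.
    sed -e 's/<[^>]*>//g' \
        -e 's/&nbsp;/ /g' \
        -e "s/&rsquo;/'/g" \
        -e 's/&amp;/\&/g' \
        -e 's/[[:space:]][[:space:]]*/ /g' \
        -e 's/^ //' \
        -e 's/ $//'
}
```

Usage would be strip_html < infile > outfile. Because every input line is joined, a file holding several records would need to be fed through one record at a time.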
