How to remove only html tags inside a file?


# 1  
Old 11-23-2007

Hi All,

I have the following example file, and I want to remove only the HTML tags from it, keeping the text between them.

Input File:

<html>
<head>
<title>Software Solutions Inc., </title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body bgcolor=white leftmargin="0" topmargin="0" marginwidth="00" marginheight="0" class=NormalFont>
<table ID="Table2" Bordercolor=black border=2 cellspacing=2 cellpadding=2>
<TR><TD colspan=4 align=left bgcolor="yellow"><font color=blue ><b> Iswar Ramamoorthy</b></font></TD> </TR>
<tr>
<td align=center><b>Date</b></td>
<td align=center><b>Total Hours</b></td>
<td align=center><b>Total IN Time</b></td>
<td align=center><b>Total Break Hours</b></td>
</tr>


</table>


<table ID="Table2" Bordercolor=black border=2 cellspacing=2 cellpadding=2>
<TR><TD colspan=4 align=left bgcolor="yellow"><font color=blue ><b>Aman Jain</b></font></TD> </TR>
<tr>
<td align=center><b>Date</b></td>
<td align=center><b>Total Hours</b></td>
<td align=center><b>Total IN Time</b></td>
<td align=center><b>Total Break Hours</b></td>
</tr>


</table>


<table ID="Table2" Bordercolor=black border=2 cellspacing=2 cellpadding=2>
<TR><TD colspan=4 align=left bgcolor="yellow"><font color=blue ><b>Anilkumar Kaandukuri</b></font></TD> </TR>
<tr>
<td align=center><b>Date</b></td>
<td align=center><b>Total Hours</b></td>
<td align=center><b>Total IN Time</b></td>
<td align=center><b>Total Break Hours</b></td>
</tr>


<tr class=normalfont >
<td align=center>11/16/2007</td>
<td align=center>1:16:0</td>
<td align=center>01:16</td>
<td align=center>0</td>
</tr>

</table>


<table ID="Table2" Bordercolor=black border=2 cellspacing=2 cellpadding=2>
<TR><TD colspan=4 align=left bgcolor="yellow"><font color=blue ><b>Arun Sivaraman</b></font></TD> </TR>
<tr>
<td align=center><b>Date</b></td>
<td align=center><b>Total Hours</b></td>
<td align=center><b>Total IN Time</b></td>
<td align=center><b>Total Break Hours</b></td>
</tr>

My expected result:

Software Solutions Inc

Iswar Ramamoorthy

Date
Total Hours
Total IN Time
Total Break Hours

Aman Jain

Date
Total Hours
Total IN Time
Total Break Hours

Anilkumar Kaandukuri

Date
Total Hours
Total IN Time
Total Break Hours

11/16/2007
1:16:0
01:16
0

... and so on for the rest of the file.
btech_raju
# 2  
Old 11-23-2007
Code:
sed -n '/^$/!{s/<[^>]*>//g;p;}' filename

Or, with slightly different output:

Code:
lynx --dump filename

(The file must have an .htm or .html extension.)

Or use html2text.
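
The sed one-liner above can be read piece by piece. Here is a minimal annotated sketch; the function name strip_tags is only an illustrative wrapper and is not part of the original command:

```shell
# strip_tags: hypothetical helper name, used here only for illustration.
# Reads HTML from stdin (or from the files given as arguments)
# and prints the text with the tags removed.
strip_tags() {
  # -n            suppress sed's automatic printing of every line
  # /^$/!{ ... }  run the commands in braces only on lines that are NOT empty
  # s/<[^>]*>//g  delete each "<", any run of non-">" characters, and the ">"
  # p             print the line as it stands after the substitution
  sed -n '/^$/!{s/<[^>]*>//g;p;}' "$@"
}

# Small sample taken from the input file in the first post.
printf '%s\n' '<td align=center><b>Date</b></td>' '' '<tr>' | strip_tags
```

The empty-line check runs before the substitution, so a line that held only tags (such as `<tr>`) is still printed, as an empty line. sed also works one line at a time, so a tag that is split across two lines is not matched by this pattern.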

Last edited by radoulov; 11-23-2007 at 11:17 AM..
# 3  
Old 11-23-2007
All the commands work well.

sed -n '/^$/!{s/<[^>]*>//g;p;}' filename

Could you please explain the sed command above?

Thanks,
Thangaraju.

Last edited by btech_raju; 11-23-2007 at 11:39 AM..
btech_raju