#1
How to remove only html tags inside a file?
Hi All,
I have the following example file and want to remove only the HTML tags.

Input file:

<html>
<head>
<title>Software Solutions Inc., </title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body bgcolor=white leftmargin="0" topmargin="0" marginwidth="00" marginheight="0" class=NormalFont>
<table ID="Table2" Bordercolor=black border=2 cellspacing=2 cellpadding=2>
<TR><TD colspan=4 align=left bgcolor="yellow"><font color=blue ><b> Iswar Ramamoorthy</b></font></TD>
</TR>
<tr>
<td align=center><b>Date</b></td>
<td align=center><b>Total Hours</b></td>
<td align=center><b>Total IN Time</b></td>
<td align=center><b>Total Break Hours</b></td>
</tr>
</table>
<table ID="Table2" Bordercolor=black border=2 cellspacing=2 cellpadding=2>
<TR><TD colspan=4 align=left bgcolor="yellow"><font color=blue ><b>Aman Jain</b></font></TD>
</TR>
<tr>
<td align=center><b>Date</b></td>
<td align=center><b>Total Hours</b></td>
<td align=center><b>Total IN Time</b></td>
<td align=center><b>Total Break Hours</b></td>
</tr>
</table>
<table ID="Table2" Bordercolor=black border=2 cellspacing=2 cellpadding=2>
<TR><TD colspan=4 align=left bgcolor="yellow"><font color=blue ><b>Anilkumar Kaandukuri</b></font></TD>
</TR>
<tr>
<td align=center><b>Date</b></td>
<td align=center><b>Total Hours</b></td>
<td align=center><b>Total IN Time</b></td>
<td align=center><b>Total Break Hours</b></td>
</tr>
<tr class=normalfont >
<td align=center>11/16/2007</td>
<td align=center>1:16:0</td>
<td align=center>01:16</td>
<td align=center>0</td>
</tr>
</table>
<table ID="Table2" Bordercolor=black border=2 cellspacing=2 cellpadding=2>
<TR><TD colspan=4 align=left bgcolor="yellow"><font color=blue ><b>Arun Sivaraman</b></font></TD>
</TR>
<tr>
<td align=center><b>Date</b></td>
<td align=center><b>Total Hours</b></td>
<td align=center><b>Total IN Time</b></td>
<td align=center><b>Total Break Hours</b></td>
</tr>

My expected result:

Software Solutions Inc
Iswar Ramamoorthy
Date Total Hours Total IN Time Total Break Hours
Aman Jain
Date Total Hours Total IN Time Total Break Hours
Anilkumar Kaandukuri
Date Total Hours Total IN Time Total Break Hours
11/16/2007 1:16:0 01:16 0
............ ........... etc............
#2
Code:
sed -n '/^$/!{s/<[^>]*>//g;p;}' filename

Or, with slightly different output:

Code:
lynx --dump filename

(the file must have an .htm or .html extension)

Or use html2text.
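As a rough illustration of how the sed approach behaves, the sketch below runs it on a shortened sample. The `sample` variable and the function names `strip_tags` and `strip_tags_compact` are made up for this example. The second function is a variant, not the command posted above: it assumes that lines left empty after tag removal are unwanted.

```shell
#!/bin/sh
# Small sample in the spirit of the input file (shortened; not the full file).
sample='<html>
<head>
<title>Software Solutions Inc.</title>
</head>

<tr>
<td align=center><b>Date</b></td>
</tr>'

# The command from this post: lines that are empty in the input are skipped,
# tags are deleted from every other line, and the result is printed.
strip_tags() {
    sed -n '/^$/!{s/<[^>]*>//g;p;}' "$@"
}

# Variant (an assumption about the desired output): delete tags first,
# then drop lines that are left empty or whitespace-only.
strip_tags_compact() {
    sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d' "$@"
}

printf '%s\n' "$sample" | strip_tags
echo '---'
printf '%s\n' "$sample" | strip_tags_compact
```

With the first function, lines that held only tags (such as `<html>`) still come out as empty lines, because the empty-line test runs before the substitution. The variant prints just the two text lines. Both work line by line, so a tag that is split across two lines would not be removed by either.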
Last edited by radoulov; 11-23-2007 at 11:17 AM.
#3
All the commands work well.

Code:
sed -n '/^$/!{s/<[^>]*>//g;p;}' filename

Please explain the above sed command.

Thanks,
Thangaraju.

Last edited by btech_raju; 11-23-2007 at 11:39 AM.
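One way to read the command is to write the same script in long form, one piece per line, with a comment on each piece. This is only a sketch: the function name `explain_strip_tags` is made up for illustration, and the script inside it is meant to behave like the one-liner.

```shell
#!/bin/sh
# Long form of: sed -n '/^$/!{s/<[^>]*>//g;p;}' filename
explain_strip_tags() {
    sed -n '
        # -n: do not print lines automatically; only the explicit "p" prints.
        # /^$/ matches an empty line; "!" negates it, so the block in braces
        # runs only for lines that are not empty.
        /^$/!{
            # s/<[^>]*>//g: replace "<", any run of characters other than ">",
            # then ">" with nothing; "g" repeats this for every tag on the line.
            s/<[^>]*>//g
            # p: print the line as it now stands.
            p
        }
    ' "$@"
}

# Prints "Date" and "Total Hours"; the empty input line is skipped.
printf '<b>Date</b>\n\n<td>Total Hours</td>\n' | explain_strip_tags
```

The `[^>]*` part is what keeps each match inside a single tag: it cannot run past the first `>`, so text between two tags on the same line is left alone.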