The UNIX and Linux Forums  

#1 (11-23-2007)
btech_raju
How to remove only the HTML tags inside a file?

Hi All,

I have the following example file.

I want to remove all the HTML tags and nothing else.

Input File:

<html>
<head>
<title>Software Solutions Inc., </title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body bgcolor=white leftmargin="0" topmargin="0" marginwidth="00" marginheight="0" class=NormalFont>
<table ID="Table2" Bordercolor=black border=2 cellspacing=2 cellpadding=2>
<TR><TD colspan=4 align=left bgcolor="yellow"><font color=blue ><b> Iswar Ramamoorthy</b></font></TD> </TR>
<tr>
<td align=center><b>Date</b></td>
<td align=center><b>Total Hours</b></td>
<td align=center><b>Total IN Time</b></td>
<td align=center><b>Total Break Hours</b></td>
</tr>


</table>


<table ID="Table2" Bordercolor=black border=2 cellspacing=2 cellpadding=2>
<TR><TD colspan=4 align=left bgcolor="yellow"><font color=blue ><b>Aman Jain</b></font></TD> </TR>
<tr>
<td align=center><b>Date</b></td>
<td align=center><b>Total Hours</b></td>
<td align=center><b>Total IN Time</b></td>
<td align=center><b>Total Break Hours</b></td>
</tr>


</table>


<table ID="Table2" Bordercolor=black border=2 cellspacing=2 cellpadding=2>
<TR><TD colspan=4 align=left bgcolor="yellow"><font color=blue ><b>Anilkumar Kaandukuri</b></font></TD> </TR>
<tr>
<td align=center><b>Date</b></td>
<td align=center><b>Total Hours</b></td>
<td align=center><b>Total IN Time</b></td>
<td align=center><b>Total Break Hours</b></td>
</tr>


<tr class=normalfont >
<td align=center>11/16/2007</td>
<td align=center>1:16:0</td>
<td align=center>01:16</td>
<td align=center>0</td>
</tr>

</table>


<table ID="Table2" Bordercolor=black border=2 cellspacing=2 cellpadding=2>
<TR><TD colspan=4 align=left bgcolor="yellow"><font color=blue ><b>Arun Sivaraman</b></font></TD> </TR>
<tr>
<td align=center><b>Date</b></td>
<td align=center><b>Total Hours</b></td>
<td align=center><b>Total IN Time</b></td>
<td align=center><b>Total Break Hours</b></td>
</tr>

My expected result:

Software Solutions Inc

Iswar Ramamoorthy

Date
Total Hours
Total IN Time
Total Break Hours

Aman Jain

Date
Total Hours
Total IN Time
Total Break Hours

Anilkumar Kaandukuri

Date
Total Hours
Total IN Time
Total Break Hours

11/16/2007
1:16:0
01:16
0

... etc.
#2 (11-23-2007)
radoulov
Code:
sed -n '/^$/!{s/<[^>]*>//g;p;}' filename
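A minimal sketch of the one-liner run on a few lines taken from the input above; the file name `sample.html` is hypothetical. Because the `/^$/!` check happens before the tags are stripped, a line that held nothing but tags (such as `<tr>`) is still printed, as an empty line.

```shell
#!/bin/sh
# Write a small hypothetical sample file (quoted 'EOF' = no shell expansion).
cat > sample.html <<'EOF'
<title>Software Solutions Inc., </title>

<tr>
<td align=center><b>Date</b></td>
<td align=center>1:16:0</td>
</tr>
EOF

# Strip the tags from every non-empty line and print it.
# Expected output: "Software Solutions Inc., ", an empty line (from <tr>),
# "Date", "1:16:0", and another empty line (from </tr>).
# The truly empty second line of the file is skipped.
sed -n '/^$/!{s/<[^>]*>//g;p;}' sample.html
```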
Or, with slightly different output:

Code:
lynx --dump filename
(the file must have an .htm or .html extension; otherwise add lynx's -force_html option)

Or use html2text
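If the empty lines left behind by tag-only rows are not wanted, one possible variant (a sketch, not something suggested in the thread) strips the tags, trims leading and trailing whitespace, and then deletes lines that end up empty. The file name `sample.html` is hypothetical. Like the original one-liner, it works one line at a time, so a tag split across two lines would not be removed.

```shell
#!/bin/sh
# Hypothetical sample: a fragment of the input file above.
cat > sample.html <<'EOF'
<TR><TD colspan=4 align=left bgcolor="yellow"><font color=blue ><b>Aman Jain</b></font></TD> </TR>
<tr>
<td align=center><b>Date</b></td>
<td align=center><b>Total Hours</b></td>
</tr>
EOF

# 1. Delete every <...> tag on the line.
# 2. Trim leading whitespace.
# 3. Trim trailing whitespace (e.g. the space left between </TD> and </TR>).
# 4. Delete lines that are now empty.
sed -e 's/<[^>]*>//g' \
    -e 's/^[[:space:]]*//' \
    -e 's/[[:space:]]*$//' \
    -e '/^$/d' sample.html
# Prints "Aman Jain", "Date" and "Total Hours", one per line.
```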

#3 (11-23-2007)
btech_raju
All the commands work well.

sed -n '/^$/!{s/<[^>]*>//g;p;}' filename

Please explain the sed command above.

Thanks,
Thangaraju.
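Piece by piece: `-n` turns off sed's automatic printing, so only lines printed explicitly with `p` appear. `/^$/` is an address matching empty lines, and `!` inverts it, so the `{...}` block runs on every non-empty line. `s/<[^>]*>//g` replaces a `<`, any run of characters other than `>`, and the closing `>` with nothing, and the `g` flag repeats this for every tag on the line; `p` then prints the result. The `;` before `}` keeps the one-liner working on sed implementations (historically BSD sed) that expect a separator before the closing brace. The small illustration here, run on one line of the sample input, shows why the pattern uses `[^>]*` rather than `.*`; the variable names are only illustrative.

```shell
#!/bin/sh
line='<td align=center><b>Date</b></td>'

# ".*" is greedy: <.*> stretches from the first "<" to the LAST ">",
# so it swallows the whole line, including the text "Date".
greedy=$(printf '%s\n' "$line" | sed 's/<.*>//g')

# "[^>]*" cannot step past a ">", so each match is exactly one tag;
# the g flag then removes every tag on the line.
per_tag=$(printf '%s\n' "$line" | sed 's/<[^>]*>//g')

printf 'greedy:  [%s]\n' "$greedy"    # greedy:  []
printf 'per tag: [%s]\n' "$per_tag"   # per tag: [Date]

# -n together with /^$/!p prints only the non-empty lines: "a" and "b".
printf 'a\n\nb\n' | sed -n '/^$/!p'
```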

The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.