thanks bakunin that is really helpful. i cant post a sample of the html page for various reasons. the only problem with your solution is that most of the <tr> tags are across multiple lines in my html page. ie the tag may be opened on line 7 and then closed on line 20. hence is it possible with
sed to delete everything on a line (including the line) BUT stop when it gets to a <tr> tag and start again when it gets to a </tr>? alternatively is there a way to make
sed believe that the whole html page is on a single line?
as i am not familiar with the capabilities of
sed, it makes it hard for me to know what the best way of completing this task is.