To Break data out of HTML


 
# 1  
05-19-2008

I'm working with the output of an HTML form and trying to get it into CSV. The HTML is a table with many entries like the following.

HTML Code:
<tr><td nowrap><b><font size=3>NAME</font></b></td><td nowrap><b>License #  : </b>&nbsp;LICENSE</td></tr>
<tr><td><b>City : </b>&nbsp;CITY<td nowrap><b>Type  : </b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;TYPE</td></tr>
<tr><td><b>State :</b>&nbsp;ST<td nowrap><b>Status  : </b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;STATUS</td></tr>
<tr><td><b>Phone :</b>&nbsp;PHONE<td nowrap><b>Expires: &nbsp;</b>&nbsp;&nbsp;&nbsp;&nbsp;EXPIRES</td></tr><td></td>
<td nowrap><b>Nat. Registry: </b>Y/N</td></tr><tr><td> <tr><td colspan=2><hr width='100%'></td></tr>

I'm looking for a way to turn that into:

Code:
NAME, LICENSE, CITY, TYPE, ST, STATUS, PHONE, EXPIRES, Y/N

I was looking at sed with back-references (\1, \2, etc.), but it doesn't behave the way I expect. My first thought was something like the following, but it seems far too fragile.

Code:
sed 's_<tr><td nowrap><b><font size=3>\(.*\)</font>_\1_' appr-test

Is there a best way to go about this? Many thanks for ideas.
# 2  
05-20-2008
If you really want a "best way", that would be a proper HTML parser.

Assuming you wish to stay with something lighter, like sed or awk, perhaps you can elaborate on what is wrong with the sed you have tried so far. (The <font> tag in your example script does not occur in the HTML sample you posted, but I guess that's beside the point here.)
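
For the lighter route, here is a minimal awk sketch, not a definitive answer. It assumes the page always looks like the posted sample: every label sits inside its own <b>...</b> pair, padding is done with &nbsp;, and each record ends at the row containing <hr>. The function name html_to_csv is made up for illustration; appr-test is the file name from the first post.

```shell
#!/bin/sh
# html_to_csv: hypothetical helper name. Reads the HTML table on stdin and
# prints one comma-separated line per record.
# Assumptions, taken from the posted sample only:
#   - labels look like <b>text</b> with no nested tags
#   - a record ends on the line that contains <hr ...>
html_to_csv() {
    awk '
    {
        gsub(/<b>[^<]*<\/b>/, "")       # drop labels such as "<b>City : </b>"
        gsub(/&nbsp;/, "")              # drop non-breaking-space padding
        is_end = ($0 ~ /<hr/)           # note the separator before tags are removed
        gsub(/<[^>]*>/, "\n")           # every remaining tag becomes a field break
        n = split($0, parts, "\n")
        for (i = 1; i <= n; i++) {
            v = parts[i]
            gsub(/^[ \t]+|[ \t]+$/, "", v)    # trim surrounding blanks
            if (v != "") rec = (rec == "" ? v : rec ", " v)
        }
        if (is_end && rec != "") { print rec; rec = "" }
    }
    END { if (rec != "") print rec }    # flush a record that has no trailing <hr>
    '
}
```

Run it as `html_to_csv < appr-test`. On the posted sample it should print `NAME, LICENSE, CITY, TYPE, ST, STATUS, PHONE, EXPIRES, Y/N`. The name survives the label-stripping step because `[^<]*` cannot cross the nested <font> tag. Values that themselves contain commas are not quoted here, so real data may need an extra quoting step, and any change in the page layout will break it, which is the usual argument for a proper HTML parser.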