The UNIX and Linux Forums  
Turn HTML data into delimited text
Posted 11-21-2008 by macxcool (Registered User, Canada)

I have a file I've already partially pruned with grep that has data like:

<a href="MasterDetailResults.asp?textfield=a&Application=3D Home Architect 4">3D Home Architect 4</a> </td>
Approved </td>
--
<a href="MasterDetailResults.asp?textfield=a&Application=3d Home Architect 6">3d Home Architect 6</a> </td>
Not Approved </td>
--
<a href="MasterDetailResults.asp?textfield=a&Application=A to Zap">A to Zap</a> </td>
Approved </td>
--

except much, much more of it ;-)

I want to extract the application name (e.g. 3D Home Architect 4) and the status (i.e. Approved or Not Approved) from each record and turn the data into this:

3D Home Architect 4|Approved
3d Home Architect 6|Not Approved
A to Zap|Approved
etc.

for use as a searchable database or for import into Excel.

I want to do this in a bash script with sed or gawk, in as few lines as possible (the line count is not critical, of course ;-)

Thanks in advance for your help.
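
One possible approach to the transformation described above is sketched here with awk. It is not a definitive solution: it assumes every record has exactly the layout shown in the sample (a line containing the `<a href=...>` link, then a status line ending in `</td>`, then a `--` separator left by grep). The function name `html_to_delimited` is made up for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: convert the grep-pruned HTML fragments into "name|status" lines.
# html_to_delimited is a hypothetical helper; it reads stdin or the files given.
html_to_delimited() {
    awk '
        # Link line: keep only the text between <a ...> and </a>.
        /<a href=/ {
            name = $0
            sub(/^[^>]*>/, "", name)      # drop everything through the opening tag
            sub(/<\/a>.*$/, "", name)     # drop the closing tag and the rest
            next
        }
        # Separator lines written by grep carry no data.
        /^--/ { next }
        # The first line after a link line is assumed to be the status.
        name != "" {
            status = $0
            sub(/[ \t]*<\/td>.*$/, "", status)   # strip trailing " </td>"
            sub(/^[ \t]+/, "", status)           # strip leading whitespace
            print name "|" status
            name = ""
        }
    ' "$@"
}

# Demo on the sample data from the post.
html_to_delimited <<'EOF'
<a href="MasterDetailResults.asp?textfield=a&Application=3D Home Architect 4">3D Home Architect 4</a> </td>
Approved </td>
--
<a href="MasterDetailResults.asp?textfield=a&Application=3d Home Architect 6">3d Home Architect 6</a> </td>
Not Approved </td>
--
<a href="MasterDetailResults.asp?textfield=a&Application=A to Zap">A to Zap</a> </td>
Approved </td>
--
EOF
```

On the sample data this prints the three `name|status` lines given as the desired output. The name is taken from the link text rather than from the `Application=` parameter in the URL, so an application name that itself contains a `|` would need a different delimiter.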