Parse HTML tag parameters and text
Hi!
I have a bunch of HTML files that I want to convert to CSV files. Every page contains a table, and I need to turn each table row into one CSV record. With awk and sed I have managed to put every table row on its own line, so my file now looks like this:

HTML Code:
<TR> .... </TR>
<TR> .... </TR>
...

A single row looks like this:

HTML Code:
<TR><A NAME="1,1"><TD CLASS="small" WIDTH="30" ALIGN="right" VALIGN="top">1,1</TD><TD WIDTH="380" ALIGN="left" VALIGN="top"> <FONT COLOR="black">Here is a text part</FONT></TD> <TD BGCOLOR="green" WIDTH="1px"></TD> <TD BGCOLOR="white" WIDTH="1px"></TD> <TD BGCOLOR="white" WIDTH="1px"></TD> <TD BGCOLOR="white" WIDTH="1px"></TD> <TD CLASS="small" ALIGN="left" VALIGN="top"> <A TARGET='index' CLASS='small' HREF='target.php?newtab=1&from=1,1&b=19&ch=121&v=2&SID=...'>Textlink1</A>; <A TARGET='index' CLASS='small' HREF='target.php?newtab=1&from=1,1&b=19&ch=146&v=6-8&SID=...'>Textlink2</A></TD> <TD BGCOLOR="white" WIDTH="1px"></TD><TD BGCOLOR="white" WIDTH="1px"></TD><TD CLASS="small" ALIGN="left" VALIGN="top"></TD></TR>

From this row I need the following pieces:

<A NAME="1,1">
Here is a text part
1,1,19,121,2
1,1,19,146,6-8

So each CSV record should have this layout:

name(1),name(2),between font tags,atarget1,atarget2...atargetN

that is:

NUMBER,NUMBER,TEXTPART,LINK1,LINK2,...,LINKN

where LINKi is like: from(1),from(2),b,ch,v

A row can have no links at all or any number of them; I don't know the maximum.

Can you help me extract this information? I can find these parts with regular expressions, but I don't know how to capture the matches into variables or how to do that for every line. The number of links is unknown, but that's fine, I can parse the CSV afterwards.

Thanks,
Andras
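To make the requested mapping concrete, here is a minimal sketch of it in Python (not the awk/sed approach the post asks about). It assumes every row sits on one line and follows the sample exactly: the anchor is written as `<A NAME="...">` with double quotes, the text sits inside a single `<FONT>` element, and each link uses a single-quoted `HREF` whose query string carries `from`, `b`, `ch` and `v`. The function names `row_to_record` and `rows_to_csv` are made up for this illustration.

```python
import csv
import io
import re

# Patterns modelled on the sample row; quoting style and tag layout are
# assumptions taken from that one example, not a general HTML parser.
NAME_RE = re.compile(r'<A\s+NAME="([^"]*)"', re.IGNORECASE)
FONT_RE = re.compile(r"<FONT[^>]*>(.*?)</FONT>", re.IGNORECASE | re.DOTALL)
HREF_RE = re.compile(r"HREF='([^']*)'", re.IGNORECASE)


def row_to_record(row):
    """Turn one <TR>...</TR> line into a list of CSV fields."""
    name = NAME_RE.search(row)
    font = FONT_RE.search(row)
    # NAME="1,1" becomes the two leading NUMBER fields.
    record = name.group(1).split(",") if name else ["", ""]
    # TEXTPART: whatever sits between the FONT tags.
    record.append(font.group(1).strip() if font else "")
    # One group of fields per link: from(1), from(2), b, ch, v.
    for href in HREF_RE.findall(row):
        query = href.split("?", 1)[-1]
        params = dict(p.split("=", 1) for p in query.split("&") if "=" in p)
        record.extend(params.get("from", "").split(","))
        record.extend(params.get(key, "") for key in ("b", "ch", "v"))
    return record


def rows_to_csv(lines):
    """Convert an iterable of row lines into CSV text, one record per row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        if "<TR" in line.upper():
            writer.writerow(row_to_record(line))
    return out.getvalue()
```

For the sample row above this produces `1,1,Here is a text part,1,1,19,121,2,1,1,19,146,6-8`. A row without links simply ends after the text field, and the `csv` module quotes the text field if it happens to contain a comma.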