Quote: Originally Posted by lxdorney
Is there any out there have a brilliant idea on how to export html table data as .csv or write to txt file with separated comma and also get the filename of link from every table and put one line per rows each table.
Alas, there is indeed a "brilliant idea", but you probably are not going to like it: write a parser!
The solution you found (which is similar to many others, including a few of my own) will work the way it is supposed to as long as the HTML source you feed it is "well-behaved". Well-behaved in this context means: it does not contain constructs the creator of said solution did not think of in advance. If it does, the "solution" will probably break in one way or another.
The reason is that parsing cannot, in general, be done with regular expressions, however cleverly arranged. HTML elements can nest to arbitrary depth, so parsing is a recursive process, and with anything short of a recursive parser you might get somewhere near a solution, but not a solution in the full meaning of the word. If you are interested in why: here it is at length.
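To make the nesting problem concrete, here is a small sketch in Python. The input string and the pattern are made up for illustration (they are not taken from the solution you found), but the pattern is typical of the "grab everything between the table tags" approach:

```python
import re

# Hypothetical input: one table nested inside a cell of another table.
html = (
    "<table><tr><td>outer"
    "<table><tr><td>inner</td></tr></table>"
    "</td></tr></table>"
)

# Non-greedy match from an opening <table> up to the nearest </table>.
pattern = re.compile(r"<table>(.*?)</table>", re.DOTALL)

matches = pattern.findall(html)

for m in matches:
    # The captured text opens a second table but never closes it,
    # because the match stopped at the inner table's closing tag.
    print(repr(m))
```

The pattern pairs the outer opening tag with the inner closing tag, so the single match it returns is neither the outer table nor the inner one. Making the pattern greedy only moves the problem: it then swallows several sibling tables in one match.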
So, if you can live with some shortcomings, like the chance that the "solution" you end up with will not always work, you can use what you found. If you need a real solution, I suggest the "Dragon Book" ("Compilers: Principles, Techniques, and Tools"; Aho, Sethi, Ullman) as the best reference for building parsers, lexical analysers and similar programs.
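If writing a parser from scratch is more than you want to take on, a middle road is to lean on an existing HTML tokenizer and keep track of the nesting yourself. What follows is only a minimal sketch under assumptions, not a finished tool: it uses Python's standard `html.parser` module, the names `TableToRows` and `html_tables_to_csv` are my own invention, it assumes every `<tr>` and `<td>` has an explicit closing tag, and it guesses that "the filename of link" means the last path component of each `href` in a row.

```python
import csv
import io
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse


class TableToRows(HTMLParser):
    """Collect one list of fields per <tr>, using stacks to follow nesting."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, in the order their </tr> was seen
        self.row_stack = []   # rows currently open; nested tables push more
        self.cell_stack = []  # text fragments of the cells currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row_stack.append({"cells": [], "links": []})
        elif tag in ("td", "th"):
            self.cell_stack.append([])
        elif tag == "a" and self.row_stack:
            href = dict(attrs).get("href")
            if href:
                # Assumption: "filename" is the last component of the URL path.
                name = posixpath.basename(urlparse(href).path)
                self.row_stack[-1]["links"].append(name)

    def handle_data(self, data):
        # Text belongs to the innermost open cell only.
        if self.cell_stack:
            self.cell_stack[-1].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell_stack:
            parts = self.cell_stack.pop()
            if self.row_stack:
                # Collapse runs of whitespace inside the cell.
                text = " ".join("".join(parts).split())
                self.row_stack[-1]["cells"].append(text)
        elif tag == "tr" and self.row_stack:
            row = self.row_stack.pop()
            # Cell texts first, then any link file names found in the row.
            self.rows.append(row["cells"] + row["links"])


def html_tables_to_csv(html: str) -> str:
    """Return CSV text with one line per table row found in the HTML."""
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


sample = (
    '<table><tr><td>a, b</td>'
    '<td><a href="/files/report.pdf">get</a></td></tr></table>'
)
print(html_tables_to_csv(sample), end="")
```

Because the rows and cells live on stacks, a table inside a cell does not confuse the outer row; the inner rows are simply written out before the outer row that contains them. Input with omitted closing tags, which HTML permits, would still need more work than this sketch does.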
I hope this helps.
bakunin