HTML table to CSV Post: 302145826


Forum: Shell Programming and Scripting. Posted by drl on Thursday 15th of November 2007 10:29:48 AM.


Hi.

If you have the lynx command (a text-mode browser) installed, it does a good job of removing markup tags:

Code:

% cat s1
#!/usr/bin/env sh

# @(#) s1       Demonstrate lynx -dump to eliminate html tags.

set -o nounset
echo

debug=":"
debug="echo"

## Use local command version for the commands in this demonstration.

echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version bash lynx sed tr

echo

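# Input file: first argument, defaulting to data1.html.
FILE=${1-data1.html}

echo " Input data:"
cat "$FILE"

echo
echo " Final results:"

# Render the HTML as text, keep a copy in t1, strip leading blanks,
# then turn each run of blanks into a single comma.
lynx -dump "$FILE" |
tee t1 |
sed -e 's/^ *//' |
tr -s ' ' ','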

echo
echo " Intermediate results from lynx:"
cat t1

exit 0

Producing:

Code:

% ./s1

(Versions displayed with local utility "version")
GNU bash 2.05b.0
Lynx Version 2.8.5rel.1 (04 Feb 2004)
GNU sed version 4.1.2
tr (coreutils) 5.2.1

 Input data:
<HTML>
<HEAD>
<TITLE>Table with numeric data</TITLE>
</HEAD>
<BODY>
<TABLE border="1">
  <TR> <TD>5</TD> <TD>4</TD>
 <TD>23</TD> </TR> <TR> <TD>10</TD> <TD>3</TD> <TD>24</TD> </TR>
  <TR> <TD>6</TD> <TD>12</TD> <TD>28</TD> </TR>
  <TR> <TD>17</TD> <TD>20</TD> <TD>32</TD> </TR>
</TABLE>
</BODY>
</HTML>

 Final results:

5,4,23
10,3,24
6,12,28
17,20,32

 Intermediate results from lynx:

   5  4  23
   10 3  24
   6  12 28
   17 20 32

The lynx -dump output needs only a little massaging to get it into CSV format. See man lynx for details.

cheers, drl
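As a rough sketch of that massaging step, the sed and tr stages could be folded into one awk call. The function name to_csv is made up for this illustration, and the sample text is copied from the intermediate lynx output shown above, so lynx itself is not needed to try it. It assumes, as in the example table, that no cell contains a space or a comma.

```shell
#!/usr/bin/env sh
# Sketch only: convert whitespace-aligned columns (such as the text printed
# by "lynx -dump" for a simple numeric table) into comma-separated lines.
# to_csv is a hypothetical helper name, not part of lynx or the s1 script.

to_csv() {
  # NF > 0 skips blank lines. Assigning $1 to itself makes awk rebuild the
  # record with OFS, which also drops leading and trailing whitespace.
  awk 'BEGIN { OFS = "," } NF > 0 { $1 = $1; print }'
}

# Sample text copied from the intermediate lynx output.
sample='
   5  4  23
   10 3  24
   6  12 28
   17 20 32
'

printf '%s\n' "$sample" | to_csv
```

In a script like s1, the last two pipeline stages might then become a single call, for example lynx -dump "$FILE" | tee t1 | to_csv. Cells that contain spaces or commas would still need real CSV quoting, which this sketch does not attempt.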




HTML::Filter(3) 					User Contributed Perl Documentation					   HTML::Filter(3)

NAME

       HTML::Filter - Filter HTML text through the parser

NOTE

       This module is deprecated. "HTML::Parser" now provides the functionality of "HTML::Filter" much more efficiently with the "default"
       handler.

SYNOPSIS

	require HTML::Filter;
	$p = HTML::Filter->new->parse_file("index.html");

DESCRIPTION

       "HTML::Filter" is an HTML parser that by default prints the original text of each HTML element (basically a slow version of cat(1)).  The
       callback methods may be overridden to modify the filtering for some HTML elements, and you can override the output() method, which is
       called to print the HTML text.

       "HTML::Filter" is a subclass of "HTML::Parser". This means that the document should be given to the parser by calling the $p->parse() or
       $p->parse_file() methods.

EXAMPLES

       The first example is a filter that will remove all comments from an HTML file.  This is achieved by simply overriding the comment method to
       do nothing.

	 package CommentStripper;
	 require HTML::Filter;
	 @ISA=qw(HTML::Filter);
	 sub comment { }  # ignore comments

       The second example shows a filter that will remove any <TABLE>s found in the HTML file.  We specialize the start() and end() methods to
       count table tags and then suppress output while inside a table.

	 package TableStripper;
	 require HTML::Filter;
	 @ISA=qw(HTML::Filter);
	 sub start
	 {
	    my $self = shift;
	    $self->{table_seen}++ if $_[0] eq "table";
	    $self->SUPER::start(@_);
	 }

	 sub end
	 {
	    my $self = shift;
	    $self->SUPER::end(@_);
	    $self->{table_seen}-- if $_[0] eq "table";
	 }

	 sub output
	 {
	     my $self = shift;
	     unless ($self->{table_seen}) {
		 $self->SUPER::output(@_);
	     }
	 }

       If you want to collect the parsed text internally you might want to do something like this:

	 package FilterIntoString;
	 require HTML::Filter;
	 @ISA=qw(HTML::Filter);
	 sub output { push(@{$_[0]->{fhtml}}, $_[1]) }
	 sub filtered_html { join("", @{$_[0]->{fhtml}}) }

SEE ALSO

       HTML::Parser

COPYRIGHT

       Copyright 1997-1999 Gisle Aas.

       This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

perl v5.8.0							    1999-12-09							   HTML::Filter(3)
