Full Discussion: HTML to CSV
Post 302905729 by cjcox on Friday, 13 June 2014, 11:15:17 AM (UNIX for Dummies Questions & Answers)
This needs some work, but it might be an OK starting point. It's something I wrote for another project a couple of years ago. Written in PHP:

Code:
<?php
/** \action-csv.php
 * \copyright (c) 2012, Christopher Jay Cox, Licensed under GPLv2
 * \author Christopher Jay Cox
 * \version 20120318a
*/

function Html2CSV($page_html) {
        # Keep only table markup and hyperlinks; link targets and names are parsed later.
        $page_html = strip_tags($page_html, '<a><table><tr><th><td>');

        preg_match_all('/<tr[^>]*>(.*)<\/tr>/isU', $page_html, $trs);

        $ahrefexp = '/<a \s*[^>]*href=["\'](?P<href>[^"\']*)["\']\s*[^>]*>(?P<name>.*)<\s*\/a>/si';
        $csvout = '';
        foreach ($trs[1] as $tr) {
                preg_match_all('/<t[hd][^>]*>(.*)<\/t[hd]>/isU', $tr, $tds);
                $first = true;
                foreach ($tds[1] as $td) {
                        # For CSV output, prefer a plain blank to the regular decode (U+00A0) for nbsp
                        $td = str_replace('&nbsp;', ' ', $td);
                        # If the cell holds a hyperlink, keep only the link text
                        if (preg_match($ahrefexp, $td, $matches))
                                $td = $matches['name'];
                        # Decode entities BEFORE escaping, so &quot;, &#34; etc. get escaped too
                        $td = html_entity_decode($td, ENT_QUOTES, 'UTF-8');
                        # Double quotes must be escaped by another double quote
                        $td = str_replace('"', '""', $td);
                        if (!$first) $csvout .= ',';
                        $csvout .= '"' . $td . '"';
                        $first = false;
                }
                $csvout .= "\n";
        }
        return $csvout;
}

$page_html = file_get_contents('/tmp/webpage.html');
$csvout = Html2CSV($page_html);
echo $csvout;
?>
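For comparison, here is a minimal Python 3 sketch of the same idea. It uses only the standard library (html.parser and csv) instead of regular expressions. It is an illustration under its own assumptions, not a port of the PHP above:

- it keeps all text in a cell, including link text, and drops link targets;
- it quotes every field;
- it tolerates missing </td>/</tr> tags;
- it does not handle nested tables.

The class and function names are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableToCSV(HTMLParser):
    """Collect the text of every <th>/<td> cell, one list per <tr>."""

    def __init__(self):
        # convert_charrefs=True decodes &amp;, &quot;, &#34;, &nbsp; ... in text
        super().__init__(convert_charrefs=True)
        self.rows = []
        self._row = None   # cells of the <tr> being read, or None
        self._cell = None  # text chunks of the <td>/<th> being read, or None

    def _close_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            # Like the PHP version: a non-breaking space becomes a plain blank
            self._cell.append(data.replace("\xa0", " "))


def html_to_csv(page_html):
    """Return every table row in page_html as CSV text, all fields quoted."""
    parser = TableToCSV()
    parser.feed(page_html)
    parser.close()       # flush any buffered text
    parser._close_row()  # flush a row left open at the end of the input
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(parser.rows)
    return out.getvalue()
```

To process the same file as the PHP script (assuming it is UTF-8), call html_to_csv(open('/tmp/webpage.html', encoding='utf-8').read()) and print the result. Letting csv.writer do the quoting means embedded double quotes are always doubled, whichever entity form they arrived in.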

 

9 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Converting HTML to CSV

Hi, I need to convert a relatively large HTML file (1.5 MB) into CSV under Unix. How would I be able to do this? Much thanks. (3 Replies)
Discussion started by: Jexel

2. Shell Programming and Scripting

HTML table to CSV

Hi! I have HTML tables from which I want to generate graphs, but to create the graphs I need the file in CSV format. Can anyone please help me convert my HTML table file to CSV format? Thanks in advance. (2 Replies)
Discussion started by: i_priyank

3. Shell Programming and Scripting

HTML to csv

Hi! Could you please let me know how an HTML file can be converted to CSV? I am looking for a script that could do that. Please find the example below: <HTML><BODY><TABLE> <TR><TD>Parent CR</TD><TD>ChildCR</TD><TD>Title</TD><TD>Description</TD></TR> </TABLE></BODY></HTML>... (3 Replies)
Discussion started by: ganga.dharan

4. Shell Programming and Scripting

Parsing: How to go from HTML to CSV?

Dear all, I have to parse a large number of HTML files, which I would like to transform into comma-separated values. The HTML files have the following structure: <tag1> CATEGORY_1 <tag2><tag3> HEADER_1 <tag4> <tag5> paragraph_1 <tag6> <tag5> paragraph_2 <tag6> <tag3>HEADER_2... (2 Replies)
Discussion started by: docdudetheman

5. UNIX for Dummies Questions & Answers

convert csv to html file

Hi all, I am new to this forum and not sure where to post this query, so I posted it here. I would appreciate your help with the following. I am using shell scripting and trying to convert a CSV file to an HTML file... example.csv: Name Country Age Sex Andy India 25 ... (4 Replies)
Discussion started by: sumithra

6. Shell Programming and Scripting

html to csv conversion

Thanks for allowing me to join your forum. I have an HTML file with three columns: last visit date, URL, and link. How can I convert it into CSV so that I can output it into a database? The machine is Linux. I did a little googling and got the idea that there are ways for... (2 Replies)
Discussion started by: certteam

7. Shell Programming and Scripting

Help needed in csv to html

Hi, below is the code I have, but it prints the entire CSV line in one column. I want to print 10 comma-separated fields in 10 columns. Almost there; maybe a tweak you guys can help with. cat reports/file.csv |awk -v border=1 -v width=10 -v bgcolor=black -v fgcolor=white ' BEGIN {... (1 Reply)
Discussion started by: jakSun8

8. Shell Programming and Scripting

html-to-csv

Dear all, I have to format HTML output, with tags outside the standard, as a CSV file. The input file follows: <table id=tabela BORDER=1 CELLSPACING=0 CELLPADDING=0 slcolor=#ffffcc dragcolor='gray' img='false' col='1' rowTotal='1' height=100% habilita_primeira='1'... (2 Replies)
Discussion started by: He2

9. Shell Programming and Scripting

Converting csv to html format

Below is the code I have. How can I convert the data in the CSV into 3 HTML tables instead of 1 table? Attached is the format I am getting. (1 Reply)
Discussion started by: archana25
Duration(3pm)						User Contributed Perl Documentation					     Duration(3pm)

NAME
Time::Duration - rounded or exact English expression of durations

SYNOPSIS
Example use in a program that ends by noting its runtime:

    my $start_time = time();
    use Time::Duration;
    # then things that take all that time, and then ends:
    print "Runtime: ", duration(time() - $start_time), ".\n";

Example use in a program that reports the age of a file:

    use Time::Duration;
    my $file = 'that_file';
    my $age = $^T - (stat($file))[9];    # 9 = modtime
    print "$file was modified ", ago($age);

DESCRIPTION
This module provides functions for expressing durations in rounded or exact terms.

In the first example in the Synopsis, using duration($interval_seconds): If "time() - $start_time" is 3 seconds, this prints "Runtime: 3 seconds.". If it's 0 seconds, it's "Runtime: 0 seconds.". If it's 1 second, it's "Runtime: 1 second.". If it's 125 seconds, you get "Runtime: 2 minutes and 5 seconds.". If it's 3820 seconds (exactly 1h, 3m, 40s), it is rounded to fit within two expressed units: "Runtime: 1 hour and 4 minutes.". Using duration_exact instead would print "Runtime: 1 hour, 3 minutes, and 40 seconds.".

In the second example in the Synopsis, using ago($interval_seconds): If $age is 3 seconds, this prints "file was modified 3 seconds ago". If it's 0 seconds, it's "file was modified right now", as a special case. If it's 1 second, it's "file was modified 1 second ago". If it's 125 seconds, you get "file was modified 2 minutes and 5 seconds ago". If it's 3820 seconds (exactly 1h, 3m, 40s), it is rounded to fit within two expressed units: "file was modified 1 hour and 4 minutes ago". Using ago_exact instead would return "file was modified 1 hour, 3 minutes, and 40 seconds ago". And if the file's modtime is, surprisingly, three seconds in the future, $age is -3, and you'll get the equally and appropriately surprising "file was modified 3 seconds from now".

FUNCTIONS
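The rounding described in the DESCRIPTION section can be modelled in a few lines. The following is a hypothetical Python sketch, not the module's Perl code. The function names only mirror the Perl ones for readability. The rounding rule is an assumption that happens to reproduce the examples on this page: repeatedly drop the least significant nonzero unit, rounding half up into the next larger unit, until at most $precision units remain.

```python
import re

# Hypothetical Python model of Time::Duration's documented behaviour.
# Assumption: 1 year = exactly 365 days, as the module documents.
UNITS = [("year", 365 * 86400), ("day", 86400), ("hour", 3600),
         ("minute", 60), ("second", 1)]


def _split(seconds):
    """Exact breakdown of abs(seconds) into [years, days, hours, minutes, seconds]."""
    rest = abs(int(seconds))
    parts = []
    for _, size in UNITS:
        count, rest = divmod(rest, size)
        parts.append(count)
    return parts


def _round(parts, precision):
    """Zero the least significant nonzero unit (rounding half up into the
    next larger unit) until at most `precision` units are nonzero."""
    parts = list(parts)
    while sum(1 for p in parts if p) > precision:
        i = max(j for j, p in enumerate(parts) if p)
        per_larger = UNITS[i - 1][1] // UNITS[i][1]  # e.g. 60 seconds per minute
        if 2 * parts[i] >= per_larger:
            parts[i - 1] += 1
        parts[i] = 0
        for j in range(i - 1, 0, -1):  # carry overflow upward, e.g. 60 min -> 1 h
            per_larger = UNITS[j - 1][1] // UNITS[j][1]
            if parts[j] >= per_larger:
                parts[j] -= per_larger
                parts[j - 1] += 1
    return parts


def _english(parts):
    """Join nonzero units: 'A', 'A and B', or 'A, B, and C'."""
    words = [f"{n} {name}{'' if n == 1 else 's'}"
             for n, (name, _) in zip(parts, UNITS) if n]
    if not words:
        return "0 seconds"
    if len(words) <= 2:
        return " and ".join(words)
    return ", ".join(words[:-1]) + ", and " + words[-1]


def duration(seconds, precision=2):
    return _english(_round(_split(seconds), precision or 2))


def duration_exact(seconds):
    return _english(_split(seconds))


def _interval(seconds, precision, negative, positive, zero):
    if seconds == 0:
        return zero
    return duration(seconds, precision) + (positive if seconds > 0 else negative)


def ago(seconds, precision=2):
    return _interval(seconds, precision, " from now", " ago", "right now")


def later(seconds, precision=2):
    return _interval(seconds, precision, " earlier", " later", "right then")


def from_now(seconds, precision=2):
    return ago(-seconds, precision)


def earlier(seconds, precision=2):
    return later(-seconds, precision)


def concise(text):
    """Shorten e.g. '1 hour and 16 minutes ago' to '1h16m ago'."""
    text = text.replace(",", "")
    text = re.sub(r"\band\b", "", text)
    text = re.sub(r"\b(year|day|hour|minute|second)s?\b",
                  lambda m: m.group(1)[0], text)
    return re.sub(r"\s*(\d+)\s*", r"\1", text)
```

With this model, duration(3820) gives "1 hour and 4 minutes" and duration(31629659, 3) gives "1 year, 1 day, and 2 hours", matching the text. Because "precision" counts nonzero units rather than consecutive ones, 1 year, 1 day, and 59 seconds at precision 2 would still be "1 year and 1 day".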
This module provides all the following functions, which are all exported by default when you call "use Time::Duration;".

duration($seconds)
duration($seconds, $precision)
    Returns English text expressing the approximate time duration of abs($seconds), with at most "$precision || 2" expressed units. (That is, duration($seconds) is the same as duration($seconds, 2).) For example, duration(120) or duration(-120) is "2 minutes". And duration(0) is "0 seconds".

    The precision figure means that no more than that many units will be used in expressing the time duration. For example, 31,629,659 seconds is a duration of exactly 1 year, 1 day, 2 hours, and 59 seconds (assuming 1 year = exactly 365 days, as this module does). However, if you wanted an approximation of this to at most two expressed (i.e., nonzero) units, it would be rounded and truncated to "1 year and 1 day". A maximum of 3 expressed units would get you "1 year, 1 day, and 2 hours". A maximum of 4 expressed units would get you "1 year, 1 day, 2 hours, and 59 seconds", which happens to be exactly true. A maximum of 5 (or more) expressed units would get you the same, since there are only four nonzero units possible for that duration.

duration_exact($seconds)
    Same as duration($seconds), except that the returned value is an exact (unrounded) expression of $seconds. For example, duration_exact(31629659) returns "1 year, 1 day, 2 hours, and 59 seconds", which is exactly true.

ago($seconds)
ago($seconds, $precision)
    For a positive value of seconds, this returns the same as "duration($seconds, [$precision]) . ' ago'". For example, ago(120) is "2 minutes ago". For a negative value of seconds, this returns the same as "duration($seconds, [$precision]) . ' from now'". For example, ago(-120) is "2 minutes from now". As a special case, ago(0) returns "right now".

ago_exact($seconds)
    Same as ago($seconds), except that the returned value is an exact (unrounded) expression of $seconds.
from_now($seconds)
from_now($seconds, $precision)
from_now_exact($seconds)
    The same as ago(-$seconds), ago(-$seconds, $precision), ago_exact(-$seconds). For example, from_now(120) is "2 minutes from now".

later($seconds)
later($seconds, $precision)
    For a positive value of seconds, this returns the same as "duration($seconds, [$precision]) . ' later'". For example, later(120) is "2 minutes later". For a negative value of seconds, this returns the same as "duration($seconds, [$precision]) . ' earlier'". For example, later(-120) is "2 minutes earlier". As a special case, later(0) returns "right then".

later_exact($seconds)
    Same as later($seconds), except that the returned value is an exact (unrounded) expression of $seconds.

earlier($seconds)
earlier($seconds, $precision)
earlier_exact($seconds)
    The same as later(-$seconds), later(-$seconds, $precision), later_exact(-$seconds). For example, earlier(120) is "2 minutes earlier".

concise( function( ... ) )
    concise takes the string output of one of the above functions and makes it more concise. For example, "ago(4567)" returns "1 hour and 16 minutes ago", but "concise(ago(4567))" returns "1h16m ago".

I18N/L10N NOTES
Little of this module's internals is English-specific. See the source and/or contact me if you're interested in making a localized version for a language other than English.

BACKSTORY
I wrote the basic "ago()" function for use in Infobot ("http://www.infobot.org"), because I was tired of this sort of response from the Purl Infobot:

    me> Purl, seen Woozle?
    <Purl> Woozle was last seen on #perl 20 days, 7 hours, 32 minutes and 40 seconds ago, saying: Wuzzle!

I figured if it was 20 days ago, I don't care about the seconds. So once I had written "ago()", I abstracted the code a bit and got all the other functions.

CAVEAT
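Time::Duration treats a year as exactly 365 days of 24 hours, which is 31,536,000 seconds. A small Python illustration (not part of the module) shows the effect on a real leap year. The 366 days of 2024 come out as one fixed "year" plus one day.

```python
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60            # 86,400; ignores 23/25-hour DST days
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # 31,536,000; the module's fixed "year"

# 2024 was a leap year: 366 calendar days between these two naive datetimes.
leap_year = (datetime(2025, 1, 1) - datetime(2024, 1, 1)).total_seconds()

years, rest = divmod(int(leap_year), SECONDS_PER_YEAR)
days = rest // SECONDS_PER_DAY
print(years, "year and", days, "day")  # prints: 1 year and 1 day
```

Naive datetimes are used on purpose. Like the module, they carry no time zone, so no DST or leap-second adjustment enters the arithmetic.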
This module calls a durational "year" an interval of exactly 365 days of exactly 24 hours each, with no provision for leap years or monkey business with 23/25-hour days (much less leap seconds!). But since the main work of this module is approximation, that shouldn't be a great problem for most purposes.

SEE ALSO
Date::Interval, which is similarly named, but does something rather different.

Star Trek: The Next Generation (1987-1994), where the character Data would express time durations like "1 year, 20 days, 22 hours, 59 minutes, and 35 seconds" instead of rounding to "1 year and 21 days". This is because no-one ever told him to use Time::Duration.

COPYRIGHT AND DISCLAIMER
Copyright 2006, Sean M. Burke "sburke@cpan.org", all rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

AUTHOR
Current maintainer Avi Finkel, "avi@finkel.org"; original author Sean M. Burke, "sburke@cpan.org".

perl v5.10.1                      2007-08-19                      Duration(3pm)
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.