Shell Programming and Scripting: extract complex data from html table rows. Post 302628819 by birei on Tuesday 24th of April 2012 04:11:56 AM
Hi rickgtx,

I know Chubler_XL already solved the problem, but here is another solution using Perl and an HTML parser.
Code:
$ cat script.pl
use warnings;
use strict;
use HTML::TreeBuilder;
use LWP::Simple;

my $url = q[http://heavens-above.com/iridium.asp?Dur=7&lat=30&lng=10];

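# get() returns undef when the download fails, so check before parsing.
my $html = get( $url );
die qq[Could not fetch $url\n] unless defined $html;

my $tree = HTML::TreeBuilder->new_from_content( $html );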

my @table_data = $tree->look_down(
        q[_tag] => q[tr],
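        # Keep only rows whose parent element has a purely numeric "cellpadding" attribute.
        sub {
                my $parent = $_[0]->parent();
                return 0 unless defined $parent;
                my $padding = $parent->attr( q[cellpadding] );
                return defined $padding && $padding =~ m/\A\d+\Z/ ? 1 : 0;
        },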
);

die qq[Data not found\n] unless @table_data;

# Start at index 1 to skip the first matching row (the table header).
for my $row ( 1 .. $#table_data ) {
        my @data;
        for ( $table_data[ $row ]->descendants ) {
                if ( $_->tag eq q[a] ) {
                        my @href;
                        my $href = $_->attr( q[href] );
                        # Anchors without an href would otherwise trigger an "uninitialized" warning.
                        if ( defined $href && $href =~ m/\A(?i:flaredetails)/ ) {
                                @href = $href =~ m/(?i)(?|lat=(\d+)|lng=(\d+)|loc=([^&]+)|date=([^&]+))/g;
                        }
                        push @data, @href;
                }
                if ( $_->tag() eq q[td] ) {
                        push @data, $_->as_text;
                }
        }

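        # Drop two of the last three fields, keeping the final one.
        splice @data, -3, 2;
        # Keep only the leading digits of the two fields before the last one.
        for ( @data[-3, -2] ) {
                s/\D.*\Z//;
        }
        # Move field 1 to position 5, shifting fields 2..5 one place to the left.
        @data[1..5] = @data[2..5,1];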
        printf qq[%s\n], join q[, ], @data;
}

$ perl script.pl
29 Apr, 30, 10, Unspecified, 41028.8169421883, 21:36:24, -8, 48, 98, Iridium 55
30 Apr, 30, 10, Unspecified, 41029.1509187511, 05:37:19, -2, 16, 22, Iridium 72
01 May, 30, 10, Unspecified, 41030.1468397363, 05:31:27, -5, 15, 22, Iridium 62
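
Two idioms in the script above may be easier to follow in isolation: the branch-reset group `(?|...)` combined with `m//g` in list context, and the array slice assignment that reorders fields. The following is only a minimal sketch; the `$href` value is a made-up example modelled on the parameters the script searches for, and the `@data` letters are placeholders, not real table fields. Branch reset needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical href (an assumption for illustration), shaped like the links the script parses.
my $href = q[flaredetails.aspx?lat=30&lng=10&loc=Unspecified&date=41028.8169421883];

# Branch reset (?|...) numbers the group in every alternative as $1, so each
# match contributes exactly one value to the list returned by m//g.
my @values = $href =~ m/(?i)(?|lat=(\d+)|lng=(\d+)|loc=([^&]+)|date=([^&]+))/g;
print join( q[, ], @values ), qq[\n];

# Slice assignment: the right-hand side is built first, so field 1 moves to
# position 5 and fields 2..5 shift one place to the left.
my @data = qw( a b c d e f g );
@data[ 1 .. 5 ] = @data[ 2 .. 5, 1 ];
print join( q[, ], @data ), qq[\n];
```

With a plain non-capturing group `(?:...)` instead of `(?|...)`, every match would return all four groups, three of them undefined, so the list would need filtering afterwards.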

 

Template::Plugin::Cycle(3pm)				User Contributed Perl Documentation			      Template::Plugin::Cycle(3pm)

NAME
Template::Plugin::Cycle - Cyclically insert into a Template from a sequence of values

SYNOPSIS
  [% USE cycle('row', 'altrow') %]

  <table border="1">
    <tr class="[% class %]">
      <td>First row</td>
    </tr>
    <tr class="[% class %]">
      <td>Second row</td>
    </tr>
    <tr class="[% class %]">
      <td>Third row</td>
    </tr>
  </table>

  ###################################################################
  # Alternatively, you might want to make it available to all templates
  # throughout an entire application.

  use Template::Plugin::Cycle;

  # Create a Cycle object and set some values
  my $Cycle = Template::Plugin::Cycle->new;
  $Cycle->init('normalrow', 'alternaterow');

  # Bind the Cycle object into the Template
  $Template->process( 'tablepage.html', class => $Cycle );

  #######################################################
  # Later that night in a Template

  <table border="1">
    <tr class="[% class %]">
      <td>First row</td>
    </tr>
    <tr class="[% class %]">
      <td>Second row</td>
    </tr>
    <tr class="[% class %]">
      <td>Third row</td>
    </tr>
  </table>

  [% class.reset %]

  <table border="1">
    <tr class="[% class %]">
      <td>Another first row</td>
    </tr>
  </table>

  #######################################################
  # Which of course produces

  <table border="1">
    <tr class="normalrow">
      <td>First row</td>
    </tr>
    <tr class="alternaterow">
      <td>Second row</td>
    </tr>
    <tr class="normalrow">
      <td>Third row</td>
    </tr>
  </table>

  <table border="1">
    <tr class="normalrow">
      <td>Another first row</td>
    </tr>
  </table>

DESCRIPTION
Sometimes, apparently almost exclusively when doing alternating table row backgrounds, you need to print an alternating, cycling, set of values into a template.

Template::Plugin::Cycle is a small, simple, and hopefully DWIM solution to these sorts of tasks.

It can be used either as a normal Template::Plugin, or can be created directly and passed in as a template argument, so that you can set up situations where it is implicitly available in every page.

METHODS
new [ $Context ] [, @list ]
  The "new" constructor creates and returns a new "Template::Plugin::Cycle" object. It can be optionally passed an initial set of values to cycle through.

  When called from within a Template, the new constructor will be passed the current Template::Context as the first argument. This will be ignored.

  By doing this, you can use it both directly, AND from inside a Template.

init @list
  If you need to set the values for a new empty object, or change the values to cycle through for an existing object, they can be passed to the "init" method.

  The method always returns the '' null string, to avoid inserting anything into the template.

elements
  The "elements" method returns the number of items currently set for the "Template::Plugin::Cycle" object.

list
  The "list" method returns the current list of values for the "Template::Plugin::Cycle" object.

  This is also the preferred method for getting access to a value at a particular position within the list of items being cycled through.

    [%# Access a variety of things from the list %]
    The first item in the Cycle object is [% cycle.list.first %].
    The second item in the Cycle object is [% cycle.list.[1] %].
    The last item in the Cycle object is [% cycle.list.last %].

next
  The "next" method returns the next value from the Cycle. If the end of the list of values is reached, it will "cycle" back to the first object again.

  This method is also the one called when the object is stringified. That is, when it appears on its own in a template. Thus, you can do something like the following.

    <!-- An example of alternate row classes in a table-->
    <table border="1">
      <!-- Explicitly access the next class in the cycle -->
      <tr class="[% rowclass.next %]">
        <td>First row</td>
      </tr>
      <!-- This has the same effect -->
      <tr class="[% rowclass %]">
        <td>Second row</td>
      </tr>
    </table>

value
  The "value" method is an alias for the "next" method.

reset
  If a single "Template::Plugin::Cycle" object is to be used in multiple places within a template, and it is important that the same value be first every time, then the "reset" method can be used.

  The "reset" method resets the Cycle, so that the next value returned will be the first value in the Cycle object.

SUPPORT
Bugs should be submitted via the CPAN bug tracker, located at

http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Template-Plugin-Cycle

For other issues, or commercial enhancement or support, contact the author.

AUTHOR
Adam Kennedy <adamk@cpan.org>

Thank you to Phase N Australia (http://phase-n.com/) for permitting the open sourcing and release of this distribution as a spin-off from a commercial project.

COPYRIGHT
Copyright 2004 - 2008 Adam Kennedy.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

perl v5.12.4                                    2008-06-19                      Template::Plugin::Cycle(3pm)
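
The cycling behaviour described under METHODS (a "next" that wraps around, stringification that calls "next", and a "reset" that rewinds) can be sketched in plain Perl with core modules only. This is a hypothetical stand-in written for illustration: the package name `My::Cycle` and its internals are assumptions, not the source of Template::Plugin::Cycle.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in class, NOT the Template::Plugin::Cycle source.
    package My::Cycle;

    # Stringifying the object returns the next value, as the man page describes.
    use overload q[""] => sub { $_[0]->next }, fallback => 1;

    sub new {
        my ( $class, @list ) = @_;
        return bless { list => [@list], pos => 0 }, $class;
    }

    # Return the current value and advance, wrapping around at the end.
    sub next {
        my $self  = shift;
        my $count = @{ $self->{list} };
        return q[] unless $count;
        my $value = $self->{list}[ $self->{pos} ];
        $self->{pos} = ( $self->{pos} + 1 ) % $count;
        return $value;
    }

    # Rewind so that the next value returned is the first one again.
    sub reset {
        my $self = shift;
        $self->{pos} = 0;
        return q[];
    }
}

my $class = My::Cycle->new( qw( normalrow alternaterow ) );

# Interpolation stringifies the object, which advances the cycle each time.
my @rows;
push @rows, qq[<tr class="$class">] for 1 .. 3;
print qq[$_\n] for @rows;

$class->reset;
my $first_again = $class->next;
print qq[after reset: $first_again\n];
```

Because every stringification advances the cycle, comparing or printing the object for debugging also consumes a value; calling "next" explicitly avoids surprises of that kind.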