extract complex data from html table rows


 
# 1  
Old 04-23-2012

I have bash, awk, and sed available on my portable device. I need to extract 10 fields from each table row from a web page that looks like this:
Code:
</tr>

    <tr>
    <td>28 Apr</td>
    <td><a href="flaredetails.asp?SatID=24907&lat=30&lng=10&alt=0&loc=Home&TZ=CST&Date=41027.779556738&Mirror=2">13:42:34</a></td>
    <td align=center>-5</td>
    <td align=right>59°</td>
    <td align=right>143° (SE )</td>
    
        <td align=right>7.3 km (E)</td>
        <td align=center>-8</td>
    
    <td align=left><a href="satinfo.aspx?SatID=24907&lat=30&lng=10&alt=0&loc=Home&TZ=CST">Iridium 22</a></td>
    </tr>
    
    <tr>. . .

To pull out this data:
Code:
28 Apr, 30, 10, Home, 41027.779556738, 13:42:34, -5, 59, 143, Iridium 22

This is a template of the row:
Code:
<tr><td>$date</td><td><a href="flaredetails.asp?SatID=.....&lat=$lat&lng=$lng&alt=0&loc=$loc&TZ=CST&Date=$jdate&Mirror=2">$time</a></td><td align=center>$mag</td><td align=right>$alt°</td><td align=right>$azm°...</td><td align=right>...</td><td align=center>...</td><td align=left><a href="...">$irsat</a></td></tr>

Where the $vars will be used as follows:
Code:
Description=$irsat $mag $date $time
Locationname=$loc
Longitude=$lng
Latitude=$lat
JulianDate=$jdate
DisplayCenterLat=$alt
DisplayCenterLong=$azm

Each row will be written to a separate file.
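
A rough sketch of how that mapping could be written out, one file per row, assuming the ten fields have already been extracted as a comma-separated line in the order shown above. The function name write_flare_files, the output directory, and the flare_NNN.txt file naming are placeholders for illustration, not anything taken from the page:

```shell
#!/bin/sh
# Sketch only: write_flare_files and the flare_NNN.txt naming are hypothetical.
# Reads rows of the form
#   date, lat, lng, loc, jdate, time, mag, alt, azm, irsat
# on stdin and writes one key=value file per row into the directory given as $1.
write_flare_files() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    awk -F', ' -v dir="$outdir" '
    {
        # One output file per input row, numbered by row (assumed naming scheme).
        file = sprintf("%s/flare_%03d.txt", dir, NR)
        printf("Description=%s %s %s %s\n", $10, $7, $1, $6) > file
        printf("Locationname=%s\n", $4)      > file
        printf("Longitude=%s\n", $3)         > file
        printf("Latitude=%s\n", $2)          > file
        printf("JulianDate=%s\n", $5)        > file
        printf("DisplayCenterLat=%s\n", $8)  > file
        printf("DisplayCenterLong=%s\n", $9) > file
        close(file)
    }'
}

# Usage with the sample row from this post, written to a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '28 Apr, 30, 10, Home, 41027.779556738, 13:42:34, -5, 59, 143, Iridium 22' |
    write_flare_files "$tmpdir"
cat "$tmpdir/flare_001.txt"
```

The first line of the resulting file is the Description entry built from $irsat, $mag, $date and $time; the remaining lines follow the variable list above.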

Any thoughts? If I could extract even a few of them, it would be a great start.

Thanks
The web page is heavens-above.com/iridium.asp?Dur=7&lat=30&lng=10

Last edited by Scrutinizer; 04-23-2012 at 03:52 PM.. Reason: Added code tags
# 2  
Old 04-23-2012
Try this:

Code:
awk -F'[=&><°]' '
/[ \t]*<td>[^<]*<\/td>/&&!date {date=$3}
/flaredetails.asp/ { lat=$8 ; lng=$10 ; loc=$14 ; jdate=$18 ; time=$21 }
/center/&&time&&!mag{mag=$4}
/right/&&alt&&!azm{azm=$4}
/right/&&mag&&!alt{alt=$4}
/satinfo.aspx/ {
  print date, lat, lng, loc, jdate, time, mag, alt, azm, $18
  date=mag=azm=alt=x;
}' OFS=", " infile


Last edited by Chubler_XL; 04-23-2012 at 09:17 PM..
# 3  
Old 04-23-2012
Thank you so much! This would have taken me weeks to learn to do.

The first date does not show, and I need to strip the degree symbol (&#176;) and the compass directions such as (NNE).
heavens-above.com/iridium.asp?Dur=7&lat=32.4414&lng=-98.638&loc=Home&alt=500&tz=CST
gives
Code:
Home (, 32.4414, -98.638, Home, 41023.4300358056, 05:19:15, -1, 52°, 231° (SW ), Iridium 82
28 Apr, 32.4414, -98.638, Home, 41027.4616683787, 06:04:48, -6, 23°, 22° (NNE), Iridium 14
28 Apr, 32.4414, -98.638, Home, 41028.1191551022, 21:51:35, -1, 49°, 97° (E  ), Iridium 55

I have much more respect for awk!
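
If the degree bytes or the compass text still show up in the output, one option is to post-process the comma-separated rows and cut the altitude and azimuth columns at the first non-digit. A minimal sketch: clean_angles is a made-up name, and it assumes columns 8 and 9 hold the two angles and that neither is ever negative.

```shell
#!/bin/sh
# Sketch only: clean_angles is a hypothetical helper name.
# Assumes columns 8 and 9 are altitude and azimuth and are never negative,
# since a leading minus sign would be cut as well.
clean_angles() {
    # LC_ALL=C makes awk treat the input as plain bytes, so it does not
    # matter how the degree symbol happens to be encoded.
    LC_ALL=C awk -F', ' -v OFS=', ' '{
        sub(/[^0-9].*$/, "", $8)   # drop everything from the first non-digit on
        sub(/[^0-9].*$/, "", $9)
        print
    }'
}

# \302\260 is the UTF-8 byte pair for the degree symbol, written as octal
# escapes so the script itself stays plain ASCII.
row=$(printf '28 Apr, 32.4414, -98.638, Home, 41027.4616683787, 06:04:48, -6, 23\302\260, 22\302\260 (NNE), Iridium 14\n' | clean_angles)
printf '%s\n' "$row"
```

Because the cut happens at the first non-digit, the same filter removes both the degree symbol and the compass direction in parentheses.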

Last edited by rickgtx; 04-23-2012 at 10:49 PM.. Reason: degree symbol resolved
# 4  
Old 04-23-2012
Give this update a go:

Code:
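awk -F'[=&><\302]' '
# A bare <td>...</td> cell holds the date; keep overwriting it until a time is seen,
# so stray header cells before the first data row are discarded.
/[ \t]*<td>[^<]*<\/td>/&&!time{date=$3}
# The flare link carries lat, lng, loc and the Julian date as URL parameters.
/flaredetails.asp/ { lat=$8 ; lng=$10 ; loc=$14 ; jdate=$18 ; time=$21 }
# The first centered cell after the time is the magnitude.
/center/&&time&&!mag{mag=$4}
# Right-aligned cells: the first is altitude, the second azimuth.
# The azm rule comes first so that one line cannot set both.
/right/&&alt&&!azm{azm=$4}
/right/&&mag&&!alt{alt=$4}
# The satellite link closes the row: print it and reset the per-row state.
/satinfo.aspx/ {
  print date, lat, lng, loc, jdate, time, mag, alt, azm, $18
  date=time=mag=azm=alt=x;
}' OFS=", " infile2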
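
The fixed field numbers ($8, $10, $14, $18) rely on the URL parameters arriving in exactly this order. A hedged variation that looks the parameters up by name instead, using the same kind of field separators; the sample line is the one from the first post, and the shell variable names are only for illustration:

```shell
#!/bin/sh
# Sketch only: line and params are illustrative variable names.
line='    <td><a href="flaredetails.asp?SatID=24907&lat=30&lng=10&alt=0&loc=Home&TZ=CST&Date=41027.779556738&Mirror=2">13:42:34</a></td>'

params=$(printf '%s\n' "$line" | awk -F'[=&><]' -v OFS=', ' '
/flaredetails\.asp/ {
    # With = and & as separators, each parameter name is its own field
    # and its value is the field right after it.
    for (i = 1; i < NF; i++) {
        if      ($i == "lat")  lat   = $(i + 1)
        else if ($i == "lng")  lng   = $(i + 1)
        else if ($i == "loc")  loc   = $(i + 1)
        else if ($i == "Date") jdate = $(i + 1)
    }
    print lat, lng, loc, jdate
}')
printf '%s\n' "$params"
```

This costs a short loop per row, but it keeps working if a parameter is added or the order changes, as long as the names stay the same.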

# 5  
Old 04-23-2012
Perfect!!!!
# 6  
Old 04-24-2012
Hi rickgtx,

I know Chubler_XL already solved the problem, but here is another solution using Perl and an HTML parser.
Code:
$ cat script.pl
use warnings;
use strict;
use HTML::TreeBuilder;
use LWP::Simple;

my $url = q[http://heavens-above.com/iridium.asp?Dur=7&lat=30&lng=10];

my $tree = HTML::TreeBuilder->new_from_content(
        get( $url ),
);

my @table_data = $tree->look_down(
        q[_tag] => q[tr],
        sub { 
                my $parent = $_[0]->parent();
                return 1 if
                        defined $parent->attr( q[cellpadding] )
                        && $parent->attr( q[cellpadding] ) =~ m/\A\d+\Z/;
        },
);

die qq[Data not found\n] unless @table_data;

for my $row ( 1 .. $#table_data ) {
        my @data;
        for ( $table_data[ $row ]->descendants ) {
                if ( $_->tag eq q[a] ) {
                        my @href;
                        if ( $_->attr( q[href] ) =~ m/\A(?i:flaredetails)/ ) {
                                # [^&]+ also accepts decimal and negative coordinates such as lat=32.4414 or lng=-98.638
                                @href = $_->attr( q[href] ) =~ m/(?i)(?|lat=([^&]+)|lng=([^&]+)|loc=([^&]+)|date=([^&]+))/g;
                        }
                        push @data, @href;
                }
                if ( $_->tag() eq q[td] ) {
                        push @data, $_->as_text;
                }
        }

        splice @data, -3, 2;
        for ( @data[-3, -2] ) {
                s/\D.*\Z//;
        }
        @data[1..5] = @data[2..5,1];
        printf qq[%s\n], join q[, ], @data;
}

$ perl script.pl
29 Apr, 30, 10, Unspecified, 41028.8169421883, 21:36:24, -8, 48, 98, Iridium 55
30 Apr, 30, 10, Unspecified, 41029.1509187511, 05:37:19, -2, 16, 22, Iridium 72
01 May, 30, 10, Unspecified, 41030.1468397363, 05:31:27, -5, 15, 22, Iridium 62

# 7  
Old 04-24-2012
@birei
Thank you, but I cannot get perl to run on my iPhone.

Last edited by rickgtx; 04-24-2012 at 10:33 PM.. Reason: Disreguard shell question - got answer