Re: Lotto table extraction

in reply to Lotto table extraction

Those are some pretty nasty tables (50 or so of them). All sorts of empty rows and cells embedded throughout. In cases like these, it's better to extract all tables and filter based on inspecting particular cells. For example:

#!/usr/bin/perl

use strict;
use warnings;

use LWP::Simple;
use HTML::TableExtract;

my $data = get('http://www.flalottery.com/exptkt/c3.htm')
  or die "Could not fetch the page\n";

my $te = HTML::TableExtract->new;
$te->parse($data);

for my $t ($te->tables) {
  my $rc = -1;
  my($d, $c) = $t->coords;
  for my $r ($t->rows) {
    ++$rc;
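    # Read bottom-up: drop undefined cells, drop cells with no
    # alphanumerics, then strip one leading non-alphanumeric character.
    @$r = map  { s/^[^a-z0-9]//i; $_ }
          grep { /[a-z0-9]/i  }
          grep { defined $_   }
          @$r;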
    next unless @$r && $r->[0] =~ m/^\d+\/\d+\/\d+$/;
    print "row $d:$c:$rc: ", join(':', @$r), "\n";
  }
}

The grep/grep/map part eliminates empty cells and gets rid of the &nbsp; entities that precede the M/E indicators. The 'next' statement afterwards skips empty rows and rows that don't start with a date. This is a shotgun approach. You could just as easily filter each row using specific column indexes instead.
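Here is a rough sketch of the column-index variant. It works on rows as they come back from $t->rows, so it needs no modules of its own. The column positions in %COL, the helper names pick_columns and clean_cell, and the sample rows are all made up for illustration; I haven't mapped the real layout, so dump a few raw rows first and adjust the indexes.

```perl
use strict;
use warnings;

# Hypothetical column positions (assumption, not taken from the real page).
my %COL = (date => 0, draw => 2, numbers => 4);

# Strip leading/trailing non-alphanumerics (e.g. a decoded &nbsp;);
# undefined cells become empty strings.
sub clean_cell {
    my ($cell) = @_;
    return '' unless defined $cell;
    $cell =~ s/^[^a-z0-9]+//i;
    $cell =~ s/[^a-z0-9]+$//i;
    return $cell;
}

# Return [date, draw, numbers] for a raw row, or nothing if the
# date column does not hold something that looks like a date.
sub pick_columns {
    my ($row) = @_;
    my @picked = map { clean_cell($row->[$_]) } @COL{qw(date draw numbers)};
    return unless $picked[0] =~ m{^\d+/\d+/\d+$};
    return \@picked;
}

# Made-up sample rows standing in for what $t->rows would return.
my @rows = (
    [ "05/01/04", undef, "\xA0E", undef, "1-2-3" ],
    [ undef, undef, undef, undef, undef ],
    [ "Date", undef, "Draw", undef, "Numbers" ],
);

for my $r (@rows) {
    my $picked = pick_columns($r) or next;
    print join(':', @$picked), "\n";    # prints 05/01/04:E:1-2-3
}
```

In the original script you would call pick_columns($r) inside the row loop in place of the grep/grep/map line. The trade-off is that empty cells stay in place so the indexes remain stable, but it breaks if the tables don't all share one layout.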
