Re: problem HTML::FormatText::WithLinks::AndTables

in reply to problem HTML::FormatText::WithLinks::AndTables

I have a feeling that HTML::FormatText::WithLinks::AndTables is not the right module for this task. What you're doing is converting HTML to plain text and then trying to parse that plain text. It would be easier to parse the original HTML. It's like taking cheese, tomato and pepperoni, assembling them into a sandwich, then disassembling the sandwich to make a pizza. Why make the sandwich when you want pizza?

What you want is an HTML parsing module. I'll give you an example using HTML::HTML5::Parser, because I wrote it. There are plenty of other HTML parsers on CPAN though.

use strict;
use warnings;

use HTML::HTML5::Parser;
use XML::LibXML::QuerySelector;
use Data::Dumper;

my $url = "http://www.pro-football-reference.com/boxscores/198509080ram.htm";

my %data = HTML::HTML5::Parser
    -> load_html(location => $url)
    -> querySelectorAll('table#game_info tr')     # get all rows from game_info table
    -> grep(sub { not $_->{class} eq 'thead' })   # ignore class="thead" row
    -> map(sub {                                  # map each row into a key, value pair
        my ($key, $value) = $_->querySelectorAll('td');
        return $key->textContent => $value->textContent;
    });

print Dumper \%data;

This outputs...

$VAR1 = {
          'Start Time' => '1:00pm',
          'Over/Under' => '38.0 (under)',
          'Surface' => 'grass',
          'Vegas Line' => 'Pick',
          'Weather' => '69 degrees, relative humidity 62%, wind 12 mph',
          'Stadium' => 'Anaheim Stadium'
        };
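If you want to experiment without hitting the network, here is a minimal sketch of the same idea run against an inline string. The sample markup is made up for illustration (it only imitates the shape of the game_info table), and skipping the header row by counting td cells instead of checking the class is a choice made for this sketch, not something the page requires.

```perl
use strict;
use warnings;

use HTML::HTML5::Parser;
use XML::LibXML::QuerySelector;

# Hypothetical sample markup, shaped like the game_info table on the real page.
my $html = <<'HTML';
<table id="game_info">
  <tr class="thead"><th colspan="2">Game Info</th></tr>
  <tr><td>Stadium</td><td>Anaheim Stadium</td></tr>
  <tr><td>Surface</td><td>grass</td></tr>
</table>
HTML

# Parse the string into an XML::LibXML document.
my $doc = HTML::HTML5::Parser->new->parse_string($html);

my %data;
for my $row ($doc->querySelectorAll('table#game_info tr')) {
    # Only rows with exactly two td cells are treated as key/value pairs;
    # the header row has a th and no td, so it is skipped here.
    my @cells = $row->querySelectorAll('td');
    next unless @cells == 2;
    $data{ $cells[0]->textContent } = $cells[1]->textContent;
}

print "$_: $data{$_}\n" for sort keys %data;
```

The explicit loop is a bit longer than the chained grep/map version, but it makes it easy to add checks for rows that don't look the way you expect.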

package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
