This doesn't seem to be getting very far very fast. The following puts all the pieces together, albeit using data from Re^2: parsing CSV rather than the real data. The parsing and cleanup will no doubt need to be different for the real data. The sample just pulls the first two pre tags out of a single page, rather than fetching two pages and doing whatever is needed to extract the interesting content.
use strict;
use warnings;
use MIME::Lite;
use LWP::Simple;
use Text::CSV;
use HTML::TreeBuilder;
# Fetch the "pages"
my $content = get("http://perlmonks.org/?node_id=1173447");
die "Couldn't get it!" unless defined $content;
# Parse pages and clean up content
my $root = HTML::TreeBuilder->new_from_content($content);
my ($page1, $page2) = map {$_->as_text()} $root->find_by_tag_name('pre');
s/\[download\]//g for $page1, $page2;
s/\n\+//g for $page1, $page2;
# Process page 1
my $csv = Text::CSV->new();
my %idData;
open my $pg1In, '<', \$page1 or die "Can't open page 1 data: $!";
while (my $row = $csv->getline($pg1In)) {
s/^\s+|\s+$//g for @$row;
$idData{$row->[1]}{size} = $row->[0];
$idData{$row->[1]}{name} = '-- missing --';
}
close $pg1In;
# Process page 2
$page2 =~ s/\b(?=\w+,)/\n/g; # Insert newlines in front of id codes
open my $pg2In, '<', \$page2 or die "Can't open page 2 data: $!";
while (my $row = $csv->getline($pg2In)) {
next if !$row->[0]; # Skip blank lines
s/^\s+|\s+$//g for @$row;
$idData{$row->[0]}{name} = $row->[1];
$idData{$row->[0]}{size} //= '-- missing --';
}
close $pg2In;
# Generate output string
my $output;
for my $id (sort keys %idData) {
$output .= "$id: $idData{$id}{name} size $idData{$id}{size}\n";
}
# Build the email
my $msg = MIME::Lite->new(
From => 'me@myhost.com',
To => 'you@yourhost.com',
Cc => 'some@other.com, some@more.com',
Subject => "Here's the data you wanted",
Data => $output
);
# and "send" it (just '$msg->send()' in the next line to really send it)
print $msg->as_string();
Prints:
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
MIME-Version: 1.0
X-Mailer: MIME::Lite 3.030 (F2.85; T2.13; A2.16; B3.15; Q3.13)
Date: Sun, 9 Oct 2016 22:55:39 +1300
From: me@myhost.com
To: you@yourhost.com
Cc: some@other.com, some@more.com
Subject: Here's the data you wanted
c100: Joe Shmo size 512.45
c200: Jack Black size 6734
c300: Cinderella size 5653.2
c400: Barack Obama size -- missing --
c500: Cruella Deville size -- missing --
I suggest you leave the print line in until the body of the email looks right, and only then change it to the send line.
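If it helps to test the merge step on its own before involving the fetch and the email, here is a minimal sketch using only core Perl. The sample strings and the merge_pages name are made up for illustration, and a plain split stands in for Text::CSV, so it only copes with fields that have no quotes or embedded commas. It assumes page 1 lines are "size, id" and page 2 lines are "id, name", as in the code above.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two cleaned-up pages.
my $page1 = "512.45, c100\n6734, c200\n";
my $page2 = "c100, Joe Shmo\nc400, Barack Obama\n";

# Merge two id-keyed record sets, filling any gaps with a placeholder.
# NOTE: naive comma split, not a real CSV parser (illustration only).
sub merge_pages {
    my ($sizes, $names) = @_;
    my %idData;

    # Page 1: "size, id". The name is unknown until page 2 supplies it.
    open my $in1, '<', \$sizes or die "Can't open size data: $!";
    while (my $line = <$in1>) {
        $line =~ s/^\s+|\s+$//g;    # Trim, which also drops the newline
        next if $line eq '';        # Skip blank lines
        my ($size, $id) = split /\s*,\s*/, $line;
        $idData{$id}{size} = $size;
        $idData{$id}{name} = '-- missing --';
    }
    close $in1;

    # Page 2: "id, name". The size stays as the placeholder if page 1
    # had no entry for this id.
    open my $in2, '<', \$names or die "Can't open name data: $!";
    while (my $line = <$in2>) {
        $line =~ s/^\s+|\s+$//g;
        next if $line eq '';
        my ($id, $name) = split /\s*,\s*/, $line;
        $idData{$id}{name} = $name;
        $idData{$id}{size} //= '-- missing --';
    }
    close $in2;

    return \%idData;
}

my $merged = merge_pages($page1, $page2);
my $report = '';
for my $id (sort keys %$merged) {
    $report .= "$id: $merged->{$id}{name} size $merged->{$id}{size}\n";
}
print $report;
```

With the merge in a sub you can feed it awkward cases (an id on only one page, blank lines, stray spaces) and check the report text before it ever goes near an email.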
Premature optimization is the root of all job security