note
McDarren
Thanks :)
<p>Actually, I've never used [cpan://WWW::Mechanize], so it didn't occur to me to try that. The routine I use for scraping the data from the Monk homenodes is given below. I think the main performance hit is the fact that I need to issue a separate HTTP request for each Monk. Ideally, I'd be able to grab all of this information in a single request, but I'm not aware of any way to do that at present.
<code>
sub get_monk_stats {
    # Expects a hashref keyed by node id; needs LWP::UserAgent,
    # HTTP::Request, and HTML::TokeParser loaded.
    my $ref      = shift;
    my $monk_url = 'http://www.perlmonks.org/?node_id=';

    # The homenode table labels we want to capture
    my %monk_fields = (
        'User since:' => 1,
        'Last here:'  => 1,
        'Experience:' => 1,
        'Level:'      => 1,
        'Writeups:'   => 1,
    );

    my $ua = LWP::UserAgent->new();    # one agent, reused for every request

    MONK:
    foreach my $id ( keys %{$ref} ) {
        print "Getting data for $ref->{$id}{name} ($id)\n";
        my $req    = HTTP::Request->new( GET => "$monk_url$id" );
        my $result = $ua->request($req);
        next MONK if !$result->is_success;

        my $content = $result->content;
        my $p       = HTML::TokeParser->new( \$content );

        # Walk the <td> cells; when a cell holds one of the labels
        # above, the following cell holds its value.
        while ( my $tag = $p->get_tag("td") ) {
            my $text = $p->get_trimmed_text("/td");
            if ( $monk_fields{$text} ) {
                $p->get_tag("td");
                $ref->{$id}{$text} = $p->get_trimmed_text("/td");
            }
        }
    }
    return $ref;
}
</code>
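<p>For what it's worth, if I were to try [cpan://WWW::Mechanize], the fetch part might look something like this. This is an untested sketch, not something I've benchmarked; it assumes the same hashref layout as above, and the subroutine name <code>get_monk_stats_mech</code> is just made up for the example:
<code>
use WWW::Mechanize;
use HTML::TokeParser;

sub get_monk_stats_mech {
    my $ref      = shift;
    my $monk_url = 'http://www.perlmonks.org/?node_id=';
    my %monk_fields = map { $_ => 1 }
        ( 'User since:', 'Last here:', 'Experience:', 'Level:', 'Writeups:' );

    # autocheck => 0 so a failed fetch doesn't die, mirroring
    # the is_success check in the LWP version
    my $mech = WWW::Mechanize->new( autocheck => 0 );

    MONK:
    foreach my $id ( keys %{$ref} ) {
        $mech->get("$monk_url$id");
        next MONK unless $mech->success;

        # Parsing is unchanged from the LWP version
        my $content = $mech->content;
        my $p       = HTML::TokeParser->new( \$content );
        while ( my $tag = $p->get_tag("td") ) {
            my $text = $p->get_trimmed_text("/td");
            if ( $monk_fields{$text} ) {
                $p->get_tag("td");
                $ref->{$id}{$text} = $p->get_trimmed_text("/td");
            }
        }
    }
    return $ref;
}
</code>
<p>Whether that would actually be any faster, I don't know; it's still one request per Monk either way, so the underlying performance issue remains.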
558846
558944