Re^2: Google Earth Monks

by McDarren (Abbot)
on Jul 03, 2006 at 08:19 UTC (#558953)


in reply to Re: Google Earth Monks
in thread Google Earth Monks

Thanks :)

Actually, I've never used WWW::Mechanize, so it didn't occur to me to try that. The routine I use for scraping the data from the Monk homenodes is given below. I think the main performance hit is the fact that I need to issue a separate request for each Monk. Ideally, it would be good to be able to grab all this information in a single go. But I'm not aware of any way that this is currently possible.

use LWP::UserAgent;
use HTTP::Request;
use HTML::TokeParser;

sub get_monk_stats {
    my $ref = shift;
    my $monk_url = 'http://www.perlmonks.org/?node_id=';

    # Labels of the homenode table cells we want to keep
    my %monk_fields = (
        'User since:' => 1,
        'Last here:'  => 1,
        'Experience:' => 1,
        'Level:'      => 1,
        'Writeups:'   => 1,
    );

    # One user agent is enough for all of the requests
    my $ua = LWP::UserAgent->new();

    MONK:
    foreach my $id (keys %{$ref}) {
        print "Getting data for $ref->{$id}{name} ($id)\n";
        my $req    = HTTP::Request->new(GET => "$monk_url$id");
        my $result = $ua->request($req);
        next MONK if !$result->is_success;
        my $content = $result->content;
        my $p = HTML::TokeParser->new(\$content);

        # Each field is a label cell followed by a value cell
        while (my $tag = $p->get_tag("td")) {
            my $text = $p->get_trimmed_text("/td");
            if ($monk_fields{$text}) {
                $p->get_tag("td");
                $ref->{$id}{$text} = $p->get_trimmed_text("/td");
            }
        }
    }
    return $ref;
}
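To see what the td-pair scan picks up without hitting the site, here is a minimal sketch that runs the same loop over an inline HTML string. The helper name parse_monk_fields and the sample markup are made up for illustration; the real homenode layout may well differ.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: the same td-pair scan as in get_monk_stats,
# applied to an HTML string instead of a fetched page.
sub parse_monk_fields {
    my ($content, $wanted) = @_;
    my %found;
    my $p = HTML::TokeParser->new(\$content);
    while (my $tag = $p->get_tag('td')) {
        my $label = $p->get_trimmed_text('/td');
        next if !$wanted->{$label};
        # The value lives in the cell right after the label cell
        $p->get_tag('td');
        $found{$label} = $p->get_trimmed_text('/td');
    }
    return \%found;
}

# Made-up sample markup (an assumption, not copied from a real homenode)
my $sample = <<'HTML';
<table>
<tr><td>Experience:</td><td>1234</td></tr>
<tr><td>Level:</td><td>Abbot (15)</td></tr>
<tr><td>Location:</td><td>somewhere</td></tr>
</table>
HTML

my $stats = parse_monk_fields($sample, { 'Experience:' => 1, 'Level:' => 1 });
print "$_ $stats->{$_}\n" for sort keys %$stats;
```

Only the labels listed in the second argument are kept, so the Location: row is skipped.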


Re^3: Google Earth Monks
by Hue-Bond (Priest) on Jul 03, 2006 at 15:21 UTC
    "Ideally, it would be good to be able to grab all this information in a single go. But I'm not aware of any way that this is currently possible."

    You can work in parallel using POE::Component::Client::HTTP. Check it out.
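
A rough sketch of what that could look like, going by my reading of the POE::Component::Client::HTTP docs. The sub names build_requests and fetch_homenodes, the 'ua' alias, and the 30 second timeout are my own choices for illustration, and I haven't run this against the live site. It still issues one request per Monk, but they are all in flight at once rather than one after another.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTTP::Request;
use POE qw(Component::Client::HTTP);

# Same base URL as in the get_monk_stats routine
my $monk_url = 'http://www.perlmonks.org/?node_id=';

# Build one GET request per homenode id (no network access here)
sub build_requests {
    my @ids = @_;
    return map { HTTP::Request->new(GET => "$monk_url$_") } @ids;
}

# Fetch all ids concurrently; returns { id => page content } for successes
sub fetch_homenodes {
    my @ids = @_;
    my %content_for;
    return \%content_for if !@ids;
    my $pending = @ids;

    # 'ua' is an arbitrary alias chosen for this sketch
    POE::Component::Client::HTTP->spawn(Alias => 'ua', Timeout => 30);

    POE::Session->create(
        inline_states => {
            _start => sub {
                my $kernel = $_[KERNEL];
                # Queue everything up front; replies come back as 'got_response'
                $kernel->post('ua', 'request', 'got_response', $_)
                    for build_requests(@ids);
            },
            got_response => sub {
                my ($kernel, $request_packet, $response_packet)
                    = @_[KERNEL, ARG0, ARG1];
                my $request  = $request_packet->[0];
                my $response = $response_packet->[0];
                if ($response->is_success) {
                    my ($id) = $request->uri =~ /node_id=(\d+)/;
                    $content_for{$id} = $response->content if defined $id;
                }
                # Once every reply is in, stop the component so run() returns
                $kernel->post('ua', 'shutdown') if --$pending == 0;
            },
        },
    );

    POE::Kernel->run();
    return \%content_for;
}

# Only touch the network when node ids are given on the command line
if (@ARGV) {
    my $pages = fetch_homenodes(@ARGV);
    printf "%s: %d bytes\n", $_, length $pages->{$_} for sort keys %$pages;
}
```

Each returned page could then go through the same HTML::TokeParser loop as before.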

    --
    David Serrano
