Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Re^2: Google Earth Monks

by McDarren (Abbot)
on Jul 03, 2006 at 08:19 UTC ( #558953=note: print w/ replies, xml ) Need Help??


in reply to Re: Google Earth Monks
in thread Google Earth Monks

Thanks :)

Actually, I've never used WWW::Mechanize, so it didn't occur to me to try that. The routine I use for scraping the data from the Monk homenodes is given below. I think the main performance hit is the fact that I need to issue a separate request for each Monk. Ideally, it would be good to be able to grab all this information in a single go. But I'm not aware of any way that this is currently possible.

sub get_monk_stats { my $ref = shift; my $monk_url = 'http://www.perlmonks.org/?node_id='; my %monk_fields = ( 'User since:' => 1, 'Last here:' => 1, 'Experience:' => 1, 'Level:' => 1, 'Writeups:' => 1, ); MONK: foreach my $id (keys %{$ref}) { print "Getting data for $ref->{$id}{name} ($id)\n"; my $ua = LWP::UserAgent->new(); my $req = HTTP::Request->new(GET=>"$monk_url$id"); my $result = $ua->request($req); next MONK if !$result->is_success; my $content = $result->content; my $p = HTML::TokeParser->new(\$content); while (my $tag = $p->get_tag("td")) { my $text = $p->get_trimmed_text("/td"); if ($monk_fields{$text}) { $p->get_tag("td"); $ref->{$id}{$text} = $p->get_trimmed_text("/td"); } } } return $ref; }


Comment on Re^2: Google Earth Monks
Download Code
Re^3: Google Earth Monks
by Hue-Bond (Priest) on Jul 03, 2006 at 15:21 UTC
    Ideally, it would be good to be able to grab all this information in a single go. But I'm not aware of any way that this is currently possible.

    You can work in parallel using POE::Component::Client::HTTP. Check it out.

    --
    David Serrano

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://558953]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (17)
As of 2015-07-02 17:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (44 votes), past polls