PerlMonks  

Re: Google Earth Monks

by petdance (Parson)
on Jul 03, 2006 at 06:56 UTC (#558944)


in reply to Google Earth Monks

I haven't seen your code, but if you're extracting links from the monk page, take a look at WWW::Mechanize and let it do that work for you. It should make your code simpler.

xoxo,
Andy
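
As a rough illustration of the suggestion above, here is a minimal sketch of link extraction with WWW::Mechanize. The helper name `extract_links` and the example pattern are assumptions made for illustration; they do not come from the original poster's script, which was not shown.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: return [link text, absolute URL] pairs for every
# link on $page_url whose href matches $pattern.
sub extract_links {
    my ( $page_url, $pattern ) = @_;

    my $mech = WWW::Mechanize->new( autocheck => 1 );    # die on fetch errors
    $mech->get($page_url);

    # find_all_links does the HTML parsing; url_regex filters on the href.
    return map { [ $_->text // '', $_->url_abs->as_string ] }
        $mech->find_all_links( url_regex => $pattern );
}

# Possible usage (the node id is only an example):
# for my $link ( extract_links( 'http://www.perlmonks.org/?node_id=558944', qr/node_id=\d+/ ) ) {
#     printf "%-30s %s\n", @$link;
# }
```

Mechanize handles the fetching, parsing, and relative-URL resolution, so the calling code only has to say which links it wants.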


Re^2: Google Earth Monks
by McDarren (Abbot) on Jul 03, 2006 at 08:19 UTC
    Thanks :)

    Actually, I've never used WWW::Mechanize, so it didn't occur to me to try that. The routine I use for scraping the data from the Monk homenodes is given below. I think the main performance hit is the fact that I need to issue a separate request for each Monk. Ideally, it would be good to be able to grab all this information in a single go. But I'm not aware of any way that this is currently possible.

    use LWP::UserAgent;
    use HTTP::Request;
    use HTML::TokeParser;

    sub get_monk_stats {
        my $ref      = shift;
        my $monk_url = 'http://www.perlmonks.org/?node_id=';

        # Homenode table labels whose values should be kept
        my %monk_fields = (
            'User since:' => 1,
            'Last here:'  => 1,
            'Experience:' => 1,
            'Level:'      => 1,
            'Writeups:'   => 1,
        );

        # One user agent, reused for every request
        my $ua = LWP::UserAgent->new();

        MONK:
        foreach my $id (keys %{$ref}) {
            print "Getting data for $ref->{$id}{name} ($id)\n";
            my $req    = HTTP::Request->new(GET => "$monk_url$id");
            my $result = $ua->request($req);
            next MONK if !$result->is_success;

            my $content = $result->content;
            my $p       = HTML::TokeParser->new(\$content);

            # Walk the table cells; a label cell is followed by its value cell
            while (my $tag = $p->get_tag("td")) {
                my $text = $p->get_trimmed_text("/td");
                if ($monk_fields{$text}) {
                    $p->get_tag("td");
                    $ref->{$id}{$text} = $p->get_trimmed_text("/td");
                }
            }
        }
        return $ref;
    }

      Ideally, it would be good to be able to grab all this information in a single go. But I'm not aware of any way that this is currently possible.

      You can work in parallel using POE::Component::Client::HTTP. Check it out.

      --
      David Serrano
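
To make the parallel suggestion concrete, here is a minimal sketch using POE::Component::Client::HTTP. The helper name `fetch_homenodes`, the alias `ua`, and the 30 second timeout are assumptions chosen for illustration; only the base URL comes from the routine posted above. It still issues one request per monk, but the requests are in flight at the same time instead of one after another.

```perl
use strict;
use warnings;
use POE qw(Component::Client::HTTP);
use HTTP::Request;

# Hypothetical helper: fetch several homenodes concurrently and
# return a hash reference of { node_id => page content }.
sub fetch_homenodes {
    my @ids = @_;
    return {} unless @ids;

    my $monk_url = 'http://www.perlmonks.org/?node_id=';
    my %content;
    my $pending = @ids;    # responses still outstanding

    # The alias and timeout are arbitrary choices for this sketch.
    POE::Component::Client::HTTP->spawn( Alias => 'ua', Timeout => 30 );

    POE::Session->create(
        inline_states => {
            _start => sub {
                my $kernel = $_[KERNEL];
                # Queue every request up front; the component handles them concurrently.
                for my $id (@ids) {
                    $kernel->post( 'ua', 'request', 'got_response',
                        HTTP::Request->new( GET => "$monk_url$id" ), $id );
                }
            },
            got_response => sub {
                my ( $kernel, $request_packet, $response_packet ) =
                    @_[ KERNEL, ARG0, ARG1 ];
                my $id       = $request_packet->[1];     # tag passed with the request
                my $response = $response_packet->[0];    # an HTTP::Response object
                $content{$id} = $response->decoded_content
                    if $response->is_success;
                # Stop the component once every request has been answered.
                $kernel->post( 'ua', 'shutdown' ) if --$pending == 0;
            },
        },
    );

    POE::Kernel->run;
    return \%content;
}
```

Each returned page could then be passed through the same HTML::TokeParser loop as before, so only the fetching changes. Firing many simultaneous requests at one site may not be welcome, so some limit on how many ids are queued at once is probably wise.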
