PerlMonks  

Re: Google Earth Monks

by petdance (Parson)
on Jul 03, 2006 at 06:56 UTC (#558944)


in reply to Google Earth Monks

Haven't seen your code, but if you're extracting links from the monk page, take a look at using WWW::Mechanize and having it help you out on that. Should make your code simpler.

xoxo,
Andy
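
As a rough illustration of the suggestion above, here is a minimal sketch of link extraction with WWW::Mechanize. The helper name `monk_links` and the idea of passing in the page URL are assumptions made for this example, not the original poster's code; the `node_id=` pattern is taken from the URL format used elsewhere in this thread.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch $page_url and return a hash ref of
# node id => { name => link text } for every link whose URL
# carries a node_id parameter.
sub monk_links {
    my ($page_url) = @_;

    # autocheck makes get() die on HTTP errors instead of failing silently.
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get($page_url);

    my %monks;
    for my $link ( $mech->find_all_links( url_regex => qr/node_id=\d+/ ) ) {
        my ($id) = $link->url =~ /node_id=(\d+)/;
        $monks{$id}{name} = $link->text // '';
    }
    return \%monks;
}
```

Calling `monk_links($url)` with the URL of the page that lists the monks should give a structure that can be fed to a per-monk routine keyed on node id. Whether the link text is really the monk's name depends on that page's markup, so treat that as something to verify.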


Re^2: Google Earth Monks
by McDarren (Abbot) on Jul 03, 2006 at 08:19 UTC
    Thanks :)

    Actually, I've never used WWW::Mechanize, so it didn't occur to me to try that. The routine I use for scraping the data from the Monk homenodes is given below. I think the main performance hit is that I need to issue a separate request for each Monk. Ideally, it would be good to be able to grab all this information in a single go. But I'm not aware of any way that this is currently possible.

    use LWP::UserAgent;
    use HTTP::Request;
    use HTML::TokeParser;

    sub get_monk_stats {
        my $ref = shift;
        my $monk_url = 'http://www.perlmonks.org/?node_id=';

        # Labels of the homenode table cells we want to capture.
        my %monk_fields = (
            'User since:' => 1,
            'Last here:'  => 1,
            'Experience:' => 1,
            'Level:'      => 1,
            'Writeups:'   => 1,
        );

        # One user agent is enough; reuse it for every request.
        my $ua = LWP::UserAgent->new();

        MONK:
        foreach my $id (keys %{$ref}) {
            print "Getting data for $ref->{$id}{name} ($id)\n";
            my $req = HTTP::Request->new(GET => "$monk_url$id");
            my $result = $ua->request($req);
            next MONK if !$result->is_success;

            my $content = $result->content;
            my $p = HTML::TokeParser->new(\$content);

            # Each label sits in one <td>; its value is in the next <td>.
            while (my $tag = $p->get_tag("td")) {
                my $text = $p->get_trimmed_text("/td");
                if ($monk_fields{$text}) {
                    $p->get_tag("td");
                    $ref->{$id}{$text} = $p->get_trimmed_text("/td");
                }
            }
        }
        return $ref;
    }
      Ideally, it would be good to be able to grab all this information in a single go. But I'm not aware of any way that this is currently possible.

      You can work in parallel using POE::Component::Client::HTTP. Check it out.

      --
      David Serrano
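
To make the parallel approach above more concrete, here is a minimal sketch using POE::Component::Client::HTTP. The helper names `build_requests` and `fetch_monk_pages`, the `$on_page` callback, the `ua` alias, and the 30-second timeout are all assumptions for illustration; the homenode URL prefix is the one used in the routine earlier in the thread.

```perl
use strict;
use warnings;
use HTTP::Request;
use POE qw(Component::Client::HTTP);

my $monk_url = 'http://www.perlmonks.org/?node_id=';

# Hypothetical helper: build one [id, GET request] pair per monk id.
# Pure function, no network access.
sub build_requests {
    my ($ref) = @_;
    return map { [ $_, HTTP::Request->new( GET => "$monk_url$_" ) ] }
           sort { $a <=> $b } keys %{$ref};
}

# Hypothetical helper: fetch all homenodes concurrently and call
# $on_page->($id, $html) for each successful response.
sub fetch_monk_pages {
    my ( $ref, $on_page ) = @_;

    # The HTTP client component runs as its own session, addressed by alias.
    POE::Component::Client::HTTP->spawn( Alias => 'ua', Timeout => 30 );

    POE::Session->create(
        inline_states => {
            _start => sub {
                my $kernel = $_[KERNEL];
                for my $job ( build_requests($ref) ) {
                    my ( $id, $request ) = @{$job};
                    # The id travels along as a tag so the reply can be matched up.
                    $kernel->post( 'ua', 'request', 'got_response', $request, $id );
                }
            },
            got_response => sub {
                my ( $request_packet, $response_packet ) = @_[ ARG0, ARG1 ];
                my $id       = $request_packet->[1];
                my $response = $response_packet->[0];
                return if !$response->is_success;
                $on_page->( $id, $response->decoded_content );
            },
        },
    );

    POE::Kernel->run();    # returns once no more work is pending
    return;
}
```

The callback could hand each page to the same HTML::TokeParser loop used in the sequential version. Firing every request at once may be unfriendly to the server, so some form of throttling is probably worth adding before running this against a long list of monks.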
