PerlMonks  

package WWW::Google;
use strict;

# Google.pm - amoe 20/01/2002
# hackish module to search google programmatically

use LWP::UserAgent;
use HTTP::Request;
use HTML::TokeParser;
use URI::Escape;

# /me apologises in advance

sub new {
    my $class = shift;
    my $self = bless {}, $class;

    my $agent_name = shift || "WWW-Google/0.1 ($^O; http://amoe.perlmonk.org/techno/perl/projects/www_google/)";
    my $agent = LWP::UserAgent->new;
    $agent->agent($agent_name);

    $self->{cgiloc} = ['http://www.google.com/', 'search'];
    $self->{place}  = 0;
    $self->{agent}  = $agent;

    while (my ($key, $value) = splice @_, 0, 2) {
        $self->{$key} = $value;
    }

    return $self;
}

sub build {
    my $self = shift;
    my @bits = $self->cgiloc;

    my $query = join('' => shift @bits, shift @bits, '?', 'q=', $self->query);

    if (@bits) {
        $query .= '&' . join('&', @bits);
    }

    my $res = $self->agent->request(HTTP::Request->new(GET => $query));
    my $parsee = HTML::TokeParser->new(\$res->content);
    $self->parsee($parsee);

    return $res;
}

sub next_result {
    my $self = shift;
    my $result = {};

    while (!%$result) {
        while (my $tag = $self->parsee->get_tag('p')) {
            my $a = $self->parsee->get_tag;
            unless ($a->[0] eq 'a') {
                $self->parsee->unget_token($a);
                next;
            }

            $result->{url}   = $a->[1]->{href};
            $result->{title} = $self->parsee->get_trimmed_text('/a');
            return $result;
        }
    } continue {
        $self->place($self->place + 10);
        $self->cgiloc(($self->cgiloc)[0, 1], 'start=' . $self->place);
        $self->build;
    }
}

sub query {
    my $self = shift;
    if (@_) {
        $self->{query} = uri_escape(shift);
    } else {
        return $self->{query};
    }
}

sub place {
    my $self = shift;
    if (@_) {
        $self->{place} = shift;
    } else {
        return $self->{place};
    }
}

sub cgiloc {
    my $self = shift;
    if (@_) {
        $self->{cgiloc} = [@_];
    } else {
        return @{$self->{cgiloc}};
    }
}

sub parsee {
    my $self = shift;
    if (@_) {
        $self->{parsee} = shift;
    } else {
        return $self->{parsee};
    }
}

sub agent { shift->{agent} }

1;

__END__

=pod

=head1 NAME

WWW::Google - Temporary replacement for WWW::Search::Google

=head1 SYNOPSIS

    use WWW::Google;

    my $search = WWW::Google->new;
    # build up query in $q
    $search->query($q);
    $search->build;

    while (my $res = $search->next_result) {
        print $res->{url}, ': ', $res->{title};
    }

    $search->cgiloc('http://www.google.de', 'search');    # use german google
    $search->place(50);                                   # start at page 50

=head1 DESCRIPTION

This module uses the search engine Google to find websites related to a
particular term. The C<WWW::Search> modules are supposed to do this, but it
seems none of them work properly. So I decided to code up a hackish
replacement to use in the meantime. And here it is. And here are its methods:

=over 4

=item new

Returns a C<WWW::Google> object. Takes the name of the search robot as the
first argument, followed by an optional list of name-value pairs to set the
object up. Possible values are cgiloc, place and query, all of which perform
basically the same task as the method of the same names, with one exception:
query-strings are autoescaped in C<query> the method, whereas they're passed
in raw if you use the C<new> interface.

=item build

Gets a query page and sets it up for parsing. It takes no arguments, and
must be called before C<next_result> is.

=item query

Sets the query for the object to use when C<build> gets called. If called
without argument, returns the current query string. Queries are automatically
URI-encoded.

=item place

The amount of results to start the search as. By default, it starts at the
first page of results, i.e. C<0>. Multiples of ten are probably best.

=item cgiloc

Specify a different location for C<build> to get the query result from. Can
be used to specify national variants of Google, presuming they use the same
HTML format as the google.com one. This is experimental.

=item next_result

Returns a hash containing two keys, C<url> and C<title>, which contain the
path to the search result and the title of the search result. This is what you
use to get the search results. If you use this in a loop, it will probably turn
infinite because of the sheer amount of search results. You'll have to exit
it early with a C<last> or something once you hit your desired amount of
results.

=back

=head1 NOTES

THE DADDY OF WHEEL-REINVENTION!

This is almost certainly very buggy - it was written in about an hour, but it
does the job. The code looks horrible and probably runs slower than it should.

People will probably be wanting the excerpt of text Google provides. Well, I
found it was pretty hard to parse this - the problem being that some sites
have categories and some don't, so how can you judge where the text ends? Well,
you can, but I couldn't be bothered at the time. I will get around to it.

=head1 AUTHOR

Amoe. Thanks to crazyinsomniac and hacker.

=head1 CONTACT

Amoe on perlmonks.org. or email
C<subvert underscore you at hotmail dot com>.

The website will be at
http://amoe.perlmonk.org/techno/perl/projects/www_google/ if I ever get it up.

=head1 COPYRIGHT

Free (substandard) software, daddy.

This program is free software. You may copy or redistribute it
under the same terms as Perl itself.

=cut
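For anyone trying to follow what C<build> and C<next_result> actually request, here is a rough, network-free sketch of the URL construction. It is not part of the module: C<escape_query> and C<build_url> are hypothetical helper names made up for illustration, and C<escape_query> is only a hand-rolled approximation of C<uri_escape> (an assumption, so the snippet runs without any CPAN modules). The string joining itself follows the same order as the module's C<build>.

```perl
use strict;
use warnings;

# Hypothetical stand-in for URI::Escape::uri_escape (assumption: percent-encode
# everything outside the RFC 3986 unreserved set; single-byte characters only).
sub escape_query {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Hypothetical helper mirroring the joining done in build(): base URL, script
# name, '?q=' plus the escaped query, then any extra parameters joined by '&'.
sub build_url {
    my ($query, @cgiloc) = @_;
    my $url = join '', shift @cgiloc, shift @cgiloc, '?', 'q=', escape_query($query);
    $url .= '&' . join('&', @cgiloc) if @cgiloc;
    return $url;
}

my @cgiloc = ('http://www.google.com/', 'search');

# First page: no extra parameters yet.
my $first = build_url('perl monks', @cgiloc);

# When a page runs out of results, next_result() bumps place by 10 and
# appends 'start=<place>' to the first two cgiloc elements.
my $place  = 0 + 10;
my $second = build_url('perl monks', @cgiloc[0, 1], "start=$place");

print "$first\n$second\n";
```

Since C<next_result> keeps fetching the next page whenever the current one is exhausted, a caller will most likely want a counter and a C<last> in the loop, as the POD already warns.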

In reply to WWW::Google by Amoe
