PerlMonks  

package WWW::Google;
use strict;

# Google.pm - amoe 20/01/2002
# hackish module to search google programmatically

use LWP::UserAgent;
use HTTP::Request;
use HTML::TokeParser;
use URI::Escape;

# /me apologises in advance

sub new {
    my $class = shift;
    my $self  = bless {}, $class;

    # First argument is the robot name; fall back to a default.
    my $agent_name = shift
        || "WWW-Google/0.1 ($^O; http://amoe.perlmonk.org/techno/perl/projects/www_google/)";
    my $agent = LWP::UserAgent->new;
    $agent->agent($agent_name);

    $self->{cgiloc} = ['http://www.google.com/', 'search'];
    $self->{place}  = 0;
    $self->{agent}  = $agent;

    # Remaining arguments are name-value pairs stored as-is (no escaping).
    while (my ($key, $value) = splice @_, 0, 2) {
        $self->{$key} = $value;
    }

    return $self;
}

sub build {
    my $self = shift;
    my @bits = $self->cgiloc;

    # Base URL + script name + query, then any extra parameters.
    my $query = join('' => shift @bits, shift @bits, '?', 'q=', $self->query);
    if (@bits) {
        $query .= '&' . join('&', @bits);
    }

    my $res    = $self->agent->request(HTTP::Request->new(GET => $query));
    my $parsee = HTML::TokeParser->new(\$res->content);
    $self->parsee($parsee);

    return $res;
}

sub next_result {
    my $self   = shift;
    my $result = {};

    while (!%$result) {
        # A result is a <p> tag immediately followed by an <a> tag.
        while (my $tag = $self->parsee->get_tag('p')) {
            my $anchor = $self->parsee->get_tag;

            unless ($anchor->[0] eq 'a') {
                $self->parsee->unget_token($anchor);
                next;
            }

            $result->{url}   = $anchor->[1]->{href};
            $result->{title} = $self->parsee->get_trimmed_text('/a');

            return $result;
        }
    } continue {
        # Current page exhausted: fetch the next ten results.
        $self->place($self->place + 10);
        $self->cgiloc(($self->cgiloc)[0, 1], 'start=' . $self->place);
        $self->build;
    }
}

sub query {
    my $self = shift;

    if (@_) {
        $self->{query} = uri_escape(shift);
    } else {
        return $self->{query};
    }
}

sub place {
    my $self = shift;

    if (@_) {
        $self->{place} = shift;
    } else {
        return $self->{place};
    }
}

sub cgiloc {
    my $self = shift;

    if (@_) {
        $self->{cgiloc} = [@_];
    } else {
        return @{$self->{cgiloc}};
    }
}

sub parsee {
    my $self = shift;

    if (@_) {
        $self->{parsee} = shift;
    } else {
        return $self->{parsee};
    }
}

sub agent { shift->{agent} }

1;

__END__

=pod

=head1 NAME

WWW::Google - Temporary replacement for WWW::Search::Google

=head1 SYNOPSIS

    use WWW::Google;

    my $search = WWW::Google->new;

    # build up query in $q

    $search->query($q);
    $search->build;

    while (my $res = $search->next_result) {
        print $res->{url}, ': ', $res->{title}, "\n";
    }

    $search->cgiloc('http://www.google.de/', 'search'); # use german google
    $search->place(50);                                 # start at result 50

=head1 DESCRIPTION

This module uses the search engine Google to find websites related to a
particular term. The C<WWW::Search> modules are supposed to do this, but it
seems none of them work properly. So I decided to code up a hackish
replacement to use in the meantime. And here it is. And here are its methods:

=over 4

=item new

Returns a C<WWW::Google> object. Takes the name of the search robot as the
first argument, followed by an optional list of name-value pairs to set the
object up. Possible names are cgiloc, place and query, all of which perform
basically the same task as the methods of the same names, with one exception:
query strings are auto-escaped by the C<query> method, whereas they're passed
in raw if you use the C<new> interface.

=item build

Gets a query page and sets it up for parsing. It takes no arguments, and
must be called before C<next_result> is.

=item query

Sets the query for the object to use when C<build> gets called. If called
without an argument, returns the current query string. Queries are
automatically URI-encoded.

=item place

The result offset at which the search starts. By default, it starts at the
first page of results, i.e. C<0>. Multiples of ten are probably best.

=item cgiloc

Specify a different location for C<build> to get the query result from. The
first two arguments (base URL and script name) are concatenated directly, so
the base URL needs its trailing slash. Can be used to specify national
variants of Google, presuming they use the same HTML format as the google.com
one. This is experimental.

=item next_result

Returns a hash reference containing two keys, C<url> and C<title>, which
contain the path to the search result and the title of the search result.
This is what you use to get the search results. If you use this in a loop, it
will probably turn infinite because of the sheer number of search results.
You'll have to exit it early with a C<last> or something once you hit your
desired number of results.

=back

=head1 NOTES

THE DADDY OF WHEEL-REINVENTION!

This is almost certainly very buggy - it was written in about an hour, but it
does the job. The code looks horrible and probably runs slower than it should.

People will probably be wanting the excerpt of text Google provides. Well, I
found it was pretty hard to parse this - the problem being that some sites
have categories and some don't, so how can you judge where the text ends?
Well, you can, but I couldn't be bothered at the time. I will get around to it.

=head1 AUTHOR

Amoe. Thanks to crazyinsomniac and hacker.

=head1 CONTACT

Amoe on perlmonks.org, or email
C<subvert underscore you at hotmail dot com>. The website will be at
http://amoe.perlmonk.org/techno/perl/projects/www_google/ if I ever get it up.

=head1 COPYRIGHT

Free (substandard) software, daddy.

This program is free software. You may copy or redistribute it under the same
terms as Perl itself.

=cut
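The POD for C<next_result> warns that a plain loop over it will probably never end and has to be cut short by hand. Here is a minimal sketch of one way to do that. `Fake::Search` and `first_results` are hypothetical names invented for this example and are not part of WWW::Google. The stub stands in for the real object so the sketch runs without touching the network, and it assumes only that `next_result` returns a hash reference with `url` and `title` keys, as the module above does.

```perl
use strict;
use warnings;

# Hypothetical offline stand-in for WWW::Google: it hands out an endless
# stream of made-up results, just as a real search effectively would.
{
    package Fake::Search;

    sub new { my $class = shift; return bless { n => 0 }, $class }

    sub next_result {
        my $self = shift;
        my $n    = ++$self->{n};
        return { url => "http://example.com/$n", title => "Result $n" };
    }
}

# Hypothetical helper: collect at most $limit results, then stop.
sub first_results {
    my ($search, $limit) = @_;
    my @found;

    while (@found < $limit) {
        # Also stop if the search object ever runs dry.
        my $res = $search->next_result or last;
        push @found, $res;
    }

    return @found;
}

my @top = first_results(Fake::Search->new, 5);
print "$_->{url}: $_->{title}\n" for @top;
```

With the real module, the same helper should work unchanged on a `WWW::Google` object once `query` and `build` have been called, provided Google still serves the HTML layout the parser expects.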

In reply to WWW::Google by Amoe
