
Hi everybody,

I am trying to scrape some Google Scholar results, but I have a problem going from the first page of results to the second, and so on. In particular, I have tried to tell follow_link to click on the link with the text 'Next', but it does not seem to recognize it. I have also tried text_regex, with no success either. I believe I am not matching the right "text", but I really need to rely on it to find the link because the URL is very complicated. Any clue? Many thanks!

Here is the code:

my $title = "Raumchemie der festen Stoffe";
$mech->get( "http://scholar.google.it/scholar?q=" . $title );

# Follow the first link whose URL matches 'cites'.
$mech->follow_link( url_regex => qr/cites/i, n => 1 );
my $result = $mech->content;
my $indi   = $mech->uri();
my $rest   = $out->scrape( $result, $indi );
#~     dd( $result, $rest );

dd( $rest );
print F3 $rest;

# Walk through the following result pages via the 'Next' link.
for my $i (2..200) {
    my $ii = $i . "0";    # not used below
    print "page : " . $i . "\n";
    $mech->follow_link( text_regex => qr/Next$/ )
        or die( "finished on page : " . $i . "\n" );
    my $result = $mech->content;
    my $indi   = $mech->uri();
    my $rest   = $out->scrape( $result, $indi );
    #~     dd( $result, $rest );
    dd( $rest );
    print F3 $rest;
    sleep(5);
}
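A minimal debugging sketch, in case it helps narrow this down: before guessing at a pattern, it can be useful to print the text of every link that Mechanize actually sees on the page. The helper name describe_links is hypothetical (it is not part of WWW::Mechanize); it only relies on the links method of the $mech object and on the text and url methods of the link objects it returns. The brackets make trailing whitespace visible, which would defeat an anchored pattern such as qr/Next$/. The label may also differ from 'Next' if the page is served in another language, which is only a guess here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize): returns one line per
# link on the current page, with the link text between brackets so that
# stray whitespace or an unexpected label is easy to spot.
sub describe_links {
    my ($mech) = @_;
    my @lines;
    for my $link ( $mech->links ) {
        my $text = defined $link->text ? $link->text : '(no text)';
        my $url  = defined $link->url  ? $link->url  : '(no url)';
        push @lines, sprintf '[%s] -> %s', $text, $url;
    }
    return @lines;
}

# Usage with the $mech object from the code above:
#   print "$_\n" for describe_links($mech);
#
# Once the real label is known, an unanchored pattern can be tried first:
#   my $next = $mech->find_link( text_regex => qr/Next/i );
#   $mech->get( $next->url_abs ) if $next;
```

find_link returns undef when nothing matches, so the loop can stop cleanly on the last page instead of relying on die.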

In reply to WWW::Mechanize follow_link not working by sbasbasba
