PerlMonks  

Re^4: Web Scraping on CGI Scripts?

by fraizerangus (Sexton)
on Oct 12, 2011 at 18:58 UTC (#931067)


in reply to Re^3: Web Scraping on CGI Scripts?
in thread Web Scraping on CGI Scripts?

Hello Again

WWW::Mechanize does seem to be the right medicine, but I've already hit a snag: I'm only interested in following the 'motion.cgi' links and extracting them as text documents. However, with the regex I've used, only the first 2 links are found. Any ideas on what's going on?

#!/usr/bin/perl
use strict;
use WWW::Mechanize;
use Storable;

my $mech_cgi = WWW::Mechanize->new;
$mech_cgi->get( 'http://www.molmovdb.org/cgi-bin/browse.cgi' );
my @cgi_links = $mech_cgi->find_all_links( url_regex => qr/motion.cgi?/ );

for(my $i = 0; $i < @cgi_links; $i++) {
    print "following link: ", $cgi_links[$i]->url, "\n";
    $mech_cgi->follow_link( url => $cgi_links[$i]->url )
        or die "Error following link ", $cgi_links[$i]->url;
}

best wishes

Dan


Re^5: Web Scraping on CGI Scripts?
by tospo (Hermit) on Oct 13, 2011 at 08:56 UTC
    That's because after the first "follow_link" call, $mech_cgi is on a different page (it behaves like a browser). When you then issue the next follow_link command, that link doesn't actually exist on the page you are now on. Add "$mech_cgi->back" before the end of the loop and you will iterate through all the links.
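
    As a rough sketch of that suggestion (not tested against the live site): the helper name fetch_each_link and the run wrapper are made up for illustration, and the regex is escaped here as motion\.cgi\? on the assumption that a literal "motion.cgi?" is what was intended. The only WWW::Mechanize calls used are get, find_all_links, follow_link, content and back.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: follow each link from the index page, keep the
# page body, then step back so the next link can be found again.
sub fetch_each_link {
    my ($mech, @links) = @_;
    my %content;
    for my $link (@links) {
        my $url = $link->url;
        print "following link: $url\n";
        $mech->follow_link( url => $url )
            or die "Error following link $url";
        $content{$url} = $mech->content;
        $mech->back;    # return to the index page, like a browser's Back button
    }
    return \%content;
}

# Hypothetical wrapper: only runs when an index URL is given on the
# command line, e.g. http://www.molmovdb.org/cgi-bin/browse.cgi
sub run {
    my ($index_url) = @_;
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new;
    $mech->get($index_url);
    # Assumption: match a literal "motion.cgi?" in the link URL.
    my @links = $mech->find_all_links( url_regex => qr/motion\.cgi\?/ );
    my $pages = fetch_each_link($mech, @links);
    printf "fetched %d pages\n", scalar keys %$pages;
}

run($ARGV[0]) if @ARGV;
```

    Another option that should also work is to skip follow_link and call $mech->get( $link->url_abs ) for each link, since get does not depend on which page the object is currently on.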
