
Re^3: Web Scraping on CGI Scripts?

by tospo (Hermit)
on Oct 11, 2011 at 08:32 UTC


in reply to Re^2: Web Scraping on CGI Scripts?
in thread Web Scraping on CGI Scripts?

Oh, and I forgot to mention: you are always parsing the HTML output that the server sends to you. It doesn't matter that a CGI script generates the page on the server; the output is just HTML (unless it's a web service that sends XML, JSON or the like). So there is nothing special about this case.
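
To illustrate the point, here is a minimal sketch that fetches a URL and reports what actually came back. The helper name describe_response and the command-line handling are assumptions made for this example, not part of the original post; is_html, links and ct are existing WWW::Mechanize methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: summarise whatever the server sent back.
# A CGI-generated page is expected to look like any other HTML page here.
sub describe_response {
    my ($mech) = @_;
    if ( $mech->is_html ) {
        my @links = $mech->links;    # all links found in the HTML
        return sprintf "HTML page with %d links", scalar @links;
    }
    # Something else (XML, JSON, ...): report the content type instead.
    return "non-HTML response (" . $mech->ct . ")";
}

# Only talk to the network when a URL is given on the command line.
if (@ARGV) {
    my $mech = WWW::Mechanize->new;
    $mech->get( $ARGV[0] );
    print describe_response($mech), "\n";
}
```

Run it with the page's URL as the only argument, for example the browse.cgi address from this thread.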

Re^4: Web Scraping on CGI Scripts?
by fraizerangus (Sexton) on Oct 12, 2011 at 18:58 UTC
    Hello Again

    WWW::Mechanize does seem to be the right medicine, but I've already hit a snag: I'm only interested in following the 'motion.cgi' links and extracting the pages they point to as text documents, however the regex I've used only finds the first 2 links. Any ideas on what's going on?

    #!/usr/bin/perl
    use strict;
    use WWW::Mechanize;
    use Storable;

    my $mech_cgi = WWW::Mechanize->new;
    $mech_cgi->get( 'http://www.molmovdb.org/cgi-bin/browse.cgi' );

    # Collect every link whose URL contains "motion.cgi?"
    my @cgi_links = $mech_cgi->find_all_links( url_regex => qr/motion\.cgi\?/ );

    for (my $i = 0; $i < @cgi_links; $i++) {
        print "following link: ", $cgi_links[$i]->url, "\n";
        $mech_cgi->follow_link( url => $cgi_links[$i]->url )
            or die "Error following link ", $cgi_links[$i]->url;
    }

    best wishes

    Dan

      That's because after the first "follow_link" call, $mech_cgi is on a different page (it behaves like a browser). When you then issue the next follow_link command, that link doesn't actually exist on the page you are on now. Add "$mech_cgi->back" before the end of the loop and you will iterate through all the links.
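
A minimal sketch of that fix, assuming the same find_all_links / follow_link approach as in the question. The sub name visit_matching_links and the command-line handling are hypothetical, introduced only for illustration; the comment marks where the page content would be saved.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: follow every link matching $pattern from the page
# $mech is currently on, returning to that page after each visit.
# Returns the URLs that were followed.
sub visit_matching_links {
    my ( $mech, $pattern ) = @_;

    # Collect the links once, while still on the listing page.
    my @links = $mech->find_all_links( url_regex => $pattern );

    my @visited;
    for my $link (@links) {
        $mech->follow_link( url => $link->url )
            or die "Error following link ", $link->url;
        push @visited, $link->url;
        # ... save $mech->content here if the page text is wanted ...
        $mech->back;    # step back so the next link exists again
    }
    return @visited;
}

# Only talk to the network when a start URL is given on the command line.
if (@ARGV) {
    my $mech = WWW::Mechanize->new;
    $mech->get( $ARGV[0] );
    print "followed: $_\n" for visit_matching_links( $mech, qr/motion\.cgi\?/ );
}
```

The important part is the call to back at the end of each iteration: without it, the second follow_link is looked up on the page reached by the first one.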
