PerlMonks  

Re: WWW::Mechanize follow meta refreshes

by mhi (Friar)
on Aug 28, 2005 at 20:46 UTC (node #487286)


in reply to WWW::Mechanize follow meta refreshes

Thanks for posting this. It was just what I needed to get started as a first-time user of WWW::Mechanize.

I've put together a small script that pre-fetches pages from a web application. The app keeps a result cache; whenever it hits a cache miss, it outputs a meta refresh while it does its calculations.

Instead of analysing the refresh URLs or some such, I simply capped the number of refreshes the script will follow. Perhaps some monk will find a useful snippet of code herein.

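```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $maxrefreshes = 5;   # give up on a URL after this many refreshes
my $debug        = 0;   # set to 1 to dump every fetched page

my @urls = (
    "http://www.whatever.xy/cgi-bin/cgi.pl?ACTION=surnames",
    "http://www.whatever.xy/cgi-bin/cgi.pl?ACTION=unisearch&MATCHSTRING=foo",
    "http://www.whatever.xy/cgi-bin/cgi.pl?ACTION=path&STARTNODE=I1&ENDNODE=I1257",
);

my $mech = WWW::Mechanize->new;

foreach my $start (@urls) {
    my $url       = $start;   # work on a copy so @urls is not modified
    my $refreshes = 0;        # reset the counter for every start URL
    while ($refreshes < $maxrefreshes) {
        $mech->get($url);
        my $c = $mech->content;
        print $c if $debug;
        # On a cache miss the app answers with a meta refresh: follow its
        # URL, or reload the same URL if the refresh names none.
        if ($c =~ /<meta\s+http-equiv="refresh"\s+content="\d+\s*(?:;\s*url=([^"]*))?"/i) {
            $url = $1 || $url;
            ++$refreshes;
        }
        else {
            last;   # no refresh: the cached result has been produced
        }
    }
}
```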
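If you want to reuse the refresh handling elsewhere, it can be factored out. The following is only a sketch under some assumptions: the helper names `parse_meta_refresh` and `follow_refreshes` are made up for this example, the regex still expects `http-equiv` to come before `content` in the tag, and the page fetcher is passed in as a callback so the capped loop can be exercised without a network. Unlike the script above, it also waits for the refresh delay before re-fetching, and it tolerates single quotes and `&amp;` in the refresh URL.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the delay and the optional target URL from a
# <meta http-equiv="refresh" content="N; url=..."> tag. Returns an empty
# list if there is no such tag, otherwise (delay, url-or-undef).
# Assumes http-equiv precedes content; other attribute orders are not handled.
sub parse_meta_refresh {
    my ($html) = @_;
    return unless $html =~ m{
        <meta \s+ http-equiv \s* = \s* (["']?) refresh \1 \s+
        content \s* = \s* (["']) \s* (\d+) \s*
        (?: [;,] \s* url \s* = \s* ([^"'\s>]*) )? \s* \2
    }xi;
    my ($delay, $url) = ($3, $4);
    $url =~ s/&amp;/&/g if defined $url;   # undo the most common entity only
    return ($delay, (defined $url && length $url) ? $url : undef);
}

# Hypothetical helper: fetch $args{url} through the $args{fetch} callback
# (URL in, HTML out) and follow at most $args{max} meta refreshes.
# $args{pause} is called with each refresh delay; it defaults to sleep.
# Returns the last page fetched and the number of refreshes followed.
sub follow_refreshes {
    my (%args) = @_;
    my ($fetch, $url, $max) = @args{qw(fetch url max)};
    my $pause = $args{pause} || sub { sleep $_[0] };

    my $refreshes = 0;
    my $html      = $fetch->($url);
    while ($refreshes < $max) {
        # An empty list (no refresh tag) means the result is ready.
        my ($delay, $next) = parse_meta_refresh($html) or last;
        $pause->($delay);
        $url = $next if defined $next;   # no URL in the tag: reload the same one
        $html = $fetch->($url);
        ++$refreshes;
    }
    return ($html, $refreshes);
}
```

With WWW::Mechanize the callback could be `sub { $mech->get($_[0]); $mech->content }`. Relative refresh URLs are passed through unchanged here; `get` should resolve them against the current page, but that is worth checking for your version of the module.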

