WWW::Mechanize follow meta refreshes

by simon.proctor (Vicar)
on Apr 13, 2005 at 09:40 UTC (#447314=snippet)

Description: My web app (for IIS reasons) uses meta refreshes to redirect the user around. I test these redirects with the code below. I've used a regex because my refresh template is fixed and very, very simple. If yours isn't, replace the regex with a call to something like HTML::TokeParser.

Update: fixed silly mistake as highlighted below. Also fixed what was, for my test suite, a logic error. The final get call must be to $expected_url and not $url. If your test suite works differently then use $url instead :).
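# Requires Test::More (for is and ok).
sub meta_refresh
{
    my $mech = shift;
    my $expected_url = shift;

    # $url stays undef unless the match succeeds, so a stale $1 is never used.
    my $url;

    if($mech->content() =~ /<meta http-equiv="refresh" content="0;url=([^"]*)"/)
    {
        $url = $1;
    }

    # is() takes the value under test first and copes with an undef $url.
    is($url, $expected_url, "The meta refresh returns the expected URL");

    $mech->get( $expected_url );
    ok($mech->success(), "URL loaded successfully");
}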
So call it like this:
   # Code to cause the refresh to appear not shown.
   
   # Check the refresh and follow
   meta_refresh($mech, '/index.cgi?rm=home');
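For templates that are not fixed, the HTML::TokeParser route mentioned in the description might look roughly like the sketch below. The helper name refresh_url_from_html is made up for illustration, and the sketch assumes the content attribute has the usual "delay;url=..." shape with an unquoted URL.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the URL of the first meta refresh tag in
# $html, or undef if there is none. Attribute order, case and spacing
# stop mattering because the parser normalises tag and attribute names.
sub refresh_url_from_html {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html) or return;

    while (my $tag = $parser->get_tag('meta')) {
        my $attr = $tag->[1];    # hash ref of attributes, keys lowercased
        next unless lc($attr->{'http-equiv'} || '') eq 'refresh';

        my $content = defined $attr->{content} ? $attr->{content} : '';
        # Assumed shape: "<digits>;url=<target>", whitespace tolerated.
        return $1 if $content =~ /^\s*\d+\s*;\s*url\s*=\s*(.*?)\s*$/i;
    }
    return;
}

my $url = refresh_url_from_html(
    '<META CONTENT="0; URL=/index.cgi?rm=home" HTTP-EQUIV="Refresh">'
);
print defined $url ? "refresh to $url\n" : "no refresh found\n";
```

Inside meta_refresh, the regex match could then be swapped for a call such as refresh_url_from_html($mech->content()).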
Re: WWW::Mechanize follow meta refreshes
by merlyn (Sage) on Apr 13, 2005 at 14:38 UTC
    $mech->content() =~ /<meta http-equiv="refresh" content="0;url=([^"]*)"/;
    my $url = $1;
    Never never never use $1 without having tested the match. If the match fails, you're using a previous $1 from a previous successful match. Oops!

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Oops!! :).

      I have updated the node.
Re: WWW::Mechanize follow meta refreshes
by jbrugger (Parson) on Apr 14, 2005 at 06:10 UTC
    As it does with JavaScript as well; see www::mechanize reloading page, where you'd remove a line (or a part of it) to stop JavaScript from loading another page.
    "We all agree on the necessity of compromise. We just can't agree on when it's necessary to compromise." - Larry Wall.
Re: WWW::Mechanize follow meta refreshes
by Kanji (Parson) on Apr 14, 2005 at 15:19 UTC
    I've used a regex as my refresh template is fixed and very, very simple. However, if yours isn't/aren't then you should replace the regex with a call to something like HTML::TokeParser.

    This is actually built into WWW::Mechanize (well, LWP...) for you, so you can do something like:-

    if ($mech->response and my $refresh = $mech->response->header('Refresh')) {
        my($delay, $uri) = split /;url=/i, $refresh;
        $uri ||= $mech->uri; # No URL; reload current URL.
        sleep $delay;
        $mech->get($uri);
    }

    $delay should probably be validated to protect against malformed META refresh tags, and there's a whole other headache about potential loops if you hack WWW::Mechanize to follow refreshes automatically.
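One way to do the validation suggested above is sketched here. The helper parse_refresh and its 30-second cap are assumptions made for illustration, not part of WWW::Mechanize or LWP; it only looks at the header string, so it runs without any network access.

```perl
use strict;
use warnings;

# Hypothetical helper: split a Refresh value such as "5;url=/next" into a
# validated delay and an optional URL. Returns an empty list if malformed.
sub parse_refresh {
    my ($value, $max_delay) = @_;
    $max_delay = 30 unless defined $max_delay;    # assumed cap, adjust to taste

    return unless defined $value;
    # The delay must be a whole number of seconds; ";url=..." is optional.
    return unless $value =~ /^\s*(\d+)\s*(?:[;,]\s*url\s*=\s*(.*?))?\s*$/is;

    my ($delay, $uri) = ($1, $2);
    $uri =~ s/^(['"])(.*)\1$/$2/s if defined $uri;    # strip matching quotes
    $uri = undef if defined $uri && $uri eq '';       # "5;url=" means reload

    $delay = $max_delay if $delay > $max_delay;       # never sleep past the cap
    return ($delay, $uri);
}

if (my ($delay, $uri) = parse_refresh('5;url=/index.cgi?rm=home')) {
    print "would sleep $delay, then fetch $uri\n";
}
```

With this in place, the sleep and get calls would run only when parse_refresh returns a delay, and a missing URL would still fall back to $mech->uri. It does nothing about redirect loops; a counter of followed refreshes would still be needed for that.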

        --k.


      The snippet I provided is from my test suite. I'll be first to admit that it isn't great as I've only just started hacking away with Mechanize (and wondered why I didn't start sooner ;P).

      Anyway, from a testing perspective, is it not better to follow the expected URL and not the URL in the template? It's only a minor point, but are you not then reporting on a mistaken redirect while continuing as normal otherwise? I feel this is better but would welcome your comments.

      I do like the delay bit but, for my testing purposes, I would also pass that into the function. Something like:
      meta_refresh($mech, '/index.cgi?rm=home', 5);
      Or whatever :). I would also then, personally, have a default delay (of some time determined by the particular project) and simply validate the delay as being correct (for the same reasons as with the URL).
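A rough sketch of that idea follows: check both the URL and the delay against what the test expects. The function names and the default delay of 0 are assumptions, and the regex assumes the same fixed template as the original snippet.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the delay and URL out of a fixed-format
# meta refresh tag. Returns an empty list when no tag is found.
sub extract_meta_refresh {
    my ($html) = @_;
    return unless defined $html
        && $html =~ /<meta\s+http-equiv="refresh"\s+content="(\d+);\s*url=([^"]*)"/i;
    return ($1, $2);
}

# Hypothetical checker: compare what the page says with what the test
# expects and return a list of problems (an empty list means all is well).
sub refresh_problems {
    my ($html, $expected_url, $expected_delay) = @_;
    $expected_delay = 0 unless defined $expected_delay;    # assumed project default

    my ($delay, $url) = extract_meta_refresh($html)
        or return ('no meta refresh tag found');

    my @problems;
    push @problems, "expected URL '$expected_url', got '$url'"
        unless $url eq $expected_url;
    push @problems, "expected delay $expected_delay, got $delay"
        unless $delay == $expected_delay;
    return @problems;
}

my $page = '<meta http-equiv="refresh" content="5;url=/index.cgi?rm=home">';
print "problem: $_\n" for refresh_problems($page, '/index.cgi?rm=home', 5);
```

In a Test::More suite the result could feed something like is_deeply([refresh_problems($mech->content, '/index.cgi?rm=home', 5)], [], 'refresh as expected'), followed by the get on the expected URL as before.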

      It's funny, I only wrote this function because IIS, at the time, couldn't handle HTTP redirects and would crash (no really). It's *fixed now* but I don't have the time to rework my app again :).
Re: WWW::Mechanize follow meta refreshes
by mhi (Friar) on Aug 28, 2005 at 20:46 UTC
    Thanks for posting this. It was just right to get me on my way to becoming a first-time user of WWW::Mechanize.

    I've put together a little script that will do some pre-fetching on a web-application. The app has a result-cache and will output a refresh whenever it encounters a cache-miss and does its calculations.

    Instead of parsing the refresh-URLs or somesuch, I simply limited the number of refreshes the script will perform. Perhaps somemonk will find a useful snippet of code herein.

    #!/usr/bin/perl -w
    use strict;
    use WWW::Mechanize;

    my $maxrefreshs=5;
    my $debug=0;
    my @urls=(
        "http://www.whatever.xy/cgi-bin/cgi.pl?ACTION=surnames",
        "http://www.whatever.xy/cgi-bin/cgi.pl?ACTION=unisearch&MATCHSTRING=foo",
        "http://www.whatever.xy/cgi-bin/cgi.pl?ACTION=path&STARTNODE=I1&ENDNODE=I1257",
    );

    my $mech=WWW::Mechanize->new;
    foreach my $url (@urls){
        # Count refreshes per URL so that every URL is fetched at least once.
        my $refreshs=0;
        while($refreshs < $maxrefreshs){
            $mech->get($url);
            my $c=$mech->content;
            $debug and print $c;
            if($c =~/<meta\s+http-equiv="refresh"\s+content="\d+;\s*url=([^"]*)"/mi){
                # Follow the refresh target, or reload if it is empty.
                $url=($1 or $url);
                ++$refreshs;
            }else{
                last;
            }
        }
    }
