PerlMonks  

Re: WWW::Mechanize follow meta refreshes

by Kanji (Parson)
on Apr 14, 2005 at 15:19 UTC ( [id://447836] )


in reply to WWW::Mechanize follow meta refreshes

I've used a regex because my refresh template is fixed and very, very simple. However, if yours isn't, you should replace the regex with a call to something like HTML::TokeParser.

This is actually built into WWW::Mechanize (well, LWP...) for you: LWP parses the document head and exposes META http-equiv tags as response headers, so you can do something like:

    if ($mech->response and my $refresh = $mech->response->header('Refresh')) {
        # Header looks like "5;url=/somewhere"; tolerate a space after the ';'.
        my ($delay, $uri) = split /;\s*url=/i, $refresh;
        $uri ||= $mech->uri;    # No URL; reload current URL.
        sleep $delay;
        $mech->get($uri);
    }

$delay should probably be validated to protect against malformed META refresh tags, and there's a whole other headache about potential loops if you hack WWW::Mechanize to follow refreshes automatically.
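
A minimal sketch of those two caveats, assuming Perl 5.10 or later (for the `//` operator). The names `parse_refresh`, `follow_refreshes`, `max_hops` and `max_delay` are hypothetical and not part of WWW::Mechanize; the only Mechanize/LWP calls used are `response`, `header`, `uri` and `get`. The delay is accepted only if it is a plain non-negative integer, and a hop counter stops a page that keeps refreshing to itself.

```perl
use strict;
use warnings;

# Hypothetical helper: split a Refresh header value into (delay, uri).
# Returns an empty list unless the delay is a plain non-negative integer.
# $uri is undef when the header carries no URL.
sub parse_refresh {
    my ($refresh) = @_;
    return unless defined $refresh;
    my ($delay, $uri) =
        $refresh =~ /^\s*(\d+)\s*(?:;\s*url\s*=\s*(\S+))?\s*$/i
        or return;
    return ($delay, $uri);
}

# Hypothetical helper: follow refreshes on a WWW::Mechanize-like object.
# max_hops and max_delay are assumed defaults, not library settings.
# Returns the number of refreshes actually followed.
sub follow_refreshes {
    my ($mech, %opt) = @_;
    my $max_hops  = $opt{max_hops}  // 5;     # guard against refresh loops
    my $max_delay = $opt{max_delay} // 10;    # never sleep longer than this
    my $hops = 0;

    while ($mech->response
           and defined(my $refresh = $mech->response->header('Refresh'))) {
        my ($delay, $uri) = parse_refresh($refresh) or last;   # malformed: stop
        last if $hops >= $max_hops;
        $delay = $max_delay if $delay > $max_delay;
        sleep $delay if $delay;
        $mech->get($uri // $mech->uri);    # no URL means reload current page
        $hops++;
    }
    return $hops;
}
```

Usage would be something like `follow_refreshes($mech, max_hops => 3)` straight after a `$mech->get(...)`. Stopping quietly on a malformed header is a design choice for the sketch; a test suite might prefer to die or report it instead.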

    --k.


Replies are listed 'Best First'.
Re^2: WWW::Mechanize follow meta refreshes
by simon.proctor (Vicar) on Apr 15, 2005 at 09:00 UTC
    The snippet I provided is from my test suite. I'll be the first to admit that it isn't great, as I've only just started hacking away with Mechanize (and wondered why I didn't start sooner ;P).

    Anyway, from a testing perspective, is it not better to follow the expected URL and not the URL in the template? It's only a minor point, but are you not then reporting on a mistaken redirect but continuing as normal otherwise? I feel this is better but would welcome your comments.

    I do like the delay bit but, for my testing purposes, I would also pass that into the function. Something like:
    meta_refresh($mech, '/index.cgi?rm=home', 5);
    Or whatever :). I would also then, personally, have a default delay (of some time determined by the particular project) and simply validate the delay as being correct (for the same reasons as with the URL).
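
One way the `meta_refresh($mech, $url, $delay)` call above could be fleshed out, as a sketch only. The body, the `$DEFAULT_DELAY` variable and the returned list of problems are assumptions for illustration, not the function from the original test suite. It checks the page's Refresh header against the expected URL and delay, reports any mismatch, and then carries on from the expected URL rather than the one in the template.

```perl
use strict;
use warnings;

our $DEFAULT_DELAY = 5;    # assumption: project-wide default, in seconds

# Hypothetical body for meta_refresh(). Returns a list of problems found
# (an empty list means the refresh was exactly as expected), then follows
# the *expected* URL regardless of what the page asked for.
sub meta_refresh {
    my ($mech, $expected_url, $expected_delay) = @_;
    $expected_delay = $DEFAULT_DELAY unless defined $expected_delay;

    my $refresh = $mech->response
        ? $mech->response->header('Refresh')
        : undef;

    my @problems;
    if (!defined $refresh) {
        push @problems, 'no Refresh header';
    }
    elsif ($refresh =~ /^\s*(\d+)\s*;\s*url\s*=\s*(\S+)\s*$/i) {
        my ($delay, $url) = ($1, $2);
        push @problems, "delay is $delay, expected $expected_delay"
            if $delay != $expected_delay;
        push @problems, "url is $url, expected $expected_url"
            if $url ne $expected_url;
    }
    else {
        push @problems, "malformed Refresh value: $refresh";
    }

    $mech->get($expected_url);    # continue the test from the expected page
    return @problems;
}
```

In a test script the returned list could feed something like `is(scalar @problems, 0, 'refresh as expected')`, with the messages printed via `diag`.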

    It's funny, I only wrote this function because IIS, at the time, couldn't handle HTTP redirects and would crash (no, really). It's *fixed now* but I don't have the time to rework my app again :).
