http://www.perlmonks.org?node_id=146299


in reply to Checking "incomplete" URLs

My question is how do I get LWP useragent to act like a browser and find the default page in a directory?

It has nothing to do with your browser, and everything to do with your web server. I tested your example on a site I control (running Apache). Here's what happened:
    [jon@valium jon]$ telnet divisionbyzero.com 80
    Trying 168.103.109.84...
    Connected to divisionbyzero.com.
    Escape character is '^]'.
    GET /decss HTTP/1.0

    HTTP/1.1 301 Moved Permanently
    Date: Tue, 19 Feb 2002 02:47:50 GMT
    Server: Apache/1.3.22 (Unix) (Red-Hat/Linux) mod_ssl/2.8.5 OpenSSL/0.9.6b mod_perl/1.24_01
    Location: http://www.divisionbyzero.com/decss/
    Connection: close
    Content-Type: text/html; charset=iso-8859-1

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <HTML><HEAD>
    <TITLE>301 Moved Permanently</TITLE>
    </HEAD><BODY>
    <H1>Moved Permanently</H1>
    The document has moved <A HREF="http://www.divisionbyzero.com/decss/">here</A>.<P>
    <HR>
    <ADDRESS>Apache/1.3.22 Server at www.divisionbyzero.com Port 80</ADDRESS>
    </BODY></HTML>
    Connection closed by foreign host.
The web server sent me a 301 because /decss is not a file but a directory. My web browser followed that redirect automatically, which is what browsers are supposed to do when the HTTP method is GET or HEAD. I suspect your trouble comes from using the POST method: the HTTP spec forbids a client from automatically following a redirect of a POST without confirmation from the user.

BlueLines

Disclaimer: This post may contain inaccurate information, be habit forming, cause atomic warfare between peaceful countries, speed up male pattern baldness, interfere with your cable reception, exile you from certain third world countries, ruin your marriage, and generally spoil your day. No batteries included, no strings attached, your mileage may vary.

Re: Re: Checking "incomplete" URLs
by chipmunk (Parson) on Feb 19, 2002 at 04:29 UTC
    By default, LWP::UserAgent automatically follows redirects for any request except a POST. The redirect_ok() method controls this behavior:
        $ua->redirect_ok

        This method is called by request() before it tries to do any redirects. It should return a true value if a redirect is allowed to be performed. Subclasses might want to override this. The default implementation will return FALSE for POST request and TRUE for all others.
    Recently I had to write a script which posted a form on a remote site, and then checked the text of the resulting page to make sure the post succeeded. Unfortunately, there was a redirect to that page.

    First I tried making a subclass with a new redirect_ok() that always returned 1. Unfortunately, LWP::UserAgent used a POST request for the redirect, and the remote server returned a 405 error. I ended up writing a redirect_ok() which replaced the POST request object in @_ with a new one that did a GET instead. Ugly, but it worked!
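    For anyone who wants to try the same idea today, here is a minimal sketch of a redirect_ok() override that turns the follow-up request into a GET. It is not the code from the script described above. It assumes a current libwww-perl, where redirect_ok() receives the prospective request and the response; the package name My::PostRedirectUA and the example.com URLs are made up for illustration, and nothing is sent over the network.

```perl
use strict;
use warnings;

{
    # Hypothetical subclass name, for illustration only.
    package My::PostRedirectUA;
    use parent 'LWP::UserAgent';

    sub redirect_ok {
        my ($self, $prospective_request, $response) = @_;

        # Method of the request that triggered the redirect.
        my $original_method = $response->request->method;

        if ($original_method eq 'POST') {
            # Follow the redirect with a body-less GET instead of a POST.
            $prospective_request->method('GET');
            $prospective_request->content('');
            $prospective_request->remove_content_headers;
            return 1;
        }

        # Everything else keeps the default policy.
        return $self->SUPER::redirect_ok($prospective_request, $response);
    }
}

use HTTP::Request;
use HTTP::Response;

# Build a fake "301 in reply to a POST" by hand (assumed URLs).
my $ua       = My::PostRedirectUA->new;
my $response = HTTP::Response->new(301);
$response->request(HTTP::Request->new(POST => 'http://example.com/form'));
my $follow_up = HTTP::Request->new(POST => 'http://example.com/done');

my $allowed = $ua->redirect_ok($follow_up, $response);
print "allowed=$allowed method=", $follow_up->method, "\n";
```

    In a real script you would call $ua->request($req) on an instance of the subclass and let LWP invoke redirect_ok() itself.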

      You could upgrade to the latest libwww-perl and just use the requests_redirectable method of LWP::UserAgent:
          $ua->requests_redirectable( );            # to read
          $ua->requests_redirectable( \@requests ); # to set

          This reads or sets the object's list of request names that "$ua->redirect_ok(...)" will allow redirection for. By default, this is "['GET', 'HEAD']", as per RFC 2068. To change to include 'POST', consider:

          push @{ $ua->requests_redirectable }, 'POST';

      --
      Ilya Martynov (http://martynov.org/)
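
      A small sketch of what that push changes, assuming a current libwww-perl where redirect_ok() takes the prospective request and the response. The example.com URLs are placeholders and the 301 response is built by hand, so no request is actually sent.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;

my $ua = LWP::UserAgent->new;

# Fake a 301 reply to a POST (assumed URLs, nothing goes over the wire).
my $response = HTTP::Response->new(301);
$response->request(HTTP::Request->new(POST => 'http://example.com/form'));
my $follow_up = HTTP::Request->new(GET => 'http://example.com/done');

# Default policy: only GET and HEAD requests may be redirected.
my $before = $ua->redirect_ok($follow_up, $response) ? 1 : 0;

# Add POST to the list of redirectable request methods.
push @{ $ua->requests_redirectable }, 'POST';
my $after = $ua->redirect_ok($follow_up, $response) ? 1 : 0;

my $methods = join ', ', @{ $ua->requests_redirectable };
print "before=$before after=$after methods=$methods\n";
```

      The list lives on the individual $ua object, so other user agents in the same program keep the default policy.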

Re: Re: Checking "incomplete" URLs
by nop (Hermit) on Feb 19, 2002 at 03:08 UTC
    Hurrah! GET (vs. POST) solved it -- Many thanks, BlueLines! ++
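        sub validURL {
            my ($self, $url) = @_;

            # GET (not POST) lets LWP follow the redirect automatically
            my $req     = HTTP::Request->new(GET => $url);
            my $res     = $self->request($req);
            my $content = $res->content;

            # Treat the site's "not found" text as a failure
            return 0 if $content =~ /the page you have requested cannot be found/i;

            # An empty or whitespace-only body is also a failure
            return 0 unless $content =~ /\S/;

            return 1;
        }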