http://www.perlmonks.org?node_id=117189

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Can anyone explain why the following code:
#!/usr/bin/perl
use LWP::Simple;
unless (head('http://www.123box.co.uk')) {
    push (@error,"$FORM{'url'} head error.");
}
should push the error for the URL quoted and for http://www.quickwebmail.com but not for any other genuine URL I've tested it on? BTW, "get" works fine on them. According to LWP guru Gisle Aas, the above construct can be used to test for the existence of a web page, and until today, that's always been true.

Replies are listed 'Best First'.
Re: LWP head mystery
by merlyn (Sage) on Oct 06, 2001 at 19:58 UTC
    That's why the link checkers I've written for my columns always try a GET after a failed HEAD. This is a long-known problem.

    -- Randal L. Schwartz, Perl hacker
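
The HEAD-then-GET fallback described above might look roughly like the sketch below. It is not merlyn's actual link-checker code: the helper name `url_exists` and the 10-second timeout are assumptions made for illustration, and the check relies only on the standard `head`, `get` and `is_success` methods of LWP::UserAgent and HTTP::Response.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper (not from the thread): returns 1 if $url answers a
# HEAD request successfully or, failing that, a GET request; 0 otherwise.
sub url_exists {
    my ($ua, $url) = @_;

    # Cheap check first: HEAD asks for the headers only.
    my $response = $ua->head($url);
    return 1 if $response->is_success;

    # Some servers mishandle HEAD, so retry with GET before giving up.
    $response = $ua->get($url);
    return $response->is_success ? 1 : 0;
}

# Check any URLs given on the command line; does nothing without arguments.
if (@ARGV) {
    my $ua = LWP::UserAgent->new(timeout => 10);    # timeout value is an arbitrary choice
    for my $url (@ARGV) {
        print "$url: ", (url_exists($ua, $url) ? "ok" : "unreachable"), "\n";
    }
}
```

The cost of the fallback is that a genuinely missing page is requested twice, which is usually acceptable for a link checker.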

      Thanks everyone for your helpful responses. (I have now shed my cloak of anonymity - i.e. worked out how to register).

      Ha! My very first post and it turned out to be off-topic!

      But since we've accidentally wandered off, can someone explain a bit further? I thought that every http access started with a HEAD in order to get the content-type - which is why I was so trusting of Gisle Aas's advice. Clearly I was wrong. But where exactly have I strayed from the path of righteousness?
        G'day Elliott, welcome to the monastery,

        HEAD is useful when you want information about a page, but don't actually want to see the page itself. It's most useful to check if a page exists, or when it was last modified. The biggest users of HEAD requests are proxies, which use HEAD requests to check whether or not they have a current copy of the page in their cache. If the last-modified date returned in the HEAD matches what the proxy has cached, then a whole page lookup is saved.

        To access the content-type and get the content at the same time, you'll want to look at using LWP::UserAgent. In particular, the responses you get back from the request method are HTTP::Response objects. You can call ->headers to get the headers (including content-type) back as an HTTP::Headers object, or ->headers_as_string to get them as a string.

        A bit of sample code may help. This demonstrates how to pull back the frontpage of perlmonks.org, and prints the content type and the content retrieved.

        use LWP::UserAgent;
        use HTTP::Request::Common;

        my $ua = LWP::UserAgent->new;
        my $response = $ua->request(GET "http://perlmonks.org/");

        print "Content type is ", $response->headers->header("Content-Type"),
              "\n----\nContent is \n", $response->content;
        Cheers,

        Paul
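
For the other use Paul mentions, asking about a page without fetching it, a HEAD request through LWP::UserAgent could be sketched as follows. The helper name `page_info` and the choice of fields it returns are assumptions for illustration only; a server is also free to omit Last-Modified, in which case that field comes back undefined.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper (not from the thread): issue a HEAD request and
# return a hash reference with a few header fields, or undef on failure.
sub page_info {
    my ($ua, $url) = @_;

    my $response = $ua->head($url);
    return undef unless $response->is_success;

    # scalar() keeps a missing header from collapsing the hash: in list
    # context header() would return an empty list for an absent field.
    return {
        status        => $response->status_line,
        content_type  => scalar $response->header('Content-Type'),
        last_modified => scalar $response->header('Last-Modified'),
    };
}

# Report on any URLs given on the command line; does nothing without arguments.
if (@ARGV) {
    my $ua = LWP::UserAgent->new(timeout => 10);    # timeout value is an arbitrary choice
    for my $url (@ARGV) {
        my $info = page_info($ua, $url);
        if ($info) {
            printf "%s: %s, type %s, modified %s\n", $url, $info->{status},
                map { defined $_ ? $_ : 'unknown' }
                    $info->{content_type}, $info->{last_modified};
        }
        else {
            print "$url: HEAD request failed\n";
        }
    }
}
```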

Re: LWP head mystery
by miyagawa (Chaplain) on Oct 06, 2001 at 12:59 UTC
    Seems like 123box.co.uk returns a broken response to a HEAD request. The following session confirms it.
    % telnet www.123box.co.uk 80
    Trying 212.67.197.196...
    Connected to ns.123box.co.uk.
    Escape character is '^]'.
    HEAD / HTTP/1.0
    Host: www.123box.co.uk
    Connection closed by foreign host.

    --
    Tatsuhiko Miyagawa
    miyagawa@cpan.org
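
The same raw check can be scripted instead of typed into telnet. The sketch below uses the core IO::Socket::INET module; the helper names `build_head_request` and `raw_head`, the default port and the 10-second connect timeout are all assumptions for illustration. A server that drops the connection the way the session above shows would yield an empty reply.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: build the same request that was typed into telnet.
sub build_head_request {
    my ($host, $path) = @_;
    $path = '/' unless defined $path;
    return "HEAD $path HTTP/1.0\r\nHost: $host\r\n\r\n";
}

# Hypothetical helper: send the request over a plain TCP socket and return
# whatever the server sends back, or '' if it closes without answering.
sub raw_head {
    my ($host, $port) = @_;
    $port ||= 80;

    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 10,        # connect timeout only; value is an arbitrary choice
    ) or die "Cannot connect to $host:$port: $@\n";

    print {$sock} build_head_request($host);

    local $/;                  # slurp mode: read until the server closes
    my $reply = <$sock>;
    close $sock;

    return defined $reply ? $reply : '';
}

# Probe the host given on the command line; does nothing without arguments.
if (@ARGV) {
    my $reply = raw_head($ARGV[0]);
    print length($reply) ? $reply : "(connection closed with no response)\n";
}
```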

Re: LWP head mystery
by blakem (Monsignor) on Oct 06, 2001 at 13:04 UTC
    Looks like they are serving up server errors for HEAD requests.....

    % HEAD http://www.123box.co.uk
    500 unexpected EOF before status line seen
    Client-Date: Sat, 06 Oct 2001 09:01:29 GMT

    % HEAD http://www.quickwebmail.com
    500 unexpected EOF before status line seen
    Client-Date: Sat, 06 Oct 2001 09:02:59 GMT

    % HEAD http://www.perlmonks.com
    200 OK
    Connection: close
    Date: Sat, 06 Oct 2001 09:03:07 GMT
    Server: Apache/1.3.9 (Unix) Debian/GNU mod_perl/1.21_03-dev
    Content-Language: pl
    Content-Type: text/html; charset=iso-8859-1
    Client-Date: Sat, 06 Oct 2001 09:03:12 GMT
    Client-Peer: 206.170.14.76:80
    Title: The Monastery Gates

    -Blake