
LWP head mystery

by Anonymous Monk
on Oct 06, 2001 at 12:52 UTC ( [id://117189]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Can anyone explain why the following code:
#!/usr/bin/perl
use LWP::Simple;
unless (head('')) {
    push (@error,"$FORM{'url'} head error.");
}
should push the error for the URL quoted and for but not for any other genuine URL I've tested it on? BTW, "get" works fine on them. According to LWP guru Gisle Aas, the above construct can be used to test for the existence of a web page, and until today, that's always been true.

Replies are listed 'Best First'.
Re: LWP head mystery
by merlyn (Sage) on Oct 06, 2001 at 19:58 UTC
    That's why the link checkers I've written for my columns always try a GET after a failed HEAD. This is a long-known problem.
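A minimal sketch of that fallback, for illustration only (this is not the code from the columns mentioned above). The helper name `url_exists` and the 10-second timeout are assumptions; the URLs to check are taken from the command line.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: returns 1 if the URL appears to exist, 0 otherwise.
# Tries HEAD first; if that fails, falls back to GET, because some
# servers answer GET correctly but mishandle HEAD.
sub url_exists {
    my ($ua, $url) = @_;

    my $response = $ua->head($url);
    return 1 if $response->is_success;

    # HEAD failed, so retry with a full GET before reporting an error.
    $response = $ua->get($url);
    return $response->is_success ? 1 : 0;
}

my $ua = LWP::UserAgent->new(timeout => 10);   # timeout value is an arbitrary choice
my @error;
for my $url (@ARGV) {
    push @error, "$url: neither HEAD nor GET succeeded."
        unless url_exists($ua, $url);
}
print "$_\n" for @error;
```

The GET fallback downloads the whole body, so it costs more than a HEAD, but it only runs for URLs where HEAD has already failed.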

    -- Randal L. Schwartz, Perl hacker

      Thanks everyone for your helpful responses. (I have now shed my cloak of anonymity - i.e. worked out how to register).

      Ha! My very first post and it turned out to be off-topic!

      But since we've accidentally wandered off, can someone explain a bit further? I thought that every HTTP access started with a HEAD in order to get the content-type, which is why I was so trusting of Gisle Aas's advice. Clearly I was wrong. But where exactly have I strayed from the path of righteousness?
        G'day Elliott, welcome to the monastery,

        HEAD is useful when you want information about a page, but don't actually want to see the page itself. It's most useful to check if a page exists, or when it was last modified. The biggest users of HEAD requests are proxies, which use HEAD requests to check whether or not they have a current copy of the page in their cache. If the last-modified date returned in the HEAD matches what the proxy has cached, then a whole page lookup is saved.
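As a rough sketch of using HEAD to ask about a page without fetching it, the snippet below reports the Last-Modified time for each URL given on the command line. The helper name `describe_head` and the timeout are assumptions made for illustration. Note that this uses the `head` method of LWP::UserAgent, which returns a response object, rather than the `head` function from LWP::Simple.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: summarise what a HEAD response tells us.
sub describe_head {
    my ($response) = @_;

    return "HEAD failed: " . $response->status_line
        unless $response->is_success;

    # last_modified returns epoch seconds, or undef if the header is absent.
    my $modified = $response->last_modified;
    return defined $modified
        ? "Last modified: " . scalar(gmtime($modified)) . " GMT"
        : "No Last-Modified header";
}

my $ua = LWP::UserAgent->new(timeout => 10);   # timeout value is an arbitrary choice
print describe_head($ua->head($_)), "\n" for @ARGV;
```

Only the headers travel over the wire here, so this stays cheap even for large pages, provided the server handles HEAD properly.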

        To access the content-type and get the content at the same time, you'll want to look at using LWP::UserAgent. In particular, the responses you get back from the request method are HTTP::Response objects. You can call ->headers or ->headers_as_string to get back the headers (including content-type) as either an HTTP::Headers object or a string respectively.

        A bit of sample code may help. This demonstrates how to pull back the front page of, and print the content type and the content retrieved.

        use LWP::UserAgent;
        use HTTP::Request::Common;
        my $ua = LWP::UserAgent->new;
        my $response = $ua->request(GET "");
        print "Content type is ", $response->headers->header("Content-Type"),
              "\n----\nContent is \n", $response->content;


Re: LWP head mystery
by miyagawa (Chaplain) on Oct 06, 2001 at 12:59 UTC
    Seems like the server returns a broken response to a HEAD request. The following session confirms it.
    % telnet 80
    Trying
    Connected to
    Escape character is '^]'.
    HEAD / HTTP/1.0
    Host:
    Connection closed by foreign host.

    Tatsuhiko Miyagawa

Re: LWP head mystery
by blakem (Monsignor) on Oct 06, 2001 at 13:04 UTC
    Looks like they are serving up server errors for HEAD requests...

    % HEAD
    500 unexpected EOF before status line seen
    Client-Date: Sat, 06 Oct 2001 09:01:29 GMT

    % HEAD
    500 unexpected EOF before status line seen
    Client-Date: Sat, 06 Oct 2001 09:02:59 GMT

    % HEAD
    200 OK
    Connection: close
    Date: Sat, 06 Oct 2001 09:03:07 GMT
    Server: Apache/1.3.9 (Unix) Debian/GNU mod_perl/1.21_03-dev
    Content-Language: pl
    Content-Type: text/html; charset=iso-8859-1
    Client-Date: Sat, 06 Oct 2001 09:03:12 GMT
    Client-Peer:
    Title: The Monastery Gates

