PerlMonks  

Re: Checking for an existing URL

by rob_au (Abbot)
on Sep 29, 2002 at 14:11 UTC (node #201542)


in reply to Checking for an existing URL

This is a lot easier than you think...

use LWP::UserAgent;
use HTTP::Request;

my $ua       = LWP::UserAgent->new;
my $request  = HTTP::Request->new('GET' => $url);
my $response = $ua->request($request);

if ($response->is_error) { ... } else { ... }

Alternatively, you could use the is_success method to test whether the URL was retrieved successfully. The numeric response code itself is returned by the code method. For further details on the HTTP::Response methods, see the HTTP::Response man page.
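A minimal sketch of these methods, assuming libwww-perl (which provides HTTP::Response) is installed. The responses are built by hand with HTTP::Response->new so the example runs without network access; in practice they would come from $ua->request($request). The summarise helper is a hypothetical name used only for illustration.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: describe a response using is_success, code and message.
sub summarise {
    my ($response) = @_;
    my $outcome = $response->is_success ? 'retrieved' : 'not retrieved';
    return sprintf '%s: %d (%s)', $outcome, $response->code, $response->message;
}

# Hand-built stand-ins for what $ua->request($request) would return.
print summarise(HTTP::Response->new(200, 'OK')), "\n";          # retrieved: 200 (OK)
print summarise(HTTP::Response->new(404, 'Not Found')), "\n";   # not retrieved: 404 (Not Found)
```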

 

perl -e 'print+unpack("N",pack("B32","00000000000000000000000111000011")),"\n"'


Re: Re: Checking for an existing URL
by rjimlad (Acolyte) on Sep 29, 2002 at 14:57 UTC

    Or if you are just after 404s:

    ... if ($response->code()==404) { ...

    ...which works because, while the body of a 404 response, or even the reason phrase in the status line, could be anything (it might be localised, for example), the status line (like a CGI script's Status: header) must carry the three-digit numeric response code, and that number is what code() returns.
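A small sketch of that point, assuming libwww-perl is installed: the check looks only at the numeric code, so a localised reason phrase makes no difference. The responses are constructed by hand instead of fetched, and is_not_found is a hypothetical helper name.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: true only when the numeric status code is 404.
# The reason phrase (message) is deliberately ignored, since servers may localise it.
sub is_not_found {
    my ($response) = @_;
    return $response->code == 404;
}

for my $response (HTTP::Response->new(404, 'Not Found'),
                  HTTP::Response->new(404, 'Nicht gefunden'),
                  HTTP::Response->new(200, 'OK')) {
    printf "%s => %s\n", $response->status_line,
        is_not_found($response) ? 'missing' : 'not a 404';
}
```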

    Unfortunately, there is a flaw in any such approach. When you probe for the existence of a file (or a listening script), you may run into DNS problems, a bad or unusable URI scheme, a faulty proxy or redirector, problems connecting to the IP address, or transient server problems, not to mention a CGI script that sends a 404 response on purpose (as most properly operating error-handler scripts must).
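Some of these failures never reach a server at all. As far as I know, LWP::UserAgent reports them by generating a response itself (typically a 500) and marking it with a "Client-Warning: Internal response" header, so a sketch like the following can at least separate client-side failures from genuine server replies. Treat the header check as an assumption and confirm it against the LWP::UserAgent documentation for your installed version; is_client_side_failure is a hypothetical helper name.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

# Hypothetical helper: true when LWP generated the response itself
# (assumed to be marked with "Client-Warning: Internal response"),
# e.g. after a DNS failure, refused connection or unsupported scheme.
sub is_client_side_failure {
    my ($response) = @_;
    my $warning = $response->header('Client-Warning');
    return defined $warning && $warning eq 'Internal response';
}

# Example use: pass the URLs to check on the command line.
my $ua = LWP::UserAgent->new(timeout => 10);   # 10 seconds is an arbitrary choice
for my $url (@ARGV) {
    my $response = $ua->get($url);
    my $kind = is_client_side_failure($response) ? 'client-side failure'
             : $response->is_success           ? 'success'
             :                                   'server replied with an error';
    print "$url: $kind (", $response->status_line, ")\n";
}
```

Even this only narrows things down: a real server reply can still be one of the misleading cases described here.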

    The other side of it is that you can get false positives from, e.g., Apache handlers, ErrorDocuments and the like, including badly behaved error handlers that answer a request for a missing page with a success status.

    In short, there's no easy way to do it. The best option, IMO, is to use $response->is_success(), as mentioned (implicitly) in the preceding message, to mark 'valid' URLs, and to consider anything else undefined.
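That recommendation can be sketched as a two-valued check, assuming libwww-perl is installed: a successful fetch means "valid", and every other outcome is reported as undef rather than as "does not exist". The url_is_valid name is hypothetical.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: only a successful fetch marks a URL as valid.
# Anything else (404, 500, DNS failure, timeout, ...) is "undefined",
# represented here by returning undef.
sub url_is_valid {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    return $response->is_success ? 1 : undef;
}
```

Called as url_is_valid(LWP::UserAgent->new, $url), it returns 1 or undef; a caller should read undef as "unknown", not as proof that the URL is missing.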
