
Re: Checking for an existing URL

by rob_au (Abbot)
on Sep 29, 2002 at 14:11 UTC (#201542)


in reply to Checking for an existing URL

This is a lot easier than you think ...

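use LWP::UserAgent;
use HTTP::Request;

my $ua       = LWP::UserAgent->new;
my $request  = HTTP::Request->new('GET' => $url);
my $response = $ua->request($request);
if ($response->is_error) {
    ...   # 4xx or 5xx status: the URL could not be retrieved
} else {
    ...   # no error status was returned
}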

Alternatively, you could employ the is_success method for testing for successful retrieval of the passed URL. Furthermore, the actual numeric response code received can be returned with the code method. For further details on HTTP::Response methods, see the HTTP::Response man page.
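
As a rough sketch of how is_success, is_error and code relate, the snippet below builds a few HTTP::Response objects by hand instead of fetching anything, so it runs without a network connection. The summarise helper is a made-up name for illustration only, not part of LWP.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: report the numeric code plus which predicate matched.
sub summarise {
    my ($response) = @_;
    my $kind = $response->is_success ? 'success'   # 2xx
             : $response->is_error   ? 'error'     # 4xx or 5xx
             :                         'other';    # e.g. 1xx or 3xx
    return sprintf '%d %s', $response->code, $kind;
}

# Hand-built responses stand in for what $ua->request($request) would return.
my @samples = (
    HTTP::Response->new(200, 'OK'),
    HTTP::Response->new(404, 'Not Found'),
    HTTP::Response->new(301, 'Moved Permanently'),
);

# Prints "200 success", "404 error" and "301 other", one per line.
print summarise($_), "\n" for @samples;
```

Note that is_success and is_error are not exact opposites: a redirect that was not followed is neither.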

 

perl -e 'print+unpack("N",pack("B32","00000000000000000000000111000011")),"\n"'


Re: Re: Checking for an existing URL
by rjimlad (Acolyte) on Sep 29, 2002 at 14:57 UTC

    Or if you are just after 404s:

    ... if ($response->code()==404) { ...

    ...which is valid because, while the body of a 404 response, and even the reason phrase in the status line, could be anything (e.g. they could be localised), the three-digit code in the HTTP status line must be the HTTP response code.

    Unfortunately, there is a flaw in any such approach. If you want to probe for the existence of a file (or listening script), a failed request does not prove the file is missing: you may, for example, have DNS problems, a bad (or unusable) URI scheme part, a faulty proxy or redirector, problems connecting to the IP, or random server problems, not to mention the possibility of a CGI script that sends a 404 response on purpose (necessary for most properly-operating error handler scripts).

    And the other side of it is that you can get false positives from, e.g., Apache handlers, ErrorDocuments and the like, including badly-operating error handlers that return a success status for a page that does not exist.

    In short, there's no easy way to be certain. Best option, IMO, would be to use $response->is_success(), as mentioned (implicitly) in the message preceding this, to mark 'valid' URLs and consider anything else to be undefined.
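
A minimal sketch of that policy, assuming an HTTP::Response object is already in hand. Both helper names (url_state, failure_origin) are invented here for illustration. The second helper leans on LWP's documented habit of marking error responses it generates itself with a Client-Warning header set to "Internal response"; treat it as a hint about where the failure came from, not as proof of anything about the URL.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: only a 2xx response counts as 'valid';
# everything else is deliberately left as 'undefined', not 'missing'.
sub url_state {
    my ($response) = @_;
    return $response->is_success ? 'valid' : 'undefined';
}

# Hypothetical helper: a best-effort guess at where a failure originated.
sub failure_origin {
    my ($response) = @_;
    return 'none' if $response->is_success;
    my $warning = $response->header('Client-Warning') // '';
    return 'generated inside LWP' if $warning eq 'Internal response';
    return 'sent by server: ' . $response->status_line;
}

# Hand-built responses, so the sketch runs without touching the network.
my $found   = HTTP::Response->new(200, 'OK');
my $missing = HTTP::Response->new(404, 'Not Found');
my $no_conn = HTTP::Response->new(500, "Can't connect");
$no_conn->header('Client-Warning' => 'Internal response');   # simulated

for my $r ($found, $missing, $no_conn) {
    print url_state($r), ' / ', failure_origin($r), "\n";
}
```

With a live agent, the object passed in would simply be whatever $ua->request($request) returns. A 'valid' result can still be a false positive of the kind described above, so this narrows the uncertainty without removing it.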
