PerlMonks  

Re: download JPG series with error-handling

by moot (Chaplain)
on Mar 28, 2005 at 02:18 UTC ( [id://442731] )


in reply to download JPG series with error-handling

If you really don't want to make the leap to LWP::UserAgent, what's wrong with LWP::Simple's is_success or is_error functions? Both take the response code returned by getstore, or you could inspect that code yourself.
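A minimal sketch of that approach, assuming a numbered series of files named img001.jpg, img002.jpg, and so on under a base URL given on the command line. The naming pattern and the `fetch_jpg` helper are illustrative assumptions, not something from the original question.

```perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# Hypothetical helper: fetch one URL to a local file and return the
# HTTP status code that getstore reports.
sub fetch_jpg {
    my ($url, $file) = @_;
    my $rc = getstore($url, $file);
    warn "skipping $url: HTTP status $rc\n" unless is_success($rc);
    return $rc;
}

# Assumed usage: perl fetch.pl http://example.com/images 10
# Nothing is downloaded unless both arguments are supplied.
my ($base, $count) = @ARGV;
if (defined $base && defined $count) {
    for my $n (1 .. $count) {
        my $name = sprintf 'img%03d.jpg', $n;    # assumed naming pattern
        fetch_jpg("$base/$name", $name);
    }
}
```

A failed download only produces a warning here, so the loop carries on with the next image in the series.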

Update: I wouldn't put money on the server sending back a 404 for a missing image. Any half-decent webmaster will arrange this (possibly supplying an error document as well), so I think you could rely on it for most applications. If you find this is not the case, I would think testing the MIME type of the returned document would be more efficient than a HEAD request potentially followed by a GET.

Replies are listed 'Best First'.
Re^2: download JPG series with error-handling
by BrentDax (Hermit) on Mar 28, 2005 at 04:26 UTC

    410 (Gone) is another valid "this doesn't exist" response, and 403 might also be important. You might as well just detect any error return--the design might change in the future, after all. (HTTP errors start with a 4 or 5.)
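The "starts with a 4 or 5" rule can be sketched without listing individual codes. The helper name `http_error` is hypothetical; LWP::Simple's exported is_error applies the same range test to the code returned by getstore.

```perl
use strict;
use warnings;

# Hypothetical helper: true for any 4xx (client) or 5xx (server) status.
sub http_error {
    my ($code) = @_;
    return $code >= 400 && $code < 600;
}

for my $code (200, 403, 404, 410, 500) {
    printf "%d: %s\n", $code, http_error($code) ? 'error' : 'not an error';
}
```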

    =cut
    --Brent Dax
    There is no sig.

      To be really safe here, you might check for a success code (I think it's 202, but don't quote me) rather than trying to think of all the possible error codes you might get.

      Also, just to throw another wrench in, just checking MIME type might not be good enough. What if the error "page" you get back is itself actually a jpeg image? I've seen that before. Don't know a way around it unless you are also checking the error code (and the webmaster configured the server sensibly to send back error codes).

      --DrWhy

      "If God had meant for us to think for ourselves he would have given us brains. Oh, wait..."

        All 2xx codes are successes. 200 ("OK") is the usual case. is_success (on the return value of getstore and getprint) checks for a return code in the 200s. But as moot mentioned, some servers return success (200) with an HTML message instead of a 404 (with or without an HTML message) when a file cannot be found. That's why I mentioned checking Content-Type in addition to the return code.
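A rough sketch of checking both the return code and Content-Type. It uses LWP::UserAgent rather than LWP::Simple, because getstore hands back only the status code and not the response headers. The helper names `is_jpeg_response` and `fetch_checked` are hypothetical, and the sketch assumes the server labels real images as image/jpeg.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: accept a response only if the status is 2xx AND
# the server labels the body as a JPEG.
sub is_jpeg_response {
    my ($res) = @_;
    return $res->is_success && $res->content_type eq 'image/jpeg';
}

# Hypothetical helper: fetch $url into memory and save it to $file only
# if it passes both checks. Returns true when the file was written.
sub fetch_checked {
    my ($ua, $url, $file) = @_;
    my $res = $ua->get($url);
    return 0 unless is_jpeg_response($res);
    open my $out, '>:raw', $file or die "cannot write $file: $!";
    print {$out} $res->content;
    close $out or die "cannot close $file: $!";
    return 1;
}

# Assumed usage: perl fetch_checked.pl URL FILE
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;
    my $ua = LWP::UserAgent->new(timeout => 30);
    print fetch_checked($ua, $url, $file) ? "saved $file\n" : "rejected $url\n";
}
```

As DrWhy points out, this still lets through an error "page" that is itself served as a JPEG with a 200 status.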
Re^2: download JPG series with error-handling
by Anonymous Monk on Mar 28, 2005 at 08:52 UTC
    I would definitely check is_error first, but as mentioned before, many times a missing file won't properly return a 404 code. Normally this is because the webmaster put a full URL in the ErrorDocument directive, which causes Apache to send a 302 response. What I've done before is check the file size. I could be fairly confident that the images would be 80K or more, while an error page isn't likely to be more than 10K-20K. So anything under 50K is assumed to be an error, and anything over 50K is assumed to be the image.
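The size test might look something like this, run over files that have already been downloaded. The 50K cutoff comes from the figures above and would need adjusting for other image sets; the helper name `big_enough` is hypothetical.

```perl
use strict;
use warnings;

# Assumed cutoff: real images are 80K or more and error pages rarely
# exceed 20K, so 50K separates the two.
my $MIN_IMAGE_BYTES = 50 * 1024;

# Hypothetical helper: true if $file exists and is at least the cutoff size.
sub big_enough {
    my ($file) = @_;
    my $size = (-s $file) || 0;    # missing or empty files count as 0 bytes
    return $size >= $MIN_IMAGE_BYTES;
}

# Assumed usage: perl check_size.pl img001.jpg img002.jpg ...
for my $file (@ARGV) {
    next if big_enough($file);
    warn "$file is under $MIN_IMAGE_BYTES bytes; probably an error page\n";
}
```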
