
Re: Check links.. if they do exist, then print link to file...but how do i check the existance of a link?(validity?)

by rodion (Chaplain)
on Jul 27, 2006 at 19:29 UTC (#564207)


in reply to Check links.. if they do exist, then print link to file...but how do i check the existance of a link?(validity?)

The CPAN module LWP::UserAgent provides a method "$ua->max_size( $bytes )" which lets you fetch just a little bit of content, enough to make sure the link works.

If you set this before you issue a "$ua->get( $url )" request, you should be able to check for a "Client-Aborted" header in the response, as per the documentation.

Setting a maximum size allows you to make sure that you could get the document, without the program having to hang around waiting for the whole transfer. If you have a successful transfer, or a "Client-Aborted" header, you know that the link works and you can quickly move on to checking the next one.
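Here is a minimal sketch of that approach, not a definitive implementation. The 1024-byte limit, the 10-second timeout, and the helper name classify_response are my own choices for illustration. One caveat I am assuming from how LWP behaves: a truncated response keeps the status the server sent, so the sketch checks is_success first and treats "Client-Aborted" only as a sign that the body was cut short.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Classify a response: 'dead' if the server did not answer with a 2xx
# status, otherwise 'ok', noting when LWP truncated the body.
sub classify_response {
    my ($response) = @_;
    return 'dead' unless $response->is_success;
    return $response->header('Client-Aborted') ? 'ok (truncated)' : 'ok';
}

# Fetch at most $max_bytes of the body (1024 is an arbitrary default).
sub check_link {
    my ($url, $max_bytes) = @_;
    my $ua = LWP::UserAgent->new( timeout => 10 );
    $ua->max_size( defined $max_bytes ? $max_bytes : 1024 );
    return classify_response( $ua->get($url) );
}

# Report on every URL given on the command line.
for my $url (@ARGV) {
    print "$url: ", check_link($url), "\n";
}
```

Run it with one or more URLs as arguments; links reported as 'ok' or 'ok (truncated)' are the ones worth printing to your file.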

Addendum: I just saw Albannach's suggestion for getting the header alone. Does anyone reading this know if there are cases (worth checking for) where you can get the header and can't get the content? If no one knows of any that are relevant to the OP, then just checking the header should be faster than getting a limited amount of content.
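For comparison, here is a rough sketch of the header-only check. The helper name head_then_get and the fallback are my own assumptions, not something from Albannach's post: some servers refuse or mishandle HEAD requests, so when HEAD fails the sketch retries with a GET, which stays cheap if max_size is set on the user agent.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Try a HEAD request first; if the server does not answer it with a
# 2xx status, fall back to a GET (size-limited via the agent's max_size).
sub head_then_get {
    my ($ua, $url) = @_;
    my $response = $ua->head($url);
    return $response if $response->is_success;
    return $ua->get($url);
}

my $ua = LWP::UserAgent->new( timeout => 10 );
$ua->max_size(1024);    # arbitrary limit for the fallback GET

# Print only the URLs that answered successfully.
for my $url (@ARGV) {
    my $response = head_then_get($ua, $url);
    print "$url\n" if $response->is_success;
}
```

This costs one extra request for links where HEAD fails, but most links should be settled by the HEAD alone.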


Re^2: Check links.. if they do exist, then print link to file...but how do i check the existance of a link?(validity?)
by CountZero (Bishop) on Jul 27, 2006 at 20:39 UTC
    I remember downloading a number of pictures with almost sequentially numbered file names from a picture web site. Every time you hit a "missing" number you got a "This picture does not exist" page, so checking for some available content would not have worked. Perhaps only checking the header could have warned me about the "missing" numbers.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
