http://www.perlmonks.org?node_id=564207


in reply to Check links.. if they do exist, then print link to file...but how do i check the existance of a link?(validity?)

The CPAN module LWP::UserAgent provides a method, "$ua->max_size( $bytes )", which lets you get just a little bit of content, just enough to make sure the link works.

If you set this before you issue a "$ua->get( $url )" request, you should be able to check for a "Client-Aborted" header in the response, as per the documentation.

Setting a maximum size lets you confirm that you could get the document without the program having to hang around waiting for the whole transfer. If the response is successful, whether the content arrived complete or was cut short with a "Client-Aborted" header, you know that the link works and you can quickly move on to checking the next one.
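A minimal sketch of the max_size approach, assuming LWP::UserAgent is installed. The helper name check_link_limited, the 1024-byte limit and the 10-second timeout are illustrative choices of mine, not anything the module prescribes. The sketch treats the HTTP status as the deciding test and uses the "Client-Aborted" header only to report that the body was truncated; URLs are taken from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: fetch $url but stop reading the body after
# about $limit bytes, then report whether the link works.
sub check_link_limited {
    my ( $ua, $url, $limit ) = @_;
    $ua->max_size($limit);
    my $res = $ua->get($url);
    $ua->max_size(undef);    # restore "no limit" for later requests

    # The status code decides whether the link works; the
    # Client-Aborted header only says the body was cut short.
    my $truncated = defined $res->header('Client-Aborted');
    return ( $res->is_success, $truncated, $res->status_line );
}

my $ua = LWP::UserAgent->new( timeout => 10 );
for my $url (@ARGV) {
    my ( $ok, $truncated, $status ) = check_link_limited( $ua, $url, 1024 );
    printf "%-4s %s%s\n", ( $ok ? 'OK' : 'FAIL' ), $url,
        ( $ok ? ( $truncated ? ' (truncated)' : '' ) : " ($status)" );
}
```

Usage would be something like "perl check_links.pl http://www.perlmonks.org/ ..." (the script name is hypothetical); with no arguments it checks nothing.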

Addendum: I just saw Albannach's suggestion for getting the header alone. Does anyone reading this know if there are cases (worth checking for) where you can get the header and can't get the content? If no one knows of any that are relevant to the OP, then just checking the header should be faster than getting a limited amount of content.
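For comparison, a sketch of the header-only check using LWP::UserAgent's head method, under the same assumptions as above (the helper name check_link_head and the limits are hypothetical). One case I know of where HEAD alone can mislead: some servers answer HEAD with 405 Method Not Allowed or 501 Not Implemented even though GET works, so this version falls back to a size-limited GET before declaring the link dead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: check a link with a HEAD request, so the server
# sends only the status line and headers, never the body.
sub check_link_head {
    my ( $ua, $url ) = @_;
    my $res = $ua->head($url);

    # Some servers refuse HEAD (405 or 501) even though GET works;
    # retry with a GET capped at about 1 KB of content in that case.
    if ( $res->code == 405 || $res->code == 501 ) {
        $ua->max_size(1024);
        $res = $ua->get($url);
        $ua->max_size(undef);    # restore "no limit"
    }
    return $res->is_success;
}

my $ua = LWP::UserAgent->new( timeout => 10 );
for my $url (@ARGV) {
    my $verdict = check_link_head( $ua, $url ) ? 'OK' : 'FAIL';
    print "$verdict $url\n";
}
```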


Re^2: Check links.. if they do exist, then print link to file...but how do i check the existance of a link?(validity?)
by CountZero (Bishop) on Jul 27, 2006 at 20:39 UTC
    I remember downloading a number of pictures with almost sequentially numbered file names from a picture web site. Every time you hit a "missing" number you got a "This picture does not exist" page, so checking for some available content would not have worked. Perhaps only checking the header could have warned me of the "missing" numbers.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law