mrguy123 has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks
I am writing a tool that checks whether links are working. The idea is to fetch the link, check the response code, and, if the code is OK, run a few more tests on the HTML response page.
I was wondering, is there a way to quickly fetch only the response code without the rest of the HTML data? This means that if the code is bad (e.g. 401) I can move on to the next link, but if it is OK I can fetch the rest of the data for further testing.
For example, when I run the code below, it takes about 1.5 seconds to get the response code (403). If I can somehow get it faster, it will make a big difference when I am testing thousands of links.

    use strict;
    use warnings;
    use LWP::UserAgent;
    use Time::HiRes;

    {
        my $ua = LWP::UserAgent->new();
        my $search_address = "http://ejournals.ebsco.com/direct.asp?JournalID=101503";
        my $req = HTTP::Request->new('GET', $search_address);
        my $start = [ Time::HiRes::gettimeofday() ];

        ## Get the response object
        my $res = $ua->request($req);

        ## Get the response time and return code
        my $diff = Time::HiRes::tv_interval($start);
        my $code = $res->code();
        print "Code $code fetched in $diff seconds\n";
    }
So, do you think this is even possible or just wishful thinking on my behalf?
Thanks, Mister Guy
Note: I am using LWP::UserAgent for the testing but can also use other modules if necessary
UPDATE: I used HEAD instead of GET for the HTTP request, but the response time didn't improve.
UPDATE 2: I compared HEAD with GET on 50 different links, and for some of them the HEAD request was indeed faster. So the way to go is to use HEAD and, of course, to run the checks in parallel if you want faster link checking. Thanks for the help.
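A minimal sketch of the HEAD-then-GET approach described in UPDATE 2, using the `head` and `get` methods of LWP::UserAgent. The helper name `check_link`, the 10-second timeout, and the shape of the returned hash are assumptions made for illustration, not something taken from the replies. Some servers may answer HEAD differently from GET, so treat a failed HEAD as a hint rather than proof that the link is dead.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: send HEAD first, and fetch the body with GET
# only when the HEAD response has a success (2xx) status.
sub check_link {
    my ($ua, $url) = @_;

    my $start   = [gettimeofday];
    my $head    = $ua->head($url);
    my $elapsed = tv_interval($start);

    my %result = (code => $head->code, head_seconds => $elapsed, body => undef);

    # Bad status (e.g. 401 or 403): skip the body and move on.
    return \%result unless $head->is_success;

    # Good status: fetch the full page for the further HTML tests.
    my $get = $ua->get($url);
    $result{code} = $get->code;
    $result{body} = $get->decoded_content if $get->is_success;
    return \%result;
}

# Run only when executed as a script: check every URL given on the command line.
unless (caller) {
    my $ua = LWP::UserAgent->new(timeout => 10);    # timeout value is an assumption
    for my $url (@ARGV) {
        my $r = check_link($ua, $url);
        printf "%s: code %s, HEAD took %.3f seconds, body %s\n",
            $url, $r->{code}, $r->{head_seconds},
            defined $r->{body} ? 'fetched' : 'skipped';
    }
}
```

Called as `perl check_link.pl URL1 URL2 ...` (the script name is hypothetical), this prints one line per link; the `body` entry is where the additional HTML tests would start.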
Hardware: The parts of a computer system that can be kicked.
Replies are listed 'Best First'.
Re: Fast fetching of HTML response code
by BrowserUk (Patriarch) on May 30, 2012 at 08:42 UTC
by mrguy123 (Hermit) on May 30, 2012 at 08:59 UTC
by BrowserUk (Patriarch) on May 30, 2012 at 10:16 UTC
Re: Fast fetching of HTML response code
by Corion (Patriarch) on May 30, 2012 at 08:47 UTC
by mrguy123 (Hermit) on May 30, 2012 at 09:06 UTC
by Corion (Patriarch) on May 30, 2012 at 09:09 UTC
Re: Fast fetching of HTML response code
by Anonymous Monk on May 30, 2012 at 08:41 UTC