Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
So, I'm trying to make a web crawler and I'm getting errors from WWW::Mechanize on links that are perfectly valid. I even tried fetching one of the failing links outside my recursive crawler and it worked. I'm guessing the website stops serving my requests after a certain search limit is reached, like the Google Search API. Do some websites limit how much you can crawl through their site?
Re: Mechanize giving errors for no reason
by space_monk (Chaplain) on Jun 12, 2013 at 21:25 UTC
Re: Mechanize giving errors for no reason
by pemungkah (Priest) on Jun 13, 2013 at 03:27 UTC
You might also be getting rate limited because your user agent string says you're WWW::Mechanize. Change it with the agent argument in new(). But yes, do check if there's a rate limit, and make sure you're conforming to the site's requirements for access.
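A rough sketch of what I mean (the agent string and URL below are just placeholders, not anything the site expects):

use strict;
use warnings;
use WWW::Mechanize;

# Identify the crawler with its own agent string instead of the
# default "WWW-Mechanize/#.##", which some sites block outright.
my $mech = WWW::Mechanize->new(
    agent => 'MyCrawler/0.1 (+http://example.com/about-crawler)',   # placeholder identity
);

$mech->get('http://example.com/');
print $mech->status, "\n";    # HTTP status of the last request

sleep 2;    # pause between requests so you stay under any rate limit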
Re: Mechanize giving errors for no reason
by talexb (Chancellor) on Jun 13, 2013 at 19:39 UTC
Yes, some sites do watch to see that clients aren't hammering their site -- which is reasonable, when you think about it.
We'd be able to help you a great deal more if you were more specific about *which* errors you were seeing.
And .. do you think that Mechanize *randomly* spits out errors? There's almost always a very good reason for errors.
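If it helps, here's a rough sketch of how you could capture the actual error so you can post it here (the URL list is just a placeholder):

use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0 stops get() from dying on an HTTP error, so the
# real status line can be logged instead of a generic die.
my $mech = WWW::Mechanize->new( autocheck => 0 );

my @urls = ( 'http://example.com/page1', 'http://example.com/page2' );   # placeholder list

for my $url (@urls) {
    $mech->get($url);
    unless ( $mech->success ) {
        warn "GET $url failed: ", $mech->res->status_line, "\n";
        next;
    }
    # ... hand $mech->content to the crawler here ...
    sleep 1;    # be polite between requests
}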
Alex / talexb / Toronto
"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds