Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
So, I'm trying to make a web crawler and I'm getting errors from WWW::Mechanize on links that are perfectly valid. I even tried fetching one of the failing links outside my recursive crawler and it worked. I'm guessing the website stops serving my requests after a certain search limit is reached, like the Google Search API. Do some websites limit how much you can crawl through their site?
Re: Mechanize giving errors for no reason
by space_monk (Chaplain) on Jun 12, 2013 at 21:25 UTC
Re: Mechanize giving errors for no reason
by pemungkah (Priest) on Jun 13, 2013 at 03:27 UTC
You might also be getting rate limited because your user agent string says you're WWW::Mechanize. Change it with the agent argument in new(). But yes, do check if there's a rate limit, and make sure you're conforming to the site's requirements for access.
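A rough sketch of what I mean (the agent string and URL below are just placeholders, not anything the site expects):

use strict;
use warnings;
use WWW::Mechanize;

# Identify the crawler with its own agent string instead of the
# default "WWW-Mechanize/#.##", which some sites block outright.
my $mech = WWW::Mechanize->new(
    agent => 'MyCrawler/0.1 (+http://example.com/about-crawler)',   # placeholder identity
);

$mech->get('http://example.com/');
print $mech->status, "\n";    # HTTP status of the last request

sleep 2;    # pause between requests so you stay under any rate limit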
Re: Mechanize giving errors for no reason
by talexb (Chancellor) on Jun 13, 2013 at 19:39 UTC
Yes, some sites do watch to see that clients aren't hammering their site -- which is reasonable, when you think about it.
We'd be able to help you a great deal more if you were more specific about *which* errors you were seeing.
And .. do you think that Mechanize *randomly* spits out errors? There's almost always a very good reason for errors.
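If it helps, here's a rough sketch of how you could capture the actual error so you can post it here (the URL list is just a placeholder):

use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0 stops get() from dying on an HTTP error, so the
# real status line can be logged instead of a generic die.
my $mech = WWW::Mechanize->new( autocheck => 0 );

my @urls = ( 'http://example.com/page1', 'http://example.com/page2' );   # placeholder list

for my $url (@urls) {
    $mech->get($url);
    unless ( $mech->success ) {
        warn "GET $url failed: ", $mech->res->status_line, "\n";
        next;
    }
    # ... hand $mech->content to the crawler here ...
    sleep 1;    # be polite between requests
}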
Alex / talexb / Toronto
"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds