PerlMonks  

Re^3: Advice on Efficient Large-scale Web Crawling

by Scott7477 (Chaplain)
on May 07, 2006 at 06:54 UTC


in reply to Re^2: Advice on Efficient Large-scale Web Crawling
in thread Advice on Efficient Large-scale Web Crawling

According to Google's API documentation, a license key allows only 1,000 automated queries per day. This page, while somewhat dated, provides some data relevant to this discussion. Key points from that data include:

- Netcraft estimated that 42.8 million web servers existed. Assuming 50 URLs per web server, that gives over 2.1 billion URLs. If the OP is selecting URLs at random, the chance of any particular server being significantly inconvenienced is small, in my estimation.
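To make that arithmetic concrete, here is a rough Python sketch. It takes the Netcraft figure and the 50-URLs-per-server assumption from above; the uniform random sampling model and the crawl sizes in the demo loop are my own illustrative assumptions, not anything the OP specified.

```python
import math

# Assumptions (illustrative only): every server hosts exactly URLS_PER_SERVER
# URLs, and the crawler picks each URL uniformly at random, with replacement.
SERVERS = 42_800_000      # Netcraft estimate quoted above
URLS_PER_SERVER = 50      # assumption from the post


def total_urls(servers: int, urls_per_server: int) -> int:
    """Total URL count under the fixed-URLs-per-server assumption."""
    return servers * urls_per_server


def expected_hits_per_server(fetches: int, servers: int) -> float:
    """Expected number of requests that land on one given server.

    A single random fetch picks a given server with probability
    urls_per_server / total_urls, which simplifies to 1 / servers.
    """
    return fetches / servers


def prob_server_hit(fetches: int, servers: int) -> float:
    """Probability that a given server receives at least one request.

    Computes 1 - (1 - 1/servers) ** fetches via log1p/expm1 so the
    result stays accurate when 1/servers is tiny.
    """
    return -math.expm1(fetches * math.log1p(-1.0 / servers))


if __name__ == "__main__":
    print(f"total URLs: {total_urls(SERVERS, URLS_PER_SERVER):,}")  # 2,140,000,000
    for fetches in (1_000_000, 100_000_000):  # hypothetical crawl sizes
        hits = expected_hits_per_server(fetches, SERVERS)
        p_hit = prob_server_hit(fetches, SERVERS)
        print(f"{fetches:>11,} fetches: expected hits/server = {hits:.4f}, "
              f"P(hit at least once) = {p_hit:.4f}")
```

Because each random fetch hits a given server with probability 1 in 42.8 million, even a million fetches put only about 0.02 expected requests on any one server; per-server load only becomes noticeable once the number of fetches approaches the number of servers.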