PerlMonks  

Re^3: Advice on Efficient Large-scale Web Crawling

by Scott7477 (Chaplain)
on May 07, 2006 at 06:54 UTC ( [id://547868]=note )


in reply to Re^2: Advice on Efficient Large-scale Web Crawling
in thread Advice on Efficient Large-scale Web Crawling

According to the Google API, a license key allows only 1,000 automated queries per day. This page, while somewhat dated, provides some data relevant to this discussion. A couple of key points from that data include:

- Netcraft estimated that 42.8 million web servers existed. Assuming 50 URLs per web server gives about 2.14 billion URLs. If the OP is selecting URLs at random, the chance of any particular server being significantly inconvenienced is small, in my estimation.
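
To put rough numbers on that estimate, here is a minimal back-of-the-envelope sketch in Python. The server count and the 50 URLs per server are the figures quoted above. The crawl size of one million URLs is purely a hypothetical value chosen for illustration, and the calculation assumes URLs are drawn uniformly at random with every server holding the same number of URLs, which real sites certainly do not.

```python
# Figures quoted in the post.
WEB_SERVERS = 42_800_000   # Netcraft estimate of web servers
URLS_PER_SERVER = 50       # assumed average number of URLs per server


def total_urls(servers=WEB_SERVERS, urls_per_server=URLS_PER_SERVER):
    """Total URL count implied by the two figures above."""
    return servers * urls_per_server


def expected_hits_per_server(sample_size, servers=WEB_SERVERS):
    """Expected requests landing on any one server.

    Assumes URLs are picked uniformly at random and that every server
    hosts the same number of URLs (a simplification).
    """
    return sample_size / servers


if __name__ == "__main__":
    crawl_size = 1_000_000  # hypothetical crawl size, not from the thread
    print(f"Total URLs: {total_urls():,}")
    print(f"Expected hits per server: {expected_hits_per_server(crawl_size):.4f}")
```

Under those assumptions, even a million-URL random crawl averages well under one request per server. A crawl that follows links rather than sampling at random would concentrate on fewer hosts, so this is a best case for server load, not a guarantee.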
