Re: What is the fastest way to download a bunch of web pages?

by lestrrat (Deacon)
on Mar 03, 2005 at 16:08 UTC (#436260)


in reply to What is the fastest way to download a bunch of web pages?

Just to make things more interesting, I'd suggest you take a look at an event-based approach, for example via POE (POE::Component::Client::HTTP) or the like.

But I'd suggest you keep this in the back of your head and leave it for the future, because it requires that you think about I/O, the order in which things happen, and so on.

It was pretty hard for me personally to write a web crawler like that.

But anyway, it *is* possible to push fetch throughput to roughly 10K~20K URLs/hour using such an approach, and that's with a single process.
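
The basic shape looks something like this (a minimal sketch, assuming POE and POE::Component::Client::HTTP from CPAN; the URL list and the got_response event name are just placeholders):

    use strict;
    use warnings;
    use POE qw(Component::Client::HTTP);
    use HTTP::Request;

    my @urls = ( 'http://example.com/', 'http://example.org/' );  # placeholders

    # Spawn one non-blocking HTTP client component, shared by the session.
    POE::Component::Client::HTTP->spawn(
        Alias   => 'ua',
        Timeout => 30,
    );

    POE::Session->create(
        inline_states => {
            _start => sub {
                my $kernel = $_[KERNEL];
                # Queue every request up front; the component fetches them
                # concurrently and posts each response back as an event.
                $kernel->post( ua => request => got_response =>
                    HTTP::Request->new( GET => $_ ) ) for @urls;
            },
            got_response => sub {
                my ( $request_packet, $response_packet ) = @_[ ARG0, ARG1 ];
                my $request  = $request_packet->[0];
                my $response = $response_packet->[0];
                print $request->uri, ' => ', $response->code, "\n";
            },
        },
    );

    POE::Kernel->run();

The point of the design is that no call ever blocks on the network: the kernel multiplexes all outstanding requests over a single process, which is where the throughput comes from.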

Re^2: What is the fastest way to download a bunch of web pages?
by tphyahoo (Vicar) on Mar 03, 2005 at 16:27 UTC
    Sounds promising. Any open code to do this?

      If you're looking to control how many child processes run at once, Parallel::ForkManager may be helpful. Its example source specifically demonstrates what I think you're trying to accomplish; see the sketch below.
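
      Something along these lines (a minimal sketch; the @urls list is a placeholder, and LWP::UserAgent stands in for whatever fetching code you already have):

          use strict;
          use warnings;
          use LWP::UserAgent;
          use Parallel::ForkManager;

          my @urls = ( 'http://example.com/', 'http://example.org/' );  # placeholders
          my $pm   = Parallel::ForkManager->new(10);  # at most 10 children at once

          for my $url (@urls) {
              $pm->start and next;   # parent: fork a child, move on to next URL

              # child: fetch one page, report the result, then exit
              my $ua  = LWP::UserAgent->new( timeout => 30 );
              my $res = $ua->get($url);
              print "$url => ", $res->status_line, "\n";

              $pm->finish;
          }

          $pm->wait_all_children;    # block until every child has exited

      Unlike the POE approach, this is plain blocking LWP code; the concurrency comes from forking, and the constructor argument caps how many children run at the same time.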
