PerlMonks  

Re^2: Parallel downloading under Win32?

by Xenofur (Monk)
on Apr 29, 2009 at 20:05 UTC ( [id://760967] )


in reply to Re: Parallel downloading under Win32?
in thread Parallel downloading under Win32?

I would like to test it, but for that I'd need to insert it into my module. That endeavour in turn is hampered by the fact that I just plain cannot tell what's going on after: my $Q = new Thread::Queue;

Seriously, it looks like you wrote that with the intent to make it as unreadable as possible.

Re^3: Parallel downloading under Win32?
by Corion (Patriarch) on Apr 29, 2009 at 20:10 UTC

    It's not actually hard. The system has threads that are fed off a Thread::Queue. Each thread takes a job from the queue, performs it, then takes the next one from the queue. The map just creates $T threads, and to tell each thread that it is finished, it sticks $T undef elements at the end of the queue. Then the main thread waits for all threads to finish their work. That's all there is to it.
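    That pattern can be sketched standalone like this (a minimal illustration, not Corion's actual code; the $results queue and the uc "work" are stand-ins added here purely for demonstration):

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $T       = 4;                    # number of worker threads
my $Q       = Thread::Queue->new;   # jobs in
my $results = Thread::Queue->new;   # results out (for demonstration)

# map creates $T workers; each pulls jobs until it dequeues undef.
my @workers = map {
    threads->create( sub {
        while ( defined( my $job = $Q->dequeue ) ) {
            $results->enqueue( uc $job );   # stand-in for the real work
        }
    } );
} 1 .. $T;

$Q->enqueue( 'alpha', 'beta', 'gamma' );    # the actual jobs
$Q->enqueue( ( undef ) x $T );              # one undef per thread => "done"

$_->join for @workers;                      # main thread waits for workers

my @done;
while ( defined( my $r = $results->dequeue_nb ) ) { push @done, $r }
print "$_\n" for sort @done;                # ALPHA BETA GAMMA
```

    Because every worker runs the same dequeue loop, work is naturally balanced: a slow job in one thread just means the others pull more from the queue.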

      Thanks for the explanations, they helped me a lot in understanding it. :)
Re^3: Parallel downloading under Win32?
by BrowserUk (Patriarch) on Apr 29, 2009 at 21:07 UTC
    Seriously, it looks like you wrote that with the intent to make it as unreadable as possible.

    Is that a request for clarification?

    Suggestion: Run it standalone as posted first, to convince yourself that it actually works on your system.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Oh, I had no doubt that it worked. I had trouble understanding *how* it worked. I write my Perl in a very declarative and verbose manner, have never had reason to use map before, didn't know you could string commands together with commas to act on $_ without wrapping them in braces, and didn't know why you were pushing undefs into the queue.

      In short: The syntax and lack of any explanation completely stumped me.

      Either way, I have to admit that it is a superior solution to the wget method, as long as enough RAM is available. Getting it to run enough threads to match the wget method's speed required 300 MB. However, because it makes it possible to keep control of the RAM use, and because it runs entirely on Perl modules, it is the better solution.

      As such, thanks a lot. :)

      FWIW, this is how I'm using it now:
        Getting it to run enough threads to run at comparable speed to the wget method required 300 mb.

        How many wget instances were you running?

        I'd be really surprised if it is necessary to run 20 threads in order to saturate your bandwidth, unless the server you are connecting to is severely restricting the throughput of individual connections. And when that happens--for example, if the site is using thttpd or similar--unless the webmaster is very naive, the throttling rates will apply across all concurrent connections from any given IP.

        Running 2 or 3 connections concurrently usually serves to maximise throughput. Beyond that, thread thrash tends to deteriorate throughput rather than increase it. Threads newbies tend to think 'more is better', but the reality is that this is rarely the case.

        Especially with TCP connections. TCP has been tuned over decades to utilise as much bandwidth as is available for each connection. Whilst using two concurrent connections will usually allow the second to 'mop up' any bandwidth under-utilised by the first, unless you have more than one processor/core, a third thread will usually impact the performance of the first two through thread thrash. (Assuming unrestricted and infinite bandwidth from the server.)

        As a rule of thumb, I would suggest that you set $T (or $thread_count as you would have it :) to no more than 2 * NoOfCores (sorry, $no_of_cores :).


        Caveat: From the code you posted, you are pushing your entire url list onto the queue prior to starting your threads. If your url list is relatively small--say < 1e3--no harm done. But...if your url list is bigger than that, then I would highly recommend starting your threads first and including a call to yield() in your url enqueue loop.
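        A sketch of that ordering (workers first, then a yielding enqueue loop; the example.com URLs and the shared counter are illustrative, and no actual fetching is performed):

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my $T = 4;
my $Q = Thread::Queue->new;
my $handled : shared = 0;           # count of urls processed

# Start the workers FIRST, so the queue is drained while it fills
# instead of the whole url list sitting in memory at once.
my @workers = map {
    threads->create( sub {
        while ( defined( my $url = $Q->dequeue ) ) {
            # ... fetch $url here ...
            { lock $handled; $handled++; }
        }
    } );
} 1 .. $T;

# THEN enqueue, yielding each time so the workers get scheduled
# and the queue never balloons.
for my $url ( map { "http://example.com/page$_" } 1 .. 100 ) {
    $Q->enqueue( $url );
    threads->yield;
}
$Q->enqueue( ( undef ) x $T );      # stop markers, one per worker
$_->join for @workers;

print "handled $handled urls\n";    # handled 100 urls
```

        The lock on $handled is only needed because several threads update it; the queue itself is already thread-safe.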

        Caveat 2: If you are seriously seeking to minimise memory usage, then you should consider starting your thread pool prior to loading (use-ing) the vast majority of whatever code or modules are needed by the main body of your application.

        The reason for this advice is that, for good or bad, the original author(s) of threads decided that each spawned thread would inherit everything already loaded by the main thread at the point of thread creation(*). (I.e. they decided to emulate the fork way of working!) By starting your worker threads early--remember that use is a compile-time enacted opcode--you can minimise the size of the primary thread and therefore the size of every subsequently spawned thread.

        (*) Yes. I know it is dumb, but you try convincing those that have the power to change things of that!
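        The shape of that advice, sketched (Data::Dumper stands in here for whatever heavyweight modules the main body would really load, and the doubling "work" is illustrative):

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $T       = 2;
my $Q       = Thread::Queue->new;
my $results = Thread::Queue->new;

# Spawn the pool while the interpreter is still small: each new
# thread clones everything loaded so far, so the less that is
# loaded at this point, the smaller every worker.
my @workers = map {
    threads->create( sub {
        while ( defined( my $job = $Q->dequeue ) ) {
            $results->enqueue( $job * 2 );  # stand-in for real work
        }
    } );
} 1 .. $T;

# Only now load the heavyweight modules the MAIN body needs; the
# already-running workers never inherit them.  (Data::Dumper is a
# stand-in for your application's large dependencies.)
require Data::Dumper;

$Q->enqueue( 1 .. 5 );
$Q->enqueue( ( undef ) x $T );
$_->join for @workers;

my @doubled;
while ( defined( my $r = $results->dequeue_nb ) ) { push @doubled, $r }
print join( ',', sort { $a <=> $b } @doubled ), "\n";   # 2,4,6,8,10
```

        Note the runtime require rather than a compile-time use: a use of the heavy module would run at compile time, before the threads->create calls, and the workers would inherit it after all.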


