PerlMonks
Re^3: threads: work crew memory leak

by BrowserUk (Pope)
on Oct 17, 2010 at 07:12 UTC ( #865775=note )


in reply to Re^2: threads: work crew memory leak
in thread threads: work crew memory leak

Conceivably, EVEN a single addition of 100,000 variables on a 100,000 processor system could have a 100,000X speedup if thread creation overhead was 0.

Sorry, but that's just very naive.

  1. Firstly, even in C or assembler, the creation overhead can never be 0.

    The creation of a (kernel) thread requires, at minimum:

    • the allocation of a stack segment.
    • the allocation of a register set save area.
    • the allocation of a thread 'context' structure to hold stuff like priorities, permissions, etc.
    • linking that context into the scheduler dispatch queues and other control structures.

    And each of those requires a transition from ring 3 user space to ring 0 kernel space, which costs about 800 clock cycles on its own.

  2. Each time a thread runs, it requires that:
    • the current register contents be saved to that thread's context structure;
    • the new thread's saved register set be loaded into the registers;
    • the processor pipelines be flushed;
    • the scheduler queues and tables be updated;
    • and it's almost inevitable that some L1/L2/L3 cache lines will need to be flushed to RAM and reloaded.

    All of those will mean hundreds if not thousands of cycles overhead.

A single addition of 2 numbers happens in 1 clock cycle. Spawning a new thread for each of 100,000 additions--even if you had 100,000 cores on your processor, which isn't going to happen any time in the next 10 years, if at all--will take far longer than just looping over the whole 100,000 on a single thread. And that's in C, never mind an interpreted language like Perl.
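That disparity is easy to see from Perl itself. A rough, self-contained sketch (the timings are illustrative and machine-dependent; it assumes only the standard threads and Time::HiRes modules):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Time::HiRes qw( time );

# Time a plain single-threaded loop over 100,000 additions.
my $t0  = time;
my $sum = 0;
$sum += $_ for 1 .. 100_000;
my $loop_secs = time - $t0;

# Time the spawn+join of a single do-nothing thread.
$t0 = time;
threads->create( sub { 1 } )->join;
my $spawn_secs = time - $t0;

printf "100,000 additions: %.6fs; one thread spawn+join: %.6fs\n",
       $loop_secs, $spawn_secs;
```

On any machine I've tried, the single spawn+join dwarfs the whole loop.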

However, if we get back into the real world and consider processing those 100,000 additions on a 4 or 16 or 64 core system, then starting 1 thread per core and having each operate on 100,000/cores values, it's quite likely that you can achieve real (and realistic) economies.
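A minimal sketch of that 1-thread-per-core partitioning, using Perl's standard threads module (the core count of 4 here is an assumption; under ithreads each worker gets its own copy of the data, so the slices are effectively read-only):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;

my $cores = 4;                      # assumed; match your actual core count
my @nums  = ( 1 .. 100_000 );
my $chunk = int( @nums / $cores );

my @workers;
for my $i ( 0 .. $cores - 1 ) {
    my $lo = $i * $chunk;
    my $hi = $i == $cores - 1 ? $#nums : $lo + $chunk - 1;
    # Each worker sums its own contiguous slice.
    push @workers, threads->create( sub {
        my $sum = 0;
        $sum += $_ for @nums[ $lo .. $hi ];
        return $sum;
    } );
}

my $total = 0;
$total += $_->join for @workers;
print "$total\n";    # 5000050000
```

Four spawns amortised over 25,000 additions each is a very different proposition from 100,000 spawns.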

Of course, the best approach today, and in the immediate future, would be to load the data onto a Tesla GPU and let its 448 cores loose on it in parallel.

threads are a lot less useful on today's platforms due to their high overhead.

I'd love to know what platforms you're talking about where the cost of threading is lower?

The only ones I'm aware of that might qualify are things like Java 1.1's 'green' threads. Whilst they were cheap to spawn--as they ran completely in user space, so avoiding ring-level transitions--they were practically useless inasmuch as they only emulated true multi-tasking, by embedding a scheduler within the VM. As such they didn't scale across cores: when any one green thread was running, all the others were stopped. So there was no performance gain for cpu-bound processing.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Re^4: threads: work crew memory leak
by rakzer (Novice) on Oct 17, 2010 at 09:59 UTC
    FWIW, the reason why I originally chose to create that many threads is that threads could (and quite frequently do) time out due to network-related issues. Creating a timer thread to kill timed-out threads seemed very easy and handy. Creating a fork within a thread, with a signal handler killing the fork, would on the other hand be a lot more expensive.
      Creating a timer thread to kill timed out threads seemed very easy and handy.

      That sounds simple and convenient, but threads have state. And if you go around just killing them, you don't give them the opportunity to clean up after themselves, and that's where 'mysterious' memory leaks arise.

      Creating a fork within a thread with a signal handler killing the fork on the other hand would be a lot more expensive.

      I certainly agree with you that that is not a viable option.

      But there are better alternatives to both those approaches. The details depend upon the type of "network related issues" you are concerned about.

      In general, using cancellable asynchronous IO or non-blocking IO, within the thread, to allow you to abandon slow or broken connections, is a viable alternative. It allows you to retain the benefits of threading: simplicity of 'serial flow' architecture; scalability across cores; whilst avoiding the traps of 'throwing threads at the problem' & the difficulties of clean-up associated with abandoning threads.

      You gain the benefits of non-blocking IO without the need to invert the flow of control, and without letting the need for abandonable IO dictate the architecture of your entire application.
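The abandon-instead-of-kill idea can be sketched without any network at all: a pipe that never receives data stands in for a stalled server, and IO::Select's can_read timeout lets the code give up on the handle cleanly (a minimal illustration, not the full LWP rework):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;

# A pipe nobody writes to simulates a connection that has gone quiet.
pipe( my $rd, my $wr ) or die "pipe: $!";
my $sel = IO::Select->new( $rd );

# can_read() with a timeout is the heart of the pattern: the thread
# itself decides to abandon the handle, so it can clean up normally
# instead of being killed from outside.
my @ready = $sel->can_read( 0.5 );
if ( @ready ) {
    my $line = <$rd>;
    print "got: $line";
}
else {
    print "timed out; abandoning this connection\n";
    close $rd;
}
```

Because the timeout decision is made inside the thread's own flow of control, all the usual scope-exit cleanup still runs.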


        That sounds simple and convenient, but threads have state. And if you go around just killing them, you don't give them the opportunity to clean up after themselves and that's where 'mysterious' memory leaks arise.
        What would be the right way(tm) to kill a hanging thread as far as proper garbage collection goes?
        But there are better alternatives to both those approaches. The details depend upon the type of "network related issues" you are concerned about?
        Basically all the program does is retrieve the RFC 2616 status code of a large number (500k+) of distfiles from various different http/ftp servers using HTTP::Request and LWP::UserAgent.

        Setting $ua->timeout(10); doesn't seem to be working at all. I'm now doing:

        my $code;
        eval {
            local $SIG{ALRM} = sub { die "alarm\n" };
            alarm $defcfg->{'link_check_timeout'};
            # LWP stuff
            $code = ...;
            alarm 0;
        }
        Not sure if that's the best thing but it seems to work better than setting an LWP timeout.
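For what it's worth, the usual shape of that idiom also checks $@ afterwards and cancels the alarm on every exit path. A sketch, with sleep standing in for the LWP request (the timeout value and the stand-in are assumptions; real code would put something like $ua->head($url) where the sleep is):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $timeout = 1;    # stands in for $defcfg->{'link_check_timeout'}

my $code;
eval {
    local $SIG{ALRM} = sub { die "alarm\n" };  # the "\n" keeps the message exact
    alarm $timeout;
    sleep $timeout + 2;    # stand-in for the LWP request that may hang
    $code = 200;           # real code: $code = $ua->head( $url )->code;
    alarm 0;
};
alarm 0;    # always cancel, even if the eval died before its own alarm 0 ran

if ( my $err = $@ ) {
    die $err unless $err eq "alarm\n";    # re-throw anything unexpected
    warn "request timed out\n";
}
```

The second alarm 0 and the exact string match on "alarm\n" are the two details most often missed; without them a genuine error inside the eval can be silently swallowed, or a pending alarm can fire later in unrelated code.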
