
Re: Is Using Threads Slower Than Not Using Threads?

by BrowserUk (Pope)
on Nov 01, 2010 at 09:16 UTC (#868710)


in reply to Is Using Threads Slower Than Not Using Threads?

Effectively, you've done the equivalent of loading your car onto the back of a transporter and driving the transporter to all your appointments. Needless to say, using two vehicles in this way does not speed up your deliveries, and costs the time it takes getting the car on and off the transporter.

What your code does is simply move the code that would be in the main (initial, startup) thread into a new thread.

As such, you're only making use of one thread--the main thread just sits blocked, doing nothing until the new thread completes--so it won't speed anything up.

But, you've added the costs of

  1. starting a new thread--a few milliseconds at most.
  2. Unless you've shared @allips, then that array will be copied into the new thread.

    That's pure overhead.

  3. copying the results set from the new thread back to the main thread.

    More overhead.

So yes: the way you are going about it, using a thread will be slower than not using one.
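To make the pattern concrete, here is a minimal sketch of "start one thread, then immediately join it" next to a plain call. The OP's code is not shown in this node, so `count_matches`, the sample `@allips` contents, and `@lines` are hypothetical stand-ins, and the sketch assumes a perl built with ithreads.

```perl
use strict;
use warnings;
use threads;

# Hypothetical stand-in for the OP's search routine (not the original code).
sub count_matches {
    my ($ips, $lines) = @_;
    my $count = 0;
    for my $line (@$lines) {
        for my $ip (@$ips) {
            $count++ if index($line, $ip) >= 0;
        }
    }
    return $count;
}

# Made-up sample data, purely for illustration.
my @allips = map { "10.0.0.$_" } 1 .. 50;
my @lines  = map { "GET /index.html from 10.0.0.$_" } 1 .. 200;

# Direct call: all the work happens in the main thread.
my $direct = count_matches(\@allips, \@lines);

# One worker thread, joined straight away: the same work runs serially in
# the new thread while the main thread sits blocked in join(). The thread
# start-up, the copying of data into it, and the copying of the result back
# are all extra cost.
my $threaded = threads->create(
    { context => 'scalar' },
    \&count_matches, \@allips, \@lines,
)->join();

print "direct=$direct threaded=$threaded\n";
```

Both calls return the same count; the threaded one simply pays more to get it, because at no point do two threads do useful work at the same time.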

As others have said, the first way to speed up your program will be to use a better algorithm.

However, if your machine has multiple cores--once you've avoided searching every line 3500 times, and made sure that you're using the fastest search mechanism Perl has to offer--then there might be some further gains to be had by using threading effectively.
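As a rough illustration of what "a better algorithm" could look like, the sketch below replaces "search every line once per address" with a single pass over the lines and a hash lookup per address found. It assumes the addresses are IPv4 strings that appear literally in the lines; `build_ip_set`, `count_hits`, and the sample log lines are hypothetical names and data, not anything from the OP's program.

```perl
use strict;
use warnings;

# Build the lookup table once. Keys are the addresses of interest.
sub build_ip_set {
    my (@ips) = @_;
    my %set;
    @set{@ips} = ();
    return \%set;
}

# Scan each line once, pull out anything shaped like an IPv4 address,
# and count it if it is in the set. No per-address rescans of the line.
sub count_hits {
    my ($set, @lines) = @_;
    my %hits;
    for my $line (@lines) {
        while ($line =~ /\b(\d{1,3}(?:\.\d{1,3}){3})\b/g) {
            $hits{$1}++ if exists $set->{$1};
        }
    }
    return \%hits;
}

# Made-up sample data, purely for illustration.
my $set  = build_ip_set('10.0.0.1', '10.0.0.2', '192.168.1.7');
my $hits = count_hits(
    $set,
    '10.0.0.1 - - "GET / HTTP/1.1" 200',
    '10.0.0.9 - - "GET /a HTTP/1.1" 404',
    '192.168.1.7 - - "GET /b HTTP/1.1" 200',
    '10.0.0.1 - - "GET /c HTTP/1.1" 200',
);

printf "%s: %d\n", $_, $hits->{$_} for sort keys %$hits;
```

With a few thousand addresses, the work per line no longer grows with the size of the address list, which is usually a far bigger win than anything threading can add.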

Of course, once you've made the algorithm changes, you might be running quickly enough and not need to use threading. But if you'd like to investigate using threads properly, speak up.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Re^2: Is Using Threads Slower Than Not Using Threads? (copying)
by tye (Cardinal) on Nov 01, 2010 at 17:40 UTC

    Unless you've shared @allips, then that array will be copied into the new thread.

    That's pure overhead.

    That isn't very accurate.

    As can be easily seen, the overhead eliminated by using threads::shared isn't actually the lion's share of overhead in copying a large array to a new thread:

    $ alias tperl="time perl -w -Mthreads -Mthreads::shared"
    # No data to copy:
    $ tperl -e'threads->create(sub{})->join() for 1..100'
    $ tperl -e'threads->create(sub{})->join() for 1..100'
    CPU 0.680 secs (CPU 0.256 secs)
    # Load the data late so also no data is copied:
    $ tperl -e'BEGIN{threads->create(sub{})->join() for 1..100}my @x=(1..35_000)'
    CPU 0.728 secs (CPU 0.268 secs)
    $ tperl -e'BEGIN{threads->create(sub{})->join() for 1..100}my @x;share(@x);@x=(1..35_000)'
    CPU 0.840 secs (CPU 0.364 secs)
    # Overhead of copying, whether shared() or not:
    $ tperl -e'my @x;share(@x);@x=(1..35_000);threads->create(sub{})->join() for 1..100'
    CPU 2.272 secs (CPU 1.316 secs)
    $ tperl -e'my @x=(1..35_000);threads->create(sub{})->join() for 1..100'
    CPU 3.304 secs (CPU 3.384 secs)

    What threads::shared prevents from being copied to each new thread at thread creation time is the particular data for each element of the array. But you can see that it still includes much of the overhead (between 1/3 and 1/2 in the examples above) because it must copy the essentially-tied array (that uses C-based 'get' and 'set' accessors rather than the Perl accessors of actually-tied variables).

    But if you actually expect the threads to make use of the shared data, then you can see the dramatic overhead of copying the data to each thread one-element-at-a-time with locking that threads::shared does:

    $ tperl -e'my @x=(1..35_000);threads->create(sub{for(@x){my$y=$_}})->join() for 1..100'
    CPU 4.324 secs
    $ tperl -e'my @x;share(@x);@x=(1..35_000);threads->create(sub{for(@x){my$y=$_}})->join() for 1..100'
    CPU 17.281 secs

    starting a new thread--a few milliseconds at most

    If I didn't know better, I might suspect that this is propaganda. It could certainly mislead somebody. As can be seen above, creating a new iThreads instance can trivially take ten times that long.

    Just for comparison, here is how extra data impacts the performance of fork():

    $ tperl -e'fork && exit for 1..100'
    CPU 0.044 secs (CPU 0.020 secs)
    $ tperl -e'my @x=(1..35_000);share(@x);fork && exit for 1..100'
    CPU 0.072 secs (CPU 0.072 secs)
    $ tperl -e'my @x=(1..35_000);for(1..100){if(fork){for(@x){my$y=$_};exit}}'
    CPU 0.084 secs

    And, as has always happened, every time I touch iThreads, here are some examples of how easy it is to run into stupid things:

    # Try to load the data late but mostly fail:
    $ tperl -e'threads->create(sub{})->join() for 1..100;my @x=(1..35_000)'
    CPU 2.048 secs (CPU 1.204 secs)
    # Note how easy it is to do it wrong and not share data but get the overhead:
    $ tperl -e'my @x=(1..35_000);share(@x);print "($x[0])\n";threads->create(sub{})->join() for 1..100'
    Use of uninitialized value in concatenation (.) or string at -e line 1.
    ()
    CPU 3.380 secs (CPU 2.128 secs)

    Update: There are a lot of numbers there. A nice summing-up is that sharing the data between multiple instances and actually using it in each instance (in the above example) is 200x (20,000%) slower using iThreads and threads::shared than when using native fork.

    - tye        

      Dumb is as dumb does.

Re^2: Is Using Threads Slower Than Not Using Threads?
by ysth (Canon) on Nov 07, 2010 at 09:45 UTC
    Unless you've shared @allips, then that array will be copied into the new thread.
    You still have no clue what it means that sharing uses perl's magic mechanism. Please educate yourself before spreading further FUD about how shared variables don't use significant memory for each thread.
    --
    A math joke: r = | |csc(θ)|+|sec(θ)|-||csc(θ)|-|sec(θ)|| |

      Oh dear. Think context!

      To be more explicit. Think of the context in which I said what I said. Think of the purpose of that.

      Just as newbies don't need to be apprised up front of the details of the inner workings of the regex engine, they don't need to be apprised of the inner workings of threads. The point was simply that without having been shared, each thread will get an independent, unshared copy of the array.
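A small sketch of that semantic difference, assuming a perl built with ithreads. The array names `@plain` and `@shared` are made up for the illustration, and the sketch says nothing about the memory or time cost of either approach, which is what the rest of this thread argues about.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my @plain = (1, 2, 3);             # ordinary array: cloned into each new thread
my @shared :shared = (1, 2, 3);    # shared array: one set of data seen by all threads

threads->create(sub {
    push @plain, 4;     # changes only this thread's private copy
    lock(@shared);      # serialise access to the shared array
    push @shared, 4;    # change is visible to the main thread too
})->join();

printf "plain has %d elements, shared has %d\n",
    scalar @plain, scalar @shared;
```

After the join, the main thread still sees three elements in `@plain` but four in `@shared`.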

      Your interjection is:

      1. unhelpful;

        to the OP;

      2. unwarranted;

        My understanding of the internals of iThreads is:

        1. perfectly sound thanks;
        2. irrelevant in the context of the assistance I was trying to give the OP.

        You don't need to be Jeff Friedl to help people with their regex problems. And if you were him, it wouldn't be useful to go into all the DFA/NFA blah blah details of the internals when responding.

        And it seems to me that if you aren't aware of the semantic differences between the copying of non-shared arrays and the sharing of shared arrays, you're the one lacking understanding. But, I know you almost certainly are aware of those differences, which just confirms that your interjection is ...

      3. therefore, politically motivated.

        Not intended to help the OP, but simply to come out in support of an untenable position.

      Your time might be better spent where it would be most valuable, e.g. correcting some of the unnecessary extravagances of the current implementation.



        In our last disagreement you would not even so much as recognize my arguments about context, just completely ignoring anything I said. And there you were taking one phrase out of its qualifying context. Here I am making an objection to a complete thought that was part of your reply to the OP; yes, not one pertinent to your main point.

        Why am I doing so? Because you seem to have elected yourself defender of ithreads to the point where you attack even true and reasonable cautionary points about them with a variety of shenanigans designed more to conceal than expose truth. The truth is that ithreads are a very different beast than people coming from other languages may expect. They are slow to start and tend to cause memory bloat. Anywhere there is support for copy-on-write fork, fork and some variety of IPC is almost always a better choice.
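For readers who have not seen the "fork and some variety of IPC" approach, here is a minimal sketch using `fork` with one pipe per child. It assumes a Unix-like system; `fork_worker`, the summing task, and the two-way split are hypothetical and chosen only to keep the example short.

```perl
use strict;
use warnings;
use POSIX ();

# Run $work->(@chunk) in a forked child. Returns the child's pid and a
# read handle from which the parent can collect the one-line result.
sub fork_worker {
    my ($work, @chunk) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                     # child
        close $reader;
        print {$writer} $work->(@chunk), "\n";
        close $writer;                   # flushes the result into the pipe
        POSIX::_exit(0);                 # leave without running parent's END blocks
    }
    close $writer;                       # parent keeps only the read end
    return ($pid, $reader);
}

# Made-up workload: sum 1..1000 in two halves.
my @data   = (1 .. 1000);
my @chunks = ([ @data[0 .. 499] ], [ @data[500 .. 999] ]);
my $sum    = sub { my $t = 0; $t += $_ for @_; $t };

# Start all children first so they run concurrently, then collect.
my @workers = map { [ fork_worker($sum, @$_) ] } @chunks;

my $total = 0;
for my $w (@workers) {
    my ($pid, $reader) = @$w;
    chomp(my $partial = <$reader>);
    close $reader;
    waitpid($pid, 0);
    $total += $partial;
}
print "total=$total\n";
```

Where fork is copy-on-write, the children start with the parent's data already in place without an up-front copy, and only the small results travel back over the pipes.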

        If that's politics, so be it. Don't bother responding, I'm not going to read it.

