http://www.perlmonks.org?node_id=1216445


in reply to Re^3: Why should any one use/learn Perl 6?
in thread Why should any one use/learn Perl 6?

"hashes in Cuckoo are not thread safe in the sense that they might lose updates when being updated from multiple threads"

Oh, dear. Sounds like a rather significant deficiency compared to Perl, when we are discussing concurrency. (By the way, note that MCE does not require threads to parallelize.)

"Note that the Perl 5 solution to updating shared data structures requires tieing and locking."

Wrong. It does not.

Why can't you just be positive? Always so negative!

I'm negative about your project because it is still squatting on Perl's name, duh.


The way forward always starts with a minimal test.

Re^5: Why should any one use/learn Perl 6?
by liz (Monsignor) on Jun 12, 2018 at 07:43 UTC
    "hashes in Cuckoo are not thread safe in the sense that they might lose updates when being updated from multiple threads" Oh, dear. Sounds like a rather significant deficiency compared to Perl, when we are discussing concurrency.

    Perl 6 has decided that it is not a good idea to sweep the inherent problems of updating a data structure like a hash from multiple threads at the same time under the carpet. Tieing a hash with a Perl interface to make sure that all updates are done sequentially is not a good step towards making a fully functional, well-performing threaded program without bottlenecks. You, as a developer, need to be aware of the issues, and adapt your program and/or the way you think about threaded programming.

    Think about writing your solutions as a pipeline, or using react when you have an event-driven model.

    In that sense, Perl 5 ithreads makes you a lazy programmer with everything being thread-local by default.
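
    By way of contrast, here is a minimal Perl 5 ithreads sketch of what opting out of that thread-local default looks like (hedged: it assumes a threads-enabled perl, and the thread count and key range are arbitrary choices of mine). The hash must be marked shared explicitly, and each read-modify-write still needs an explicit lock() to avoid losing updates.

    ```perl
    use strict;
    use warnings;
    use threads;
    use threads::shared;

    # Everything is thread-local unless explicitly marked shared:
    my %hash :shared;

    my @thr = map {
        threads->create( sub {
            for ( 1 .. 1000 ) {
                lock(%hash);              # serialize the read-modify-write
                $hash{ int rand 10 }++;
            }
        } );
    } 1 .. 4;

    $_->join for @thr;

    my $total = 0;
    $total += $_ for values %hash;
    print "$total\n";                     # 4000: no updates lost
    ```

    With the lock() removed, the final total can come out below 4000, which is exactly the lost-update problem being discussed.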

    "Note that the Perl 5 solution to updating shared data structures requires tieing and locking." Wrong. It does not.

    If I look at the code of MCE::Shared and MCE::Shared::Scalar, I do see things like a sub TIESCALAR, and &MCE::Shared::Scalar::new being bound to said TIESCALAR. That to me implies tieing. Or am I wrong?

    I'm negative about your project because it is still squatting on Perl's name, duh.

    I'm glad to hear that it's only the name you object to now.

      In that sense, Perl 5 ithreads makes you a lazy programmer

      Of course it does. Perl always encourages the three virtues.

      Perl 6 has decided that it is not a good idea to sweep the inherent problems of updating a data structure like a hash from multiple threads at the same time under the carpet. Tieing a hash with a Perl interface to make sure that all updates are done sequentially is not a good step towards making a fully functional, well-performing threaded program without bottlenecks. You, as a developer, need to be aware of the issues, and adapt your program and/or the way you think about threaded programming.

      Think about writing your solutions as a pipeline, or using react when you have an event-driven model.

      As I understand this, there are two points you are making: shared hashes in Perl 6 are faster but not thread safe, and hashes are a bad choice of data structure for parallel programming.

      In cases where shared data is a better choice (no judgement here), isn't this a step backwards towards pthreads? In Perl 5 you're sure that at least the underlying data structure can't be corrupted (each thread has its own interpreter), but in Perl 6 you need to take specific steps to ensure that it is not corrupted (threads share the same interpreter). I would think that with Perl 6, even with locking, you would still have better performance here, since you aren't copying data between interpreters.

      Surely there is some middle ground between the Perl 5 threading model, pthreads, and the Python/Ruby GIL?

      I'm sure there is some nuance that I'm missing here. You have a lot more experience with threading than me, so maybe I'm just oversimplifying it and missing the point?

      Update: I found this blog post that discusses it. Still digesting it.

        Thank you for reminding me about that excellent blog post by Jonathan. It was a bit ranty, but it also was a direct result of similar questions I was asking at that time. :-)

        At the moment, MoarVM can potentially crash when more than one thread is adding a key to a hash at the same time. This is a known issue, and how best to solve it is still being debated.

        So, to work around the possibility of crashes, one should make sure that the hash already has all of the possible keys before starting the parallel run. I've rewritten your code to be more idiomatic Perl 6:

        constant RANGE = 10_000;
        my %hash = ^RANGE Z=> 0 xx RANGE;

        await do for 1..10 {
            start {
                %hash{ (^RANGE).pick }++ for ^100_000;
            }
        }

        say "Seen { %hash.values.grep(* > 0).elems } keys";
        # Seen 10000 keys
        say "Average value (~100 if threadsafe): { %hash.values.sum / %hash.elems }";
        # Average value (~100 if threadsafe): 91.7143

        I think the constant RANGE = 10_000 is rather self-explanatory. The next line may be somewhat harder to grasp: it fills the hash %hash with a list of Pairs generated by zipping (Z) a range (^RANGE, which is short for 0 .. RANGE - 1) with a sequence of 10_000 zeroes (0 xx RANGE) using the fat-comma operator (=>).

        Then we execute %hash{ (^RANGE).pick }++ for ^100_000 in 10 different threads. The (^RANGE).pick picks a random value from the range 0 .. RANGE - 1.

        The results are then shown by directly interpolating code inside a string: you can use curly braces for that.

        You can use the .sum method to get a sum of values, and .elems directly on the hash to find out the number of elements.

        I haven't been able to get this version to crash; however, it is still not threadsafe, for the reasons that Jonathan explains so well in his blog post.

        If one uses the map/reduce idiom, the code would look like this:

        constant RANGE = 10_000;

        my %hash is Bag = await do for 1..10 {
            start {
                my %h;
                %h{ (^RANGE).pick }++ for ^100_000;
                %h;
            }
        }

        say "Seen { %hash.values.grep(* > 0).elems } keys";
        # Seen 10000 keys
        say "Average value (~100 if threadsafe): { %hash.values.sum / %hash.elems }";
        # Average value (~100 if threadsafe): 100

        You will note that now each thread has its own hash that can get updated without any problems. The result is a list of 10 hashes that are merged into a single hash with Bag semantics. (see Sets, Bags and Mixes for more information on Bags).

        A Bag is basically an object hash (so the keys are not necessarily strings) that only accepts positive integers as values. Initialization of a Bag accepts and merges anything that looks like a Pair or a list of Pairs (which is basically what a hash is in that context).

        So that would be the idiom to use safely. Hope this made sense :-)
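
        For comparison, the same map/reduce idiom can be sketched in Perl 5 ithreads (hedged: it assumes a threads-enabled perl, and the thread count and key range are arbitrary choices of mine): each thread fills a private hash, and the parent merges the copies returned by join, so no locking is needed at all.

        ```perl
        use strict;
        use warnings;
        use threads;

        my @thr = map {
            threads->create( sub {
                my %h;                     # private to this thread
                $h{ int rand 10 }++ for 1 .. 1000;
                \%h;                       # cloned back to the parent by join()
            } );
        } 1 .. 4;

        # Reduce: merge the per-thread hashes in the parent.
        my %merged;
        for my $h ( map { $_->join } @thr ) {
            $merged{$_} += $h->{$_} for keys %$h;
        }

        my $total = 0;
        $total += $_ for values %merged;
        print "$total\n";                  # 4000: nothing shared, nothing lost
        ```

        Because no data structure is ever visible to two threads at once, there is no update to lose, which is the same property the Bag-merge version above relies on.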

      Greetings liz. MCE::Shared provides two interfaces: OO and TIE. The OO interface does not involve TIE.

        Greetings marioroy.

        Indeed, I stand corrected. However, I think that 1nickt's point, that it does no locking and yet solves the issue Perl 6 has when updating a container from multiple threads at the same time, is not true. From the MCE::Shared documentation:

        # Locking is necessary when multiple workers update the same
        # element. The reason is that it may involve 2 trips to the
        # shared-manager process: fetch and store in this case.

        $mutex->enter( sub { $cnt += 1 } );

        This implies to me that MCE::Shared suffers from exactly the same issues that Perl 6 has, and which Jonathan so aptly describes in his blog post. Or am I missing something?