Re: RFC: Implicit Parallelization Pragma

by Anonymous Monk
on Jan 25, 2005 at 17:53 UTC


in reply to RFC: Implicit Parallelization Pragma

The problem is that Perl provides no way to guarantee that a given piece of code has no side effects.
Indeed. And in Perl, this is very, very hard. How hard? Well, your own example fails the test. Even the map {length} @array may cause a side-effect.
use Devel::Peek;
my @arr = ("foo", 3);
Dump($arr[-1]);
my @bogus = map {length} @arr;
Dump($arr[-1]);
__END__
SV = IV(0x8192bcc) at 0x8184234
  REFCNT = 1
  FLAGS = (IOK,pIOK)
  IV = 3
SV = PVIV(0x8184888) at 0x8184234
  REFCNT = 1
  FLAGS = (IOK,POK,pIOK,pPOK)
  IV = 3
  PV = 0x8189cf8 "3"\0
  CUR = 1
  LEN = 2
Whoopsie. One of the elements of @arr changed, just by looking at it!

This is the same reason why threads in Perl carry quite a lot of overhead, and why, by default, variables are copied (not shared) between threads.
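
A minimal sketch of that default copy-versus-share behaviour under ithreads (my own illustration, not from the post): only variables explicitly marked :shared are visible across threads; everything else is cloned into the new thread.

use strict;
use warnings;
use threads;
use threads::shared;

my $copied = 0;            # each thread gets its own clone of this
my $shared :shared = 0;    # this one is genuinely shared

my $t = threads->create(sub {
    $copied++;             # modifies the child's private copy only
    $shared++;             # visible to the parent after join
});
$t->join;

print "copied = $copied\n";   # still 0
print "shared = $shared\n";   # now 1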

Re^2: RFC: Implicit Parallelization Pragma
by Tanktalus (Canon) on Jan 25, 2005 at 18:09 UTC

    All true, but in the spirit of DWIMery, this is actually a benign and irrelevant side effect. The creation of a PV from a number is something that will simply happen again, automatically, the next time it's needed. So I think hardburn's code was correct. There are no real side effects here: a real side effect would be one that cannot be recreated later, identically and automatically. For example, if the "use threadsafe" were in code that called a Memoize'd function, there might still be no real side effect. And only the caller can tell that, not perl.

    my %cache;

    sub get_from_cache {
        my $var = shift;
        unless (exists $cache{$var}) {
            $cache{$var} = __PACKAGE__->new($var);
        }
        $cache{$var};
    }

    my @foos = do {
        use threadsafe;
        map { get_from_cache($_)->foo() } @list;
    };

    Assume that the object has no non-local side effects: for example, it reads from a data store (a database or the disk) but does not write to it, and the store is treated as locked against writes; or it merely calculates some stuff. If removing it from the cache and re-creating it produces precisely the same object (in particular, the same foo()), then there are no technical side effects, and the code would be threadsafe, even if it isn't threadsafe from a pure computer-science perspective.

    I have exactly this situation in much of my code at work. I would love to see automatic parallelisation of some of what I do - as it is, I'm forced to fork() on unix, and not to use threads at all on Windows, just to get consistent, supported speed boosts where I can. Yes, I lose whatever modifications child processes make to my objects - but since they are all cached and completely reproducible, there is no real side effect from the child code, except for what it writes to disk.
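
    A hedged sketch of that fork()-and-discard pattern (my own illustration, not Tanktalus's actual code): each child does reproducible work on its slice of the data, writes its results to disk, and exits; whatever it changed in memory dies with it.

    use strict;
    use warnings;

    my @work = (1 .. 8);
    my @pids;

    for my $chunk ([ @work[0 .. 3] ], [ @work[4 .. 7] ]) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            for my $item (@$chunk) {
                my $result = $item ** 2;                     # stand-in for the real, reproducible work
                open my $fh, '>', "result.$item" or die $!;
                print {$fh} "$result\n";
                close $fh;
            }
            exit 0;    # in-memory changes made by the child are simply discarded
        }
        push @pids, $pid;
    }

    waitpid $_, 0 for @pids;    # the parent just collects the children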

    Of course, in order for this to work, we need a way to signify a subset of the code as serial - for example, when writing to a common logfile. Ok, I'll rephrase this: for it to work, I need that serialisation :-)
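
    For the "serialise this bit" part, a minimal sketch with ithreads (again mine, and the logfile name is made up): a shared lock variable plus lock() turns the logfile write into a critical section while the rest of the work still runs in parallel.

    use strict;
    use warnings;
    use threads;
    use threads::shared;

    my $log_lock :shared;

    sub log_line {
        my ($msg) = @_;
        lock($log_lock);    # only one thread at a time gets past this point
        open my $fh, '>>', 'common.log' or die $!;
        print {$fh} '[tid ' . threads->tid . "] $msg\n";
        close $fh;
    }

    my @workers = map {
        my $item = $_;
        threads->create(sub {
            # ... parallel, side-effect-free work on $item here ...
            log_line("finished item $item");
        });
    } 1 .. 4;

    $_->join for @workers;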

Re^2: RFC: Implicit Parallelization Pragma
by hardburn (Abbot) on Jan 25, 2005 at 17:55 UTC

    *sigh* We can always hope for Perl6 . . .

    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

      That's the main reason Perl 6 has hyperoperators, and to a lesser extent the junctional operators. A pragma simply isn't specific enough. In the case of hyperoperators and junctions, we've said that it's erroneous to depend on any order of evaluation, which is a slightly weaker constraint than requiring no side effects. There can be side effects, as long as each element's side effects are independent of the other elements. Actually, even that's slightly overstated--the effects just need to be idempotent. Each branch could independently do $counter++ and it doesn't matter what order they happen in, at least for hyperops.

      The main practical difference between hyperops and junctions in this regard is that hyperops are guaranteed to run to completion, whereas junctions can short circuit whenever they jolly well please and in any order. So I guess the effective constraints on junctional operators are a bit tighter than on hyperops. Generally speaking junctional operators should have no side effects, because you won't know how many of them will happen.

      The autothreading of array indices in S9 is also intended to allow parallel execution where possible. The trend toward vector-capable processors has been evident for some time now, and we want Perl 6 to map naturally to those. Even if the Cell architecture doesn't take off, we all have rather powerful GPUs in our machines these days...
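
      A hedged illustration of those two constructs, in Perl 6 syntax as it was specced at the time (my own sketch, not from the reply above):

      my @a = 1, 2, 3;
      my @b = 10, 20, 30;

      # Hyperoperator: apply + elementwise. The elements may be evaluated
      # in any order, but every element is guaranteed to be evaluated.
      my @sums = @a »+« @b;        # (11, 21, 31)

      # Junction: may be evaluated in any order and may short-circuit,
      # so code inside it really should have no side effects.
      if any(@a) > 2 {
          say "at least one element is greater than 2";
      }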

Re^2: RFC: Implicit Parallelization Pragma
by mauke (Novice) on Jan 27, 2005 at 23:52 UTC
    Indeed. And in Perl, this is very, very hard. How hard? Well, your own example fails the test. Even the map {length} @array may cause a side-effect.

    Here's another example:

    use warnings;
    use strict;

    {
        package Tmp;
        use overload '""' => \&str, fallback => 1;

        sub c   { bless [$_[1]], $_[0] }
        sub str { print "Hello, $_[0][0]\n"; "." x rand 10 }
    }

    my @strs = map Tmp->c($_), qw[foo bar baz];
    my @str_lengths = map { length } @strs;
    print "$_\n" for @str_lengths;

    As you can see, this time each call to length prints something to STDOUT and calls rand, which modifies global state. This is another reason why perl can't optimize such a loop in the general case: it's very hard to prove that @strs isn't tied and doesn't contain objects with overloaded stringification.
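
    And for the tied half of that argument, a small companion sketch (mine, not mauke's): with a tied array even a plain read of an element runs Perl code, so FETCH itself can have side effects.

    use strict;
    use warnings;

    package CountingArray;
    use Tie::Array;                     # also provides Tie::StdArray
    our @ISA = ('Tie::StdArray');

    my $fetches = 0;
    sub FETCH {
        my ($self, $idx) = @_;
        $fetches++;                     # global state changes on every read
        print "FETCH [$idx] (total reads so far: $fetches)\n";
        $self->SUPER::FETCH($idx);
    }

    package main;

    tie my @strs, 'CountingArray';
    @strs = qw[foo bar baz];
    my @lengths = map { length } @strs;   # every element read triggers FETCH
    print "@lengths\n";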
