http://www.perlmonks.org?node_id=424933

A while back, I left a node about concurrency in Perl. I ended that meditation with a question about how to improve concurrency in Perl.

This node recommends a pragma which gives certain guarantees to perl (note lowercase) that should allow implicit parallelization with minimal modification to existing code.

Implicit Parallelization in Functional Languages

Here's a quick-and-dirty introduction to functional languages, and how they can be parallelized by the implementation without modifying any code: Imagine a LISP function that takes an argument x and returns the result of 2 * x + 1. We can implement this as:

(defun func1 (x) (+ 1 (* 2 x)))

When you call this function, the implementation evaluates the innermost expression first, each expression being reduced until the final result is computed:

(func1 3)      ; Function call
(+ 1 (* 2 3))  ; All occurrences of 'x' replaced with '3'
(+ 1 6)        ; Reduce
7              ; Reduce

With me so far? This should be familiar to anyone who has worked through the first chapter of SICP.

Now we define another function that takes an argument x and computes (2 * x) + (3 * x) + 1:

(defun func2 (x) (+ 1 (* 2 x) (* 3 x)))

And then we call it, with the implementation replacing the variable as before:

(func2 3)              ; Function call
(+ 1 (* 2 3) (* 3 3))  ; All occurrences of 'x' replaced with '3'

Stop right there. Do we evaluate (* 2 3) or (* 3 3) first? It doesn't matter, as long as the expression contains no side effects. One interesting effect of this is that the implementation can choose to execute these expressions in parallel without any additional help from the programmer.

(Of course, in this simple example, practical issues like communication time between CPUs are likely to make the parallel execution take longer. However, the same concepts can be applied to much more complex expressions.)

Implementing in Perl

Now consider this Perl map statement, which takes an array of strings and returns the length of each of those strings:

my @str_lengths = map { length } @strs;

This statement can be safely parallelized: each execution of the block is independent of the others. The implementation could parallelize this statement on its own, effectively cutting the running time from n steps to roughly n/p, where p is the number of concurrent threads doing the processing.
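To make that concrete, here is a rough sketch of what perl would effectively have to do behind the scenes, written out by hand with the standard threads module (the worker count and the chunking scheme are arbitrary choices for illustration, not part of the proposal):

use strict;
use warnings;
use threads;
use POSIX qw(ceil);

my @strs = qw(alpha beta gamma delta epsilon);

my $p    = 2;                    # number of worker threads (arbitrary)
my $size = ceil(@strs / $p);     # elements per contiguous chunk

# Split the list into chunks and hand each chunk to its own thread.
my @work = @strs;
my @threads;
while (my @chunk = splice @work, 0, $size) {
    push @threads, threads->create(sub { map { length } @chunk });
}

# Joining the threads in creation order keeps results in input order.
my @str_lengths = map { $_->join } @threads;
print "@str_lengths\n";          # 5 4 5 5 7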

The problem is that Perl provides no way to guarantee that a given piece of code has no side effects. If perl tries to parallelize a statement that has side effects, then the result is non-deterministic. Programmers don't like non-determinism.

Thus, we need a way to tell perl that we guarantee that there are no side effects in an expression (and it's on our head if we break that promise). That's where the threadsafe pragma would come in. This pragma would be lexically scoped:

my @str_lengths = do { use threadsafe; map { length } @strs; };

This means the programmer has fine-grained control over which constructs are marked safe to parallelize. This is important so that safe and unsafe expressions can coexist in the same source code, and is also useful when there is no practical benefit to parallelization (as in the LISP example above).
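As a purely hypothetical usage example (threadsafe is the pragma proposed here, not an existing module, and summarize and the record structure are made up for illustration), safe and unsafe code could then sit side by side in one subroutine:

sub summarize {
    my @records = @_;

    # Pure per-record computation: marked safe to run in parallel.
    my @sizes = do {
        use threadsafe;                      # hypothetical pragma
        map { length $_->{body} } @records;
    };

    # Side-effecting work: left unmarked, so it stays serial.
    open my $log, '>>', 'summary.log' or die "summary.log: $!";
    print {$log} scalar(@sizes), " records summarized\n";
    close $log;

    return @sizes;
}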

Multi-core processors are becoming increasingly popular. Within a few years, even $300 computers at Best Buy will likely have such a beast. Languages and programmers must start equipping themselves now to make use of these systems.

"There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

Re: RFC: Implicit Parallelization Pragma
by Anonymous Monk on Jan 25, 2005 at 17:53 UTC
    The problem is that Perl provides no way to guarantee that a given piece of code has no side effects.
    Indeed. And in Perl, this is very, very hard. How hard? Well, your own example fails the test. Even the map {length} @array may cause a side-effect.
    use Devel::Peek;

    my @arr = ("foo", 3);
    Dump($arr[-1]);
    my @bogus = map {length} @arr;
    Dump($arr[-1]);

    __END__
    SV = IV(0x8192bcc) at 0x8184234
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 3
    SV = PVIV(0x8184888) at 0x8184234
      REFCNT = 1
      FLAGS = (IOK,POK,pIOK,pPOK)
      IV = 3
      PV = 0x8189cf8 "3"\0
      CUR = 1
      LEN = 2
    Whoopsie. One of the elements of @arr changed, just by looking at it!

    This is the same reason why threads in Perl take quite a lot of overhead, and that by default variables are copied (and not shared) between threads.

      All true, but in the spirit of DWIMery, this is actually a benign and irrelevant side effect. The creation of a PV from a number is something that will simply happen again automatically later. And thus I think hardburn's code was correct. There are no real side effects, in the sense that nothing happens that cannot be recreated later, identically and automatically. For example, if the "use threadsafe" was in code that called a Memoize'd function, there may still be no real side effect. And only the caller can tell that, not perl.

      my %cache;

      sub get_from_cache {
          my $var = shift;
          unless (exists $cache{$var}) {
              $cache{$var} = __PACKAGE__->new($var);
          }
          $cache{$var};
      }

      my @foos = do {
          use threadsafe;
          map { get_from_cache($_)->foo() } @list;
      };

      Assume the object has no non-local side effects: it may read from a data store (a database or the disk) but does not write to it, the data store is considered locked against writes, or it merely calculates some values. If removing it from the cache and re-creating it would yield precisely the same object (in particular, the same foo()), then there are no side effects that matter, and the code would be threadsafe in practice, even if it isn't threadsafe from a pure computer-science perspective.

      I have exactly this situation in much of my code at work. I would love to see automatic parallelisation of some of what I do - as it is, I'm forced to fork() on unix, and not use threads at all on Windows, just to get consistent, supported speed boosts where I can. Yes, I lose whatever modifications child processes make to my objects - but since they are all cached and completely reproducible, there is no real side effect from the child code, except for what it writes to disk.

      Of course, in order for this to work, we need a way to signify a subset of the code as serial - for example, when writing to a common logfile. Ok, I'll rephrase this: for it to work, I need that serialisation :-)
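      For what it's worth, a minimal sketch of such a serialised section can already be written with today's threads::shared lock, no new syntax required (run.log and the worker bodies here are made-up examples, not code from the post above):

      use strict;
      use warnings;
      use threads;
      use threads::shared;

      # One shared variable serves as the lock for the logfile; any code
      # that wraps its writes in lock($log_lock) runs serially, even if
      # the surrounding work has been parallelised.
      my $log_lock :shared;

      sub log_line {
          my ($msg) = @_;
          lock($log_lock);                   # released at end of scope
          open my $fh, '>>', 'run.log' or die "run.log: $!";
          print {$fh} '[thread ', threads->tid, "] $msg\n";
          close $fh;
      }

      my @workers = map {
          my $n = $_;
          threads->create(sub { log_line("worker $n done") });
      } 1 .. 3;
      $_->join for @workers;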

      *sigh* We can always hope for Perl6 . . .

      "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

        That's the main reason Perl 6 has hyperoperators, and to a lesser extent the junctional operators. A pragma simply isn't specific enough. In the case of hyperoperators and junctions, we've said that it's erroneous to depend on any order of evaluation, which is a slightly weaker constraint than requiring no side effects. There can be side effects, as long as each element's side effects are independent of the other elements. Actually, even that's slightly overstated--the effects just need to be idempotent. Each branch could independently do $counter++ and it doesn't matter what order they happen in, at least for hyperops.

        The main practical difference between hyperops and junctions in this regard is that hyperops are guaranteed to run to completion, whereas junctions can short circuit whenever they jolly well please and in any order. So I guess the effective constraints on junctional operators are a bit tighter than on hyperops. Generally speaking junctional operators should have no side effects, because you won't know how many of them will happen.

        The autothreading of array indices in S9 is also intended to allow parallel execution where possible. The trend toward vector-capable processors has been evident for some time now, and we want Perl 6 to map naturally to those. Even if the Cell architecture doesn't take off, we all have rather powerful GPUs in our machines these days...

      Indeed. And in Perl, this is very, very hard. How hard? Well, your own example fails the test. Even the map {length} @array may cause a side-effect.

      Here's another example:

      use warnings;
      use strict;

      {
          package Tmp;
          use overload '""' => \&str, fallback => 1;

          sub c   { bless [$_[1]], $_[0] }
          sub str { print "Hello, $_[0][0]\n"; "." x rand 10 }
      }

      my @strs        = map Tmp->c($_), qw[foo bar baz];
      my @str_lengths = map { length } @strs;

      print "$_\n" for @str_lengths;

      As you can see, this time each call to length prints something to STDOUT and calls rand, which modifies global state. This is another reason why perl can't optimize such a loop in the general case: it's very hard to prove that @strs isn't tied and doesn't contain objects with overloaded stringification.
Re: RFC: Implicit Parallelization Pragma
by Limbic~Region (Chancellor) on Jan 25, 2005 at 17:43 UTC
    hardburn,
    You didn't mention it, so I will ask. Does your proposed threadsafe pragma make any guarantees concerning the finishing order? In your example
    my @str_lengths = map { length } @strs;
    There is no reason not to get the lengths in parallel, but there certainly might be a reason to want or expect the results to come out in the same order they went in (FIFO).

    Cheers - L~R

      Certainly it would have to guarantee ordered results, or it would be very limited in usefulness.

      "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

        hardburn,
        Might I suggest you use a different example, then? I didn't think you were proposing FIFO, but the example seemed that way to me. Additionally, I think having a second, more constrained version that did guarantee order would still be useful. Imagine the following project:
        • Get the length of all words in the list
        • Look each word up in a dictionary
        • Count the number of syllables in a word
        Now, for the sake of argument, suppose looking each word up in the dictionary takes twice as long as getting the length of the word and counting its syllables combined. If these three tasks were done in parallel, the whole thing should take only roughly as long as the dictionary lookups alone. On the other hand, if the order isn't guaranteed, the results that come out might not be very useful.

        In other words, the elapsed time comes down to roughly that of the longest pole in the tent. Some time is also spent ordering the results as they become available. Of course, I may be way out in left field here.
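        For what it's worth, here is a rough sketch of that ordered, per-word parallel processing using the existing threads module. The dict_lookup and count_syllables routines are stand-in stubs for the slower tasks described above, not real modules:

        use strict;
        use warnings;
        use threads;

        # Stub helpers; the sleep stands in for a slow dictionary lookup.
        sub dict_lookup     { my ($w) = @_; sleep 1; "definition of $w" }
        sub count_syllables { my ($w) = @_; scalar(() = $w =~ /[aeiouy]+/gi) }

        my @words = qw(apple banana cherry);

        # One thread per word; joining the threads in spawn order keeps
        # the results in the same (FIFO) order as the input.
        my @threads = map {
            my $word = $_;
            threads->create(sub {
                return {
                    word => $word,
                    len  => length($word),
                    def  => dict_lookup($word),
                    syl  => count_syllables($word),
                };
            });
        } @words;

        for my $thr (@threads) {
            my ($r) = $thr->join;
            printf "%-8s len=%d syl=%d  %s\n",
                $r->{word}, $r->{len}, $r->{syl}, $r->{def};
        }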

        Cheers - L~R

Re: RFC: Implicit Parallelization Pragma
by Jenda (Abbot) on Jan 25, 2005 at 18:35 UTC

    The pragma is supposed to be lexical? What would

    @foo = map {use threadsafe; $_ = foo($_)} @bar;
    mean, then? Is foo() to be treated as threadsafe or not? And can the block be parallelized?

    Let's assume first that {use threadsafe; ...} means that the contents of this block may be parallelized, no matter what subs you call within it. The programmer would then have to be very, very careful, and it would take ages to validate each and every subroutine that might get called. And the bigger the task (and thus the better the candidate for parallel execution), the harder it would be to validate. So I doubt people would use this often, if at all.

    What if the use threadsafe; meant only that the code in the block itself is threadsafe, saying nothing about the subroutines it calls? The interpreter would then be allowed to parallelize the execution only if all the subroutines used in the block are also marked threadsafe. The problem then is ... who's going to mark them? And if I mark a subroutine threadsafe, does that mean I've also claimed that the subroutines it calls are threadsafe? What if the implementation of a subroutine I use from a module changes and it no longer is? Or does it mean that the compiler should check the threadsafety of the subroutines used within the one it just compiled, and either die, or warn and ignore the mark, if it uses a subroutine that's not threadsafe? The second is of course much safer, but requires changes in other people's modules!

    Jenda
    We'd like to help you learn to help yourself
    Look around you, all you see are sympathetic eyes
    Stroll around the grounds until you feel at home
       -- P. Simon in Mrs. Robinson

      Let's assume first that the {use threadsafe; ...} means that the contents of this block may be parallelized.

      This is what I intended.

      The programmer would then have to be very, very careful, and it would take ages to validate each and every subroutine that might get called. And the bigger the task (and thus the better the candidate for parallel execution), the harder it would be to validate. So I doubt people would use this often, if at all.

      If you want any concurrency without this, then you need to use fork() or threads. In that case, you still need to verify that your code doesn't have concurrency issues. A use threadsafe around important portions would take no more work than existing solutions, and is likely to take less since only code explicitly inside a threadsafe block would need to be checked.

      "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

Re: RFC: Implicit Parallelization Pragma
by Anonymous Monk on Jan 25, 2005 at 17:38 UTC

    I would see the benefit of this fine-grained multi-threading only if we had a massively parallelised CPU architecture as well (e.g. one of those funky asynchronous chips with many, many pipelines). Not only that, the OS has to allow the individual process itself to manage its own threads in a very, very lightweight way.

    Right now, however, even on multi-core processors, the OS still has to do a fair bit of work to establish and tear down a thread. Unless perl maintains a pool of threads that can be near-instantly reused for tiny tiny tasks, this is not too useful as a pragma.
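    For the sake of discussion, here is a sketch of the kind of reusable worker pool that would be needed, built on the existing Thread::Queue module (the pool size and the "tiny task" are arbitrary, and note that result order is not preserved here):

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $work    = Thread::Queue->new;
    my $results = Thread::Queue->new;

    # A small pool of long-lived workers: the thread-creation cost is
    # paid once, and each worker then loops over many tiny tasks.
    my @pool = map {
        threads->create(sub {
            while (defined(my $item = $work->dequeue)) {
                $results->enqueue(length $item);   # the "tiny task"
            }
        });
    } 1 .. 4;

    $work->enqueue($_) for qw(foo quux antidisestablishmentarianism);
    $work->enqueue(undef) for @pool;               # one shutdown marker per worker
    $_->join for @pool;

    print $results->dequeue, "\n" while $results->pending;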

    Nonetheless, functional programming is a tasty tasty concept that I wish had more real-life applicability in my work.

      ...the OS has to allow the individual process itself to manage its own threads in a very, very lightweight way.

      Win32 already has this ability, they call them fibers.

      As well as being ideal for creating pools of "units of execution", they also lend themselves to various other useful programming techniques.

      For example, you can make any two pieces of code act as coroutines almost trivially.

      What is lacking currently is a suitable Perlish API that would allow application programmers to make use of them. It might be possible to layer a Win32-only API on top of threads, but it would be so much better if, in typical perlish fashion, Perl provided an API for threads that abstracted the underlying implementation of any particular flavour and was flexible enough to allow extension for things like fibers on those platforms that support them.

      That would be better than the current model, which follows one platform's outdated and restrictive model to the letter, and attempts to force-fit that model onto all other platforms and implementations.


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.
        Win32 already has this ability, they call them fibers.

        Fibers will not help with parallelization. They are essentially a fancy incarnation of C's setjmp/longjmp functions. An application can change its execution context with fibers, but it cannot schedule fibers in parallel on multiple CPUs.

      These things are coming. The marketdroids from IBM/Intel/AMD/etc. are throwing around numbers like 512-core CPUs. Whether that will come to pass remains to be seen, but in any case, such a language feature would need to be implemented and tested now for it to be stable by the time multi-core processors are common.

      "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

Re: RFC: Implicit Parallelization Pragma: use parallizable
by diotalevi (Canon) on Jan 26, 2005 at 19:01 UTC

    I'd prefer you named your parallelization pragma parallelizable or some such. I have similar interests in this flag, except mine would be for explicitly altering the execution order of blocks, so that pipelines like grep { ... } map { ... } ... can keep a low memory profile instead of having to fully compute each section before passing the results on to the next operation.

    Using the parallelizable pragma, I could transform the following code. The benefit is that only the minimum amount of intermediate results is present at any given moment, rather than the maximum.

    {
        use parallelizable;
        @ary = map  { ... }   # block 1
               grep { ... }   # block 2
               map  { ... }   # block 3
               LIST;
    }

    {
        for ( LIST ) {
            for ( do { block 3 } ) {
                next if not do { block 2 };
                for ( do { block 1 } ) {
                    push @ary, $_;
                }
            }
        }
    }
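    Here is a concrete, runnable version of that transformation with made-up blocks (the doubling, the divisibility test, and the increment are arbitrary examples). It shows that the streamed form keeps only one element's intermediate value live at a time, instead of two full temporary lists:

    use strict;
    use warnings;

    my @list = 1 .. 10;

    # Ordinary pipeline: each stage builds a complete temporary list.
    my @ary = map  { $_ * 2 }        # block 1
              grep { $_ % 3 == 0 }   # block 2
              map  { $_ + 1 }        # block 3
              @list;

    # Fused, streaming equivalent: only one element's intermediate
    # value exists at any given moment.
    my @ary2;
    for my $x (@list) {
        my $a = $x + 1;              # block 3
        next unless $a % 3 == 0;     # block 2
        push @ary2, $a * 2;          # block 1
    }

    print "@ary\n@ary2\n";           # both print: 6 12 18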