Considering hash slices, past, present and future

It all started out innocently enough. I dig hash slices. They offer a compact syntax for representing bulk parallel transfers in hashes. If you don't know what hash slices are, here is a quick example. Consider the difference between:

#! /usr/bin/perl -w
use strict;

my %slice;
@slice{ qw/alpha bravo/ } = ( 'foo', 'bar' );

my %assign;
$assign{alpha} = 'foo';
$assign{bravo} = 'bar';

These two assignment blocks to %slice and %assign are equivalent. My intuition told me that the hash slice form would be faster. Benchmarks showed me otherwise...

I first created a benchmark with a two-element assignment and a 16-element assignment, with varying-length scalars. I was shocked to see that the hash slice method was 50% slower! I then expanded the tests to include 32-element and 64-element assignments, and each time the hash slice assignment was fifty percent slower.

% perl bench-hslice 100000
Benchmark: timing 100000 iterations of Assign_02, Assign_16, Assign_32, Assign_64, Slice_02, Slice_16, Slice_32, Slice_64...
 Assign_02:  1 wallclock secs ( 1.24 usr +  0.00 sys =  1.24 CPU)
 Assign_16:  7 wallclock secs ( 6.41 usr +  0.00 sys =  6.41 CPU)
 Assign_32: 14 wallclock secs (13.55 usr +  0.00 sys = 13.55 CPU)
 Assign_64: 26 wallclock secs (26.34 usr +  0.00 sys = 26.34 CPU)
  Slice_02:  2 wallclock secs ( 1.80 usr +  0.00 sys =  1.80 CPU)
  Slice_16: 10 wallclock secs ( 9.47 usr +  0.01 sys =  9.48 CPU)
  Slice_32: 19 wallclock secs (18.56 usr +  0.01 sys = 18.57 CPU)
  Slice_64: 39 wallclock secs (38.43 usr +  0.00 sys = 38.43 CPU)
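
The bench-hslice script itself is not shown here. A minimal sketch of what the two-element case might look like, using the core Benchmark module, follows. The sub names and the default iteration count are assumptions chosen to match the labels in the output above, not the original script.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical stand-in for bench-hslice; the iteration count comes
# from the command line, as in "perl bench-hslice 100000".
my $count = shift || 100_000;

# Flat assignment: one scalar assignment per key.
sub assign_02 {
    my %h;
    $h{alpha} = 'foo';
    $h{bravo} = 'bar';
    return \%h;
}

# Hash slice: both keys filled by a single list assignment.
sub slice_02 {
    my %h;
    @h{ qw/alpha bravo/ } = ( 'foo', 'bar' );
    return \%h;
}

timethese( $count, {
    Assign_02 => \&assign_02,
    Slice_02  => \&slice_02,
} );
```

The larger cases would follow the same pattern with 16, 32 and 64 keys. Returning the hash reference adds the same small cost to both subs, so it should not change which one comes out ahead.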

Creating the 32- and 64-element hash slice assignments brought to light the whole reason why I started meditating on this. The fact is, for a small number of elements in a hash slice assignment, creative use of whitespace lets you line up the assigner and the assignee, which leads to better comprehension and stamps out a possible source of errors. Consider:

  my %slice;
  @slice{ qw/alpha   bravo charlie delta/ } =
        (    $omega, $x,   $brown, $plane );

The big flaw with this approach is that it breaks down when the character length of the list of assignments exceeds your usual recommendations for line limits (which, in my case, is around 110 chars, but other shops might limit that to 80).

What would be really nice is to allow some kind of syntax along the lines of:

  my %slice;
  @slice{
    'alpha' = $omega,
    'bravo' = $x,
    # ...
  };

But that doesn't compile (Can't modify constant item in scalar assignment). But wait! That code looks awfully like a standard hash constructor. What if we used the => sugar comma?

  my %slice;
  @slice{
    alpha => $omega,
    bravo => $x,
    # ...
  };

That actually compiles, although it doesn't do what we want: perl spits out "Useless use of hash slice in void context" and doesn't modify %slice. Bummer. That's the crux of the problem: hash slice assignments do not scale up well for more than half a dozen or so elements, if you consider notational clarity as the most important issue. A minor issue is that it's also slower, but I don't really care about that, for I perceive that the notational benefits outweigh the runtime cost.
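
One way to get the one-pair-per-line layout with syntax that exists today is to build a temporary hash and slice it in. This is only a sketch: the %new hash and the placeholder values are assumptions for illustration, and it pays for building an extra hash.

```perl
use strict;
use warnings;

# Placeholder values standing in for the variables used in the post.
my ( $omega, $x, $brown, $plane ) = ( 1, 2, 3, 4 );

my %slice = ( zulu => 'kept' );    # existing contents are preserved

# One key/value pair per line, so each key sits next to its value.
my %new = (
    alpha   => $omega,
    bravo   => $x,
    charlie => $brown,
    delta   => $plane,
);

# keys and values walk an unmodified hash in the same order,
# so the slice pairs each key with its own value.
@slice{ keys %new } = values %new;
```

The layout no longer depends on line length, at the price of a second hash and a less direct statement.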

But I was curious as to why it was that much slower. My first step was a step in the dark, and somewhat of a step backwards to boot, for I replaced @h{ qw/alpha bravo/ } = ... with my @h = qw/alpha bravo/; @h{ @h } = .... (Farewell to clarity). Surprisingly, the hash slice assignment became faster, about as fast as the flat assignment approach.
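
Spelled out, the variant looks something like the sketch below. The array is named @keys here for readability (the fragment above reuses the name @h, which is legal because arrays and hashes live in separate namespaces), and the sketch assumes the array is built once, outside the timed code.

```perl
use strict;
use warnings;

# Key list built once, ahead of the assignment.
my @keys = qw/alpha bravo/;

my %h;
@h{ @keys } = ( 'foo', 'bar' );    # slice indexed by an array, not by qw//
```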

% perl bench-hslice2 100000
Benchmark: timing 100000 iterations of Assign_02, Assign_16, Assign_32, Assign_64, Slice_02, Slice_16, Slice_32, Slice_64...
 Assign_02:  1 wallclock secs ( 1.22 usr +  0.00 sys =  1.22 CPU)
 Assign_16:  7 wallclock secs ( 6.61 usr +  0.00 sys =  6.61 CPU)
 Assign_32: 13 wallclock secs (13.11 usr +  0.00 sys = 13.11 CPU)
 Assign_64: 25 wallclock secs (26.82 usr +  0.00 sys = 26.82 CPU)
  Slice_02:  0 wallclock secs ( 1.33 usr +  0.00 sys =  1.33 CPU)
  Slice_16:  7 wallclock secs ( 6.57 usr +  0.00 sys =  6.57 CPU)
  Slice_32: 13 wallclock secs (12.76 usr +  0.00 sys = 12.76 CPU)
  Slice_64: 26 wallclock secs (26.11 usr +  0.00 sys = 26.11 CPU)

I was doing all this with perl 5.005_03. I ran the same code on 5.6.1, and oh joy! the hash slice assignment (without the klugey @array) ran about as fast as the flat assignment approach.

% perl hslice 100000
Benchmark: timing 100000 iterations of Assign_02, Assign_16, Assign_32
+, Assign_64, Slice_02, Slice_16, Slice_32, Slice_64...
 Assign_02:  1 wallclock secs ( 0.68 usr +  0.00 sys =  0.68 CPU) @ 14
+7058.82/s (n=100000)
 Assign_16:  3 wallclock secs ( 3.99 usr +  0.00 sys =  3.99 CPU) @ 25
+062.66/s (n=100000)
 Assign_32:  9 wallclock secs ( 8.40 usr +  0.01 sys =  8.41 CPU) @ 11
+890.61/s (n=100000)
 Assign_64: 17 wallclock secs (16.61 usr +  0.03 sys = 16.64 CPU) @ 60
+09.62/s (n=100000)
  Slice_02:  1 wallclock secs ( 0.70 usr +  0.00 sys =  0.70 CPU) @ 14
+2857.14/s (n=100000)
  Slice_16:  4 wallclock secs ( 4.21 usr +  0.00 sys =  4.21 CPU) @ 23
+752.97/s (n=100000)
  Slice_32:  7 wallclock secs ( 8.30 usr +  0.00 sys =  8.30 CPU) @ 12
+048.19/s (n=100000)
  Slice_64: 18 wallclock secs (16.93 usr +  0.01 sys = 16.94 CPU) @ 59
+03.19/s (n=100000)

To see what was going on, it was time to look at the op-codes perl was generating. I'm not an expert in perlguts, but it is pretty straightforward to figure things out. For the flat assignment, the code is roughly the same between the two versions of perl used, but for hash slices, it is a different story:

% cat opcode-hslice
#! /usr/bin/perl -w
use strict;
my %h;
@h{ qw/alpha bravo/ } = ( 'foo', 'bar' );

% perl -MO=Terse opcode-hslice # 5.6.1
    BINOP (0x819f8e0) aassign [2]
        UNOP (0x81b3dc0) null [141]
            OP (0x81a7868) pushmark
            SVOP (0x819c720) const  PV (0x8131440) "foo"
            SVOP (0x819c740) const  PV (0x8131434) "bar"
        UNOP (0x81b3e00) null [141]
            OP (0x81a37e8) pushmark
            LISTOP (0x8126360) hslice
                OP (0x81a7850) pushmark
                LISTOP (0x8126060) list
                    OP (0x81a79e8) pushmark
                    SVOP (0x8126240) const  PV (0x812eb8c) "alpha"
                    SVOP (0x81262a0) const  PV (0x8124224) "bravo"
                OP (0x819d128) padhv [1]

% perl -MO=Terse opcode-hslice # 5.005_03
    BINOP (0x80d4f30) pp_aassign [3]
        UNOP (0x80d4e00) pp_null [141]
            OP (0x80d4e28) pp_pushmark
            SVOP (0x8144de8) pp_const  PV (0x80d1f30) "foo"
            SVOP (0x80d4de0) pp_const  PV (0x80d1f48) "bar"
        UNOP (0x80d4ee8) pp_null [141]
            OP (0x80d4f10) pp_pushmark
            LISTOP (0x8144da0) pp_hslice
                OP (0x8144dc8) pp_pushmark
                LISTOP (0x80d5520) pp_split [2]
                    PMOP (0x81360b8) pp_pushre /\s+/
                    SVOP (0x80d54e0) pp_const  PVIV (0x80d1f90) "alpha bravo"
                    SVOP (0x80d5500) pp_const  IV (0x80d1f0c) 0
                OP (0x80d54c0) pp_padhv [1]

(Some output elided for brevity).

Here, the slowdown observed in 5.005_03 is easily explained. When the interpreter encounters @slice{ qw/alpha bravo/ } = ( 'foo', 'bar' );, it emits the run-time op code to split /\s+/ the qw/alpha bravo/ quoted word array, each and every time. In 5.6.1, this is performed once at compile time, hence the speed gain.
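
In source terms, the difference is roughly the one sketched below. The variable names are made up for illustration, and split ' ' is used as the closest source-level spelling of the split /\s+/ that appears in the 5.005_03 op tree.

```perl
use strict;
use warnings;

# Roughly what 5.005_03 does every time the statement is executed:
# split the quoted string on whitespace at run time.
my @runtime = split ' ', q/alpha bravo/;

# What 5.6.1 does: the word list is already a list of constants
# by the time the code runs.
my @compiled = qw/alpha bravo/;

print "@runtime\n";     # alpha bravo
print "@compiled\n";    # alpha bravo
```

Both produce the same list; only the moment at which the splitting work is done differs.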

I'm just hoping now that in Perl 6, there'll be one single mother-of-all-opcodes that will take two lists (the hslice lvalues, and the assignment rvalues) and do the whole mess at C speeds. Then we will see a dramatic performance increase. And a nice notation for arbitrary length hash slices would be nice too.

`g r i n d e r`

Re (tilly) 1: Considering hash slices, past, present and future by tilly (Archbishop) on Oct 23, 2001 at 17:00 UTC
If you want a syntax where things line up, just write a function:

  # Takes a hash reference and a list of key/value pairs.
  # Assigns those keys to those values in the hash
  sub add_to_hash {
    my $h_ref = shift;
    while (@_) {
      my $key = shift;
      my $value = shift;
      $h_ref->{$key} = $value;
    }
  }

Now it is easy to get the syntax you want:

  add_to_hash(\%my_hash,
    key1 => "value1",
    key2 => "value2",
    # ...
  );

UPDATE Fixed typo pointed out by grinder.
(bbfu) (syntax suga) Re2: (tilly) 1: Considering hash slices, past, present and future by bbfu (Curate) on Oct 26, 2001 at 07:14 UTC
Coincidentally, I was thinking just last week about writing a Tie::StdHash module that overloaded addition for just such an effect... I was curious, do you think that would be overkill for what amounts to just some syntactic sugar? Would it be better to just go with a simple function call, or is there perhaps some worth to be found in creating a whole module/class devoted to adding (and perhaps subtracting) hashes?

bbfu
Seasons don't fear The Reaper.
Nor do the wind, the sun, and the rain.
We can be like they are.
Re: Considering hash slices, past, present and future by demerphq (Chancellor) on Oct 23, 2001 at 18:33 UTC
In a different post today I used the example of list assignment and scalar assignment as being less and more efficient, respectively.

  use Benchmark 'cmpthese';
  my ($x,$y)=(1,2);
  cmpthese(-2, {
    list => sub {
      ($x,$y)=($y,$x)   # list assign, slow
    },
    scalar => sub {     # three lines. fast.
      my $tmp=$x;
      $x=$y;
      $y=$tmp
    }
  } );
  __END__
  Benchmark: running list, scalar, each for at least 2 CPU seconds...
        list: ( 2.34 usr + 0.00 sys = 2.34 CPU) @ 550148.10/s (n=1288997)
      scalar: ( 2.11 usr + 0.00 sys = 2.11 CPU) @ 878828.44/s (n=1854328)
             Rate   list scalar
  list   550148/s     --   -37%
  scalar 878828/s    60%     --

At bare minimum, what you are doing with a hash slice is using list context. As the above shows, list assignment is slower than the equivalent scalar assignments. The reason is easy to understand once you think about how list assignment works. The right side of the assignment is evaluated and the results are placed into a temporary array/list. Then these results are poured into the variables on the left. This means that three temp variables (the list, and the two values) are created. In the scalar version only one temp variable is created. Thus the speed difference.

It wouldn't surprise me if there isn't really a good way to make list assignment as fast as scalar assignment, irrespective of the implementation language used, because of the semantic implications of the operation and the wide number of circumstances it needs to work within.

Yves
--
You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)
Re: Considering hash slices, past, present and future by theorbtwo (Prior) on Oct 23, 2001 at 23:29 UTC
Hm. Let this serve as a reminder: in perls before 5.6, qw// is slow. However, there's no reason to use qw most of the time that it's used. For example, code like `use Foo qw(:bar);` seems to be the de facto standard. But `use Foo ':bar';` works just as well, is shorter, faster (only trivially so in this case), and IMHO clearer. Is this just cargo-cultism, or is there some deeper reason going on here?

Thanks, James Mastros,
Just Another Perl Scribe