Pass by ref vs alias : why does scalar size matter?

by clinton (Priest)
on May 01, 2008 at 21:26 UTC
clinton has asked for the wisdom of the Perl Monks concerning the following question:

As I understand it, passing an argument to a subroutine allows you to work on the alias within that sub, without copying the value, eg:

    sub reduce_whitespace { $_[0] =~ s/\s+/ /g }

    $text = 'black    cat';
    reduce_whitespace($text);
    print $text;
    > black cat

so the size of the scalar shouldn't matter (unless you copy it :  my $string = $_[0] )
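A minimal sketch of the three calling styles under discussion. The sub names `trim_alias`, `trim_copy`, and `trim_ref` are illustrative and do not come from the original post.

```perl
use strict;
use warnings;

# Modifies the caller's variable directly: $_[0] is an alias, not a copy.
sub trim_alias { $_[0] =~ s/\s+/ /g; return }

# Copies the argument first, so the caller's variable is left untouched.
sub trim_copy { my $string = $_[0]; $string =~ s/\s+/ /g; return $string }

# Takes a reference and modifies the referenced scalar in place.
sub trim_ref { ${ $_[0] } =~ s/\s+/ /g; return }

my $text = "black   cat";
my $copy = trim_copy($text);   # $text still contains three spaces here
trim_alias($text);             # $text is now changed in place

my $other = "grey    dog";
trim_ref(\$other);             # $other is changed through the reference

print "$text\n$copy\n$other\n";
```

Only `trim_copy` duplicates the string data; the other two operate on the caller's scalar.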

But scalar size does seem to matter. Look at my benchmark below, which compares using an alias with pass-by-reference. Referencing and dereferencing have a cost, so I would expect the ref version to be slower. However, as the string grows in length, the ref version wins over the alias version:

Benchmark code

    use Benchmark qw(cmpthese);
    use strict;
    use warnings;

    # Build a 1000-character string of random 7-bit characters.
    my $original;
    $original .= chr(int(rand(128))) for 1..1000;
    my $add = $original;

    for (1..20) {
        print "\n", "Length of string: ", length($original), "\n";
        cmpthese(1000000, {
            ref   => sub { my $new = $original; by_ref(\$new) },
            alias => sub { my $new = $original; by_alias($new) },
        });
        $original .= $add;    # grow by 1000 characters per round
    }

    sub by_alias {
        $_[0] =~ s/\s+//;
    }

    sub by_ref {
        ${ $_[0] } =~ s/\s+//;
    }

Results

    Length of string: 1000
               Rate alias   ref
    alias 1098901/s    --   36%
    ref    806452/s  -27%    --

    Length of string: 2000
               Rate alias   ref
    alias  909091/s    --   20%
    ref    757576/s  -17%    --

    Length of string: 3000
               Rate alias   ref
    alias  751880/s    --    4%
    ref    724638/s   -4%    --

    Length of string: 4000
               Rate alias   ref
    alias  653595/s    --   -5%
    ref    689655/s    6%    --

    Length of string: 5000
               Rate alias   ref
    alias  571429/s    --  -12%
    ref    649351/s   14%    --

    Length of string: 6000
               Rate alias   ref
    alias  497512/s    --  -18%
    ref    606061/s   22%    --

    Length of string: 7000
               Rate alias   ref
    alias  450450/s    --  -23%
    ref    581395/s   29%    --

    Length of string: 8000
               Rate alias   ref
    alias  409836/s    --  -23%
    ref    534759/s   30%    --

    Length of string: 9000
               Rate alias   ref
    alias  369004/s    --  -23%
    ref    476190/s   29%    --

    Length of string: 10000
               Rate alias   ref
    alias  331126/s    --  -21%
    ref    420168/s   27%    --

    Length of string: 11000
               Rate alias   ref
    alias  299401/s    --  -22%
    ref    381679/s   27%    --

    Length of string: 12000
               Rate alias   ref
    alias  275482/s    --  -22%
    ref    353357/s   28%    --

    Length of string: 13000
               Rate alias   ref
    alias  254453/s    --  -22%
    ref    325733/s   28%    --

    Length of string: 14000
               Rate alias   ref
    alias  233645/s    --  -23%
    ref    304878/s   30%    --

    Length of string: 15000
               Rate alias   ref
    alias  210526/s    --  -25%
    ref    282486/s   34%    --

    Length of string: 16000
               Rate alias   ref
    alias  192678/s    --  -27%
    ref    265252/s   38%    --

    Length of string: 17000
               Rate alias   ref
    alias  181488/s    --  -29%
    ref    255754/s   41%    --

    Length of string: 18000
               Rate alias   ref
    alias  169779/s    --  -31%
    ref    245098/s   44%    --

    Length of string: 19000
               Rate alias   ref
    alias  158983/s    --  -32%
    ref    233100/s   47%    --

    Length of string: 20000
               Rate alias   ref
    alias  151515/s    --  -34%
    ref    228833/s   51%    --

This is on Perl 5.8.8 on x86_64

Clint

Update: Reformatted the results so that they are always presented in the same order, not in order of speed.

Re: Pass by ref vs alias : why does scalar size matter?
by mscharrer (Hermit) on May 01, 2008 at 22:08 UTC
    Very interesting...
    I get results similar to yours (v5.8.8 built for i486-linux-gnu-thread-multi) when I run your benchmark.
    But when I change s/\s+//; to s/\s+/ /; then 'ref' is always about 30% slower than 'alias'! It looks as if this depends on the operation performed.

      When I tried your example of changing s/\s+// to s/\s+/ / I still saw the decline in speed in the alias version, albeit more slowly. The ref version went from -40% to -20%.

      I saw a similar (but varying) decline for these operations:

      $_[0] .= 'abcde';
          vs
      ${ $_[0] } .= 'abcde';

      $_[0] = substr( $_[0] , -500);
          vs
      ${ $_[0] } = substr(${ $_[0] } , -500);

      Presumably this has something to do with writing the value back to an alias?

      Update: This is just wrong. When retesting this morning, I realised that, while the differences in speed between the two got closer together as the string grew, they never actually switched. Instead, it was just that the string operation took a proportionally greater part of the time, and thus the cost of using references was relatively less.
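      The effect described in this update can be illustrated with made-up numbers. The sketch below assumes a fixed per-call cost for using a reference and a string operation whose cost grows linearly with length; both figures are hypothetical, not measurements from this thread.

```perl
use strict;
use warnings;

# Hypothetical costs in arbitrary time units; these are not measurements.
my $ref_overhead  = 0.3;      # fixed extra cost per call of using a reference
my $cost_per_char = 0.001;    # cost of the string operation per character

my @slowdowns;
for my $length (1_000, 5_000, 20_000) {
    my $alias_time = $cost_per_char * $length;
    my $ref_time   = $alias_time + $ref_overhead;
    my $slowdown   = 100 * ($ref_time - $alias_time) / $alias_time;
    push @slowdowns, $slowdown;
    printf "length %5d: ref is %.1f%% slower than alias\n", $length, $slowdown;
}
```

      Under these assumptions the percentage shrinks as the string grows but never changes sign, which matches the "closer together, but never switched" observation.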

      The only example I've been able to come up with where the two actually switched is the original version in Pass by ref vs alias : why does scalar size matter?

        Presumably this has something to do with writing the value back to an alias?
        Good idea. I tested an operation that doesn't change the variable, m/\s+/, and alias came out only about 2% faster than ref, independent of length.

        I also wanted to test how much it matters whether the operation changes the length of the string. When I changed s/\s+// to s/\s+//g, the difference was again about 2% in favor of alias, and it was about the same for s/\s/_/g, which doesn't change the length.

Re: Pass by ref vs alias : why does scalar size matter?
by Jenda (Abbot) on May 01, 2008 at 23:48 UTC

    It doesn't for me. The alias seems to be quicker, but the difference is fairly random. If I run the benchmark for a 10000-character string 20 times, I get very different results. Here are the minimum and maximum rates:

    max_alias => 213675
    max_ref => 237530
    min_alias => 156495
    min_ref => 152439
    
    And again when I ran it once more:
    max_alias => 237530
    max_ref => 228833
    min_alias => 160000
    min_ref => 164204
    
    As you can see, the maximum and minimum are higher for one version in one run and for the other version in the next. All the speed difference is accidental.

    use Benchmark qw(cmpthese);
    use strict;
    use warnings;

    print "\n", "Length of string: 10000\n";
    for (1..20) {
        my $original;
        $original .= chr(int(rand(128))) for 1..1000;
        $original = $original x 10;
        cmpthese(100000, {
            ref   => sub { my $new = $original; by_ref(\$new) },
            alias => sub { my $new = $original; by_alias($new) },
        });
    }

    sub by_alias {
        $_[0] =~ s/\s+//;
    }

    sub by_ref {
        ${ $_[0] } =~ s/\s+//;
    }

    # Separate script: reads the benchmark output and reports min/max rates.
    my %data = (
        min_alias => 999999999, min_ref => 999999999,
        max_alias => 0,         max_ref => 0,
    );
    while (<>) {
        if (/^(\w+)\s+(\d+)\/s/) {
            my ($typ, $rate) = ($1, $2);
            $data{'min_'.$typ} = $rate if $data{'min_'.$typ} > $rate;
            $data{'max_'.$typ} = $rate if $data{'max_'.$typ} < $rate;
        }
    }
    print "$_ => $data{$_}\n" for sort keys %data;

    Update: I have perl v5.8.7 running on Windows Vista.

      That's not the result I get. I've run your code 10 times, and while, yes, the actual numbers varied, ref was consistently 28-34% faster than alias.

      Clint

Re: Pass by ref vs alias : why does scalar size matter?
by Corion (Pope) on May 02, 2008 at 11:22 UTC

    On a moderately loaded Solaris machine, alias keeps ahead of ref, but it seems that the string length dominates the speed, as both rates get lower and closer to each other with no signs of one overtaking the other:

    Length of string: 1000
               Rate   ref alias
    alias  141243/s   58%    --
    ref     89366/s    --  -37%
    ...
    Length of string: 10000
               Rate   ref alias
    alias   45496/s   19%    --
    ref     38388/s    --  -16%
    ...
    Length of string: 20000
               Rate   ref alias
    alias   25947/s   12%    --
    ref     23186/s    --  -11%

    This is for

    This is perl, v5.8.4 built for sun4-solaris-64int (with 27 registered patches, see perl -V for more detail)

    On a different machine, also moderately loaded, I see practically the same, also with alias keeping (barely) ahead of ref as the string size increases:

    Length of string: 1000
               Rate   ref alias
    alias  404858/s  115%    --
    ref    187970/s    --  -54%

    Length of string: 2000
               Rate   ref alias
    alias  367647/s   87%    --
    ref    196850/s    --  -46%
    ...
    Length of string: 10000
               Rate   ref alias
    alias  220751/s   24%    --
    ref    177936/s    --  -19%
    ...
    Length of string: 18000
               Rate   ref alias
    alias  168067/s    2%    --
    ref    165563/s    --   -1%

    Length of string: 19000
               Rate alias   ref
    ref    162866/s    1%    --
    alias  161551/s    --   -1%

    Length of string: 20000
               Rate alias   ref
    ref    162602/s    2%    --
    alias  160000/s    --   -2%

    Here, the decrease in repetitions is more dramatic for alias as the string length increases, which I interpret as the runtime overhead of alias being really minuscule compared to the overhead of the other operations performed.
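    One way to see how much of each iteration goes to work other than the call itself is to add a baseline that only performs the `my $new = $original` copy. This is a rough sketch, not part of the original benchmark; the `copy_only` case is an addition for illustration, and the string length of 10000 is an arbitrary choice.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# 10000 random 7-bit characters, as in the thread's benchmark.
my $original = join '', map { chr int rand 128 } 1 .. 10_000;

sub by_alias { $_[0] =~ s/\s+//; }
sub by_ref   { ${ $_[0] } =~ s/\s+//; }

# A negative count asks Benchmark to run each case for at least
# that many CPU seconds instead of a fixed number of iterations.
my $rows = cmpthese(-1, {
    copy_only => sub { my $new = $original; },                    # baseline: copy only
    alias     => sub { my $new = $original; by_alias($new) },
    ref       => sub { my $new = $original; by_ref(\$new) },
});
```

    If `copy_only` runs at a rate close to the other two, the copy dominates and the alias-versus-ref difference is a small part of what is being measured.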

    This is on a HP-UX machine with Intel architecture:

    This is perl, v5.8.8 built for IA64.ARCHREV_0-thread-multi
    (with 33 registered patches, see perl -V for more detail)

    Copyright 1987-2006, Larry Wall

    Binary build 817.1 [268662] provided by ActiveState http://www.ActiveState.com
    Built Sep 19 2006 13:53:03

    Interestingly, trying this with Strawberry 5.10.0, I see your reversal (Intel architecture):

               Rate   ref alias
    ref    258065/s    --  -46%
    alias  477783/s   85%    --
    ...
    Length of string: 10000
               Rate alias   ref
    alias  123533/s    --  -25%
    ref    164528/s   33%    --
    ...
    Length of string: 20000
               Rate alias   ref
    alias   65509/s    --  -42%
    ref    112867/s   72%    --

    This is for

    This is perl, v5.10.0 built for MSWin32-x86-multi-thread

    Trying on the same machine with Strawberry 5.8.8, I get

    Length of string: 1000
               Rate   ref alias
    alias  511771/s   53%    --
    ref    335121/s    --  -35%

    Length of string: 2000
               Rate   ref alias
    alias  383289/s   28%    --
    ref    299043/s    --  -22%

    Length of string: 3000
               Rate   ref alias
    alias  318471/s   14%    --
    ref    279486/s    --  -12%

    Length of string: 4000
               Rate   ref alias
    ref    256016/s    --   -2%
    alias  262329/s    2%    --

    Length of string: 5000
               Rate alias   ref
    alias  219154/s    --   -7%
    ref    236183/s    8%    --
    ...
    Length of string: 10000
               Rate alias   ref
    alias  121448/s    --  -29%
    ref    170707/s   41%    --
    ...
    Length of string: 20000
               Rate alias   ref
    alias   67801/s    --  -39%
    ref    111694/s   65%    --

    This was with:

    This is perl, v5.8.8 built for MSWin32-x86-multi-thread

    So, judging from this limited set of versions and platforms, the evidence points to Windows or MSVC as responsible for the dramatic slowdown, although I'm at a loss as to which part might be causing it.

      So, judging from this limited set of versions and platforms, the evidence points to Windows or MSVC as responsible for the dramatic slowdown, although I'm at a loss as to which part might be causing it.

      Not only Windows, because I'm running mine on Linux:

        This would leave "x86" architecture as the only common thing. My "weird" Perls are running under 32-bit Windows x86 (Intel Celeron 2.8 GHz). You are seeing this under 64-bit Linux (AMD?).

        But the HP-UX machine is also a 64-bit x86 machine (Intel Xeon server CPUs), and it doesn't exhibit the problematic behaviour.

Re: Pass by ref vs alias : why does scalar size matter?
by Anonymous Monk on Feb 04, 2009 at 07:38 UTC
    This seems to be exposing some weird edge cases in perl. I added some extra alias/reference tests like these:
    sub by_alias  { $_[0] =~ s/\s+//; }
    sub by_alias2 { for ($_[0]) { s/\s+//; } }
    sub by_alias3 { ${\$_[0]} =~ s/\s+//; }
    sub by_ref    { ${ $_[0] } =~ s/\s+//; }
    sub by_ref2   { for (${ $_[0] }) { s/\s+//; } }

    I then bumped up the string size so it went in 10k increments rather than 1k, and reduced the loop count. There's a weird effect occurring when the string size goes from 130k to 140k.

    Length of string: 130000
               Rate  alias   ref2 alias2    ref alias3
    alias   17544/s     --   -67%   -67%   -68%   -68%
    ref2    52632/s   200%     --    -0%    -5%    -5%
    alias2  52632/s   200%     0%     --    -5%    -5%
    ref     55556/s   217%     6%     6%     --     0%
    alias3  55556/s   217%     6%     6%     0%     --

    Length of string: 140000
               Rate alias3 alias2   ref2    ref  alias
    alias3   1376/s     --    -0%    -0%    -0%   -92%
    alias2   1377/s     0%     --    -0%    -0%   -92%
    ref2     1379/s     0%     0%     --    -0%   -92%
    ref      1381/s     0%     0%     0%     --   -92%
    alias   16393/s  1092%  1090%  1089%  1087%     --

    That's just bizarre. It seems that for strings up to 130k something strange is occurring with @_ aliasing, and for strings of 140k and above, for () aliasing and references go haywire. Very odd indeed. This is Debian perl, v5.8.8 built for i486-linux-gnu-thread-multi.
