http://www.perlmonks.org?node_id=1076388

hazylife has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

My question is about the following code example, taken as is from Wikipedia:

@sorted = map { $_->[0] } sort { $a->[1] <=> $b->[1] } # use numeric comparison map { [$_, length($_)] } # calculate the length of the s +tring @unsorted;

The above is pure Perl, not some generic pseudocode, and what worries me is that length() doesn't seem like a particularly good example for illustrating the concept. It has always been my understanding that in Perl, length($var) just returns whatever value is stored in the corresponding internal SV structure (Devel::Peek calls this field "CUR"; non-string/non-PV scalars don't have this attribute, more on them below). And if it is immediately available, why cache it?

What if we're "length()-sorting" an array of numeric scalars?

#!/usr/bin/perl use strict; use Devel::Peek; my @arr = (22, 5.555, 1, 4444, 333); print Dump(\@arr, @arr+0), "\n\n\n"; # at this point, there are 4 IVs and one NV in our array; # naturally, none of the elements have their length ("CUR") field set my ($retval, $just_once); my @sorted = sort { # does this alter the internal structures of our actual # array elements, turning IVs and NVs into PVIVs/PVNVs? $retval = length $a <=> length $b; print Dump \@arr, @arr+0 if !$just_once++; # of course it does! $retval; } @arr; print "\nsorted = @sorted\n";

I might be missing something here, so please correct me if I'm wrong.

Replies are listed 'Best First'.
Re: length() and the Schwartzian transform
by RMGir (Prior) on Feb 27, 2014 at 13:12 UTC
    Looks like you're right - plain sort beats the heck out of ST in this case (and would do the same to GRT, I'm sure) - length is just too trivial.

    Although as the num-* cases below show, it turns out that length is more expensive for data that isn't pre-stringified, at least on the perl 5.14.2 I tested.

    #!/usr/bin/perl use strict; use warnings; use Benchmark qw(:all :hireswallclock); foreach my $arraySize(10,100,1_000,10_000,100_000) { my @numdata=map rand, 0..$arraySize; my @strdata=map {sprintf "%d",$_} @numdata; print "========== ARRAY SIZE: $arraySize\n"; cmpthese( -1, { 'str-ST' => sub { my @arr = @strdata; @arr = map { $_->[0] } sort {$a->[1]<=>$b->[1]} map { [$_, length($_)]} @arr; }, 'str-sort' => sub { my @arr = @strdata; @arr = sort {length($a) <=> length($b)} @arr; }, 'num-ST' => sub { my @arr = @numdata; @arr = map { $_->[0] } sort {$a->[1]<=>$b->[1]} map { [$_, length($_)]} @arr; }, 'num-sort' => sub { my @arr = @numdata; @arr = sort {length($a) <=> length($b)} @arr; }, } ); }
    Results:
    $ perl benchmark-sort.pl
    ========== ARRAY SIZE: 10
                Rate   num-ST num-sort   str-ST str-sort
    num-ST    5119/s       --      -9%     -77%     -90%
    num-sort  5598/s       9%       --     -75%     -89%
    str-ST   22719/s     344%     306%       --     -55%
    str-sort 50539/s     887%     803%     122%       --
    ========== ARRAY SIZE: 100
               Rate   num-ST num-sort   str-ST str-sort
    num-ST    534/s       --     -13%     -82%     -92%
    num-sort  615/s      15%       --     -79%     -91%
    str-ST   2904/s     444%     372%       --     -58%
    str-sort 6854/s    1184%    1014%     136%       --
    ========== ARRAY SIZE: 1000
               Rate   num-ST num-sort   str-ST str-sort
    num-ST   51.8/s       --     -16%     -82%     -93%
    num-sort 61.4/s      18%       --     -79%     -92%
    str-ST    290/s     460%     373%       --     -61%
    str-sort  741/s    1330%    1107%     155%       --
    ========== ARRAY SIZE: 10000
               Rate   num-ST num-sort   str-ST str-sort
    num-ST   5.13/s       --     -16%     -82%     -93%
    num-sort 6.15/s      20%       --     -78%     -92%
    str-ST   28.2/s     449%     358%       --     -62%
    str-sort 73.7/s    1336%    1099%     162%       --
    ========== ARRAY SIZE: 100000
                (warning: too few iterations for a reliable count)
                (warning: too few iterations for a reliable count)
                (warning: too few iterations for a reliable count)
             s/iter   num-ST num-sort   str-ST str-sort
    num-ST     1.93       --     -13%     -81%     -93%
    num-sort   1.68      15%       --     -78%     -92%
    str-ST    0.364     432%     363%       --     -61%
    str-sort  0.142    1258%    1083%     155%       --
    

    Mike

      The following one-liner neatly illustrates why length is faster on strings than numbers...

      perl -MDevel::Peek -e'Dump($_) for "42", 42'
      use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name

      Wow you're quick. Thanks for the benchmark.

      I guess 'num-sort' is not that much faster than 'num-ST' because in both cases there's still one (but only one!) round of stringification being performed per sub call (explicit pre-stringification in num-ST and implicit, sort of in-place stringification in num-sort).

Re: length() and the Schwartzian transform
by Anonymous Monk on Mar 06, 2014 at 06:06 UTC
    It seems to me that the point of the example is to show how ST works as simply as possible, NOT to show its speed or lack thereof ...
      NOT to show its speed or lack thereof
      My main issue with that example is not that it's slow, but that it's misleading. For string scalars, Perl's length returns a value that has already been calculated (and cached) elsewhere. It's just a dumb "accessor method" for that internally stored value, which makes caching it pointless.