length() and the Schwartzian transform

hazylife has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

My question is about the following code example, taken as is from Wikipedia:

@sorted = map  { $_->[0] }
          sort { $a->[1] <=> $b->[1] } # use numeric comparison
          map  { [$_, length($_)] }    # calculate the length of the s
+tring
               @unsorted;
[download]

The above is pure Perl, not some generic pseudocode, and what worries me is that length() doesn't seem like a particularly good example for illustrating the concept. It has always been my understanding that in Perl, length($var) just returns whatever value is stored in the corresponding internal SV structure (Devel::Peek calls this field "CUR"; non-string/non-PV scalars don't have this attribute, more on them below). And if it is immediately available, why cache it?

What if we're "length()-sorting" an array of numeric scalars?

#!/usr/bin/perl

use strict;
use Devel::Peek;

my @arr = (22, 5.555, 1, 4444, 333);

print Dump(\@arr, @arr+0), "\n\n\n";
# at this point, there are 4 IVs and one NV in our array;
# naturally, none of the elements have their length ("CUR") field set

my ($retval, $just_once);

my @sorted = sort {
    # does this alter the internal structures of our actual
    # array elements, turning IVs and NVs into PVIVs/PVNVs?
    $retval = length $a <=> length $b;

    print Dump \@arr, @arr+0 if !$just_once++;
    # of course it does!

    $retval;
} @arr;

print "\nsorted = @sorted\n";
[download]

I might be missing something here, so please correct me if I'm wrong.

Comment on length() and the Schwartzian transform Select or Download Code

Replies are listed 'Best First'.
Re: length() and the Schwartzian transform by RMGir (Prior) on Feb 27, 2014 at 13:12 UTC
Looks like you're right - plain sort beats the heck out of ST in this case (and would do the same to GRT, I'm sure) - length is just too trivial. Although as the num-* cases below show, it turns out that length is more expensive for data that isn't pre-stringified, at least on the perl 5.14.2 I tested. #!/usr/bin/perl use strict; use warnings; use Benchmark qw(:all :hireswallclock); foreach my $arraySize(10,100,1_000,10_000,100_000) { my @numdata=map rand, 0..$arraySize; my @strdata=map {sprintf "%d",$_} @numdata; print "========== ARRAY SIZE: $arraySize\n"; cmpthese( -1, { 'str-ST' => sub { my @arr = @strdata; @arr = map { $_->[0] } sort {$a->[1]<=>$b->[1]} map { [$_, length($_)]} @arr; }, 'str-sort' => sub { my @arr = @strdata; @arr = sort {length($a) <=> length($b)} @arr; }, 'num-ST' => sub { my @arr = @numdata; @arr = map { $_->[0] } sort {$a->[1]<=>$b->[1]} map { [$_, length($_)]} @arr; }, 'num-sort' => sub { my @arr = @numdata; @arr = sort {length($a) <=> length($b)} @arr; }, } ); } [download] Results: $ perl benchmark-sort.pl ========== ARRAY SIZE: 10 Rate num-ST num-sort str-ST str-sort num-ST 5119/s -- -9% -77% -90% num-sort 5598/s 9% -- -75% -89% str-ST 22719/s 344% 306% -- -55% str-sort 50539/s 887% 803% 122% -- ========== ARRAY SIZE: 100 Rate num-ST num-sort str-ST str-sort num-ST 534/s -- -13% -82% -92% num-sort 615/s 15% -- -79% -91% str-ST 2904/s 444% 372% -- -58% str-sort 6854/s 1184% 1014% 136% -- ========== ARRAY SIZE: 1000 Rate num-ST num-sort str-ST str-sort num-ST 51.8/s -- -16% -82% -93% num-sort 61.4/s 18% -- -79% -92% str-ST 290/s 460% 373% -- -61% str-sort 741/s 1330% 1107% 155% -- ========== ARRAY SIZE: 10000 Rate num-ST num-sort str-ST str-sort num-ST 5.13/s -- -16% -82% -93% num-sort 6.15/s 20% -- -78% -92% str-ST 28.2/s 449% 358% -- -62% str-sort 73.7/s 1336% 1099% 162% -- ========== ARRAY SIZE: 100000 (warning: too few iterations for a reliable count) (warning: too few iterations for a reliable count) (warning: too few iterations for a reliable count) s/iter num-ST num-sort str-ST str-sort num-ST 1.93 -- -13% -81% -93% num-sort 1.68 15% -- -78% -92% str-ST 0.364 432% 363% -- -61% str-sort 0.142 1258% 1083% 155% -- Mike	[reply] [d/l]
Re^2: length() and the Schwartzian transform by tobyink (Canon) on Feb 27, 2014 at 14:54 UTC
The following one-liner neatly illustrates why `length` is faster on strings than numbers... `perl -MDevel::Peek -e'Dump($_) for "42", 42'` [download] `use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name`	[reply] [d/l] [select]
Re^2: length() and the Schwartzian transform by hazylife (Monk) on Feb 27, 2014 at 13:46 UTC
Wow you're quick. Thanks for the benchmark. I guess 'num-sort' is not that much faster than 'num-ST' because in both cases there's still one (but only one!) round of stringification being performed per sub call (explicit pre-stringification in num-ST and implicit, sort of in-place stringification in num-sort).	[reply]
Re: length() and the Schwartzian transform by Anonymous Monk on Mar 06, 2014 at 06:06 UTC
It seems to me that the point of the example is to show how ST works as simply as possible, NOT to show its speed or lack thereof ...	[reply]
Re^2: length() and the Schwartzian transform by hazylife (Monk) on Mar 06, 2014 at 09:33 UTC
NOT to show its speed or lack thereof My main issue with that example is not that it's slow, but that it's misleading. For string scalars, Perl's `length` returns a value that has already been calculated (and cached) elsewhere. It's just a dumb "accessor method" for that internally stored value, which makes caching it pointless.	[reply] [d/l]

Back to Seekers of Perl Wisdom