Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

length() and the Schwartzian transform

by hazylife (Monk)
on Feb 27, 2014 at 12:29 UTC ( #1076388=perlquestion: print w/ replies, xml ) Need Help??
hazylife has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

My question is about the following code example, taken as is from Wikipedia:

@sorted = map { $_->[0] } sort { $a->[1] <=> $b->[1] } # use numeric comparison map { [$_, length($_)] } # calculate the length of the s +tring @unsorted;

The above is pure Perl, not some generic pseudocode, and what worries me is that length() doesn't seem like a particularly good example for illustrating the concept. It has always been my understanding that in Perl, length($var) just returns whatever value is stored in the corresponding internal SV structure (Devel::Peek calls this field "CUR"; non-string/non-PV scalars don't have this attribute, more on them below). And if it is immediately available, why cache it?

What if we're "length()-sorting" an array of numeric scalars?

#!/usr/bin/perl use strict; use Devel::Peek; my @arr = (22, 5.555, 1, 4444, 333); print Dump(\@arr, @arr+0), "\n\n\n"; # at this point, there are 4 IVs and one NV in our array; # naturally, none of the elements have their length ("CUR") field set my ($retval, $just_once); my @sorted = sort { # does this alter the internal structures of our actual # array elements, turning IVs and NVs into PVIVs/PVNVs? $retval = length $a <=> length $b; print Dump \@arr, @arr+0 if !$just_once++; # of course it does! $retval; } @arr; print "\nsorted = @sorted\n";

I might be missing something here, so please correct me if I'm wrong.

Comment on length() and the Schwartzian transform
Select or Download Code
Re: length() and the Schwartzian transform
by RMGir (Prior) on Feb 27, 2014 at 13:12 UTC
    Looks like you're right - plain sort beats the heck out of ST in this case (and would do the same to GRT, I'm sure) - length is just too trivial.

    Although as the num-* cases below show, it turns out that length is more expensive for data that isn't pre-stringified, at least on the perl 5.14.2 I tested.

    #!/usr/bin/perl use strict; use warnings; use Benchmark qw(:all :hireswallclock); foreach my $arraySize(10,100,1_000,10_000,100_000) { my @numdata=map rand, 0..$arraySize; my @strdata=map {sprintf "%d",$_} @numdata; print "========== ARRAY SIZE: $arraySize\n"; cmpthese( -1, { 'str-ST' => sub { my @arr = @strdata; @arr = map { $_->[0] } sort {$a->[1]<=>$b->[1]} map { [$_, length($_)]} @arr; }, 'str-sort' => sub { my @arr = @strdata; @arr = sort {length($a) <=> length($b)} @arr; }, 'num-ST' => sub { my @arr = @numdata; @arr = map { $_->[0] } sort {$a->[1]<=>$b->[1]} map { [$_, length($_)]} @arr; }, 'num-sort' => sub { my @arr = @numdata; @arr = sort {length($a) <=> length($b)} @arr; }, } ); }
    Results:
    $ perl benchmark-sort.pl
    ========== ARRAY SIZE: 10
                Rate   num-ST num-sort   str-ST str-sort
    num-ST    5119/s       --      -9%     -77%     -90%
    num-sort  5598/s       9%       --     -75%     -89%
    str-ST   22719/s     344%     306%       --     -55%
    str-sort 50539/s     887%     803%     122%       --
    ========== ARRAY SIZE: 100
               Rate   num-ST num-sort   str-ST str-sort
    num-ST    534/s       --     -13%     -82%     -92%
    num-sort  615/s      15%       --     -79%     -91%
    str-ST   2904/s     444%     372%       --     -58%
    str-sort 6854/s    1184%    1014%     136%       --
    ========== ARRAY SIZE: 1000
               Rate   num-ST num-sort   str-ST str-sort
    num-ST   51.8/s       --     -16%     -82%     -93%
    num-sort 61.4/s      18%       --     -79%     -92%
    str-ST    290/s     460%     373%       --     -61%
    str-sort  741/s    1330%    1107%     155%       --
    ========== ARRAY SIZE: 10000
               Rate   num-ST num-sort   str-ST str-sort
    num-ST   5.13/s       --     -16%     -82%     -93%
    num-sort 6.15/s      20%       --     -78%     -92%
    str-ST   28.2/s     449%     358%       --     -62%
    str-sort 73.7/s    1336%    1099%     162%       --
    ========== ARRAY SIZE: 100000
                (warning: too few iterations for a reliable count)
                (warning: too few iterations for a reliable count)
                (warning: too few iterations for a reliable count)
             s/iter   num-ST num-sort   str-ST str-sort
    num-ST     1.93       --     -13%     -81%     -93%
    num-sort   1.68      15%       --     -78%     -92%
    str-ST    0.364     432%     363%       --     -61%
    str-sort  0.142    1258%    1083%     155%       --
    

    Mike

      Wow you're quick. Thanks for the benchmark.

      I guess 'num-sort' is not that much faster than 'num-ST' because in both cases there's still one (but only one!) round of stringification being performed per sub call (explicit pre-stringification in num-ST and implicit, sort of in-place stringification in num-sort).

      The following one-liner neatly illustrates why length is faster on strings than numbers...

      perl -MDevel::Peek -e'Dump($_) for "42", 42'
      use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name
Re: length() and the Schwartzian transform
by Anonymous Monk on Mar 06, 2014 at 06:06 UTC
    It seems to me that the point of the example is to show how ST works as simply as possible, NOT to show its speed or lack thereof ...
      NOT to show its speed or lack thereof
      My main issue with that example is not that it's slow, but that it's misleading. For string scalars, Perl's length returns a value that has already been calculated (and cached) elsewhere. It's just a dumb "accessor method" for that internally stored value, which makes caching it pointless.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1076388]
Approved by Corion
Front-paged by toolic
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (8)
As of 2014-12-27 12:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (177 votes), past polls