http://www.perlmonks.org?node_id=289649


in reply to Algorithm::Treap

One question and one suggestion.

The question: Why would I use a Treap and what benefits (and costs) would result as compared to using a normal hash for that (those) same applications?

The Suggestion: If you get around to modifying the interface to take a single custom comparator rather than the two as now (which I think makes a lot of sense), then you might also consider making the standard alpha comparator (cmp) a settable option in place of the default numeric comparator. That is to say, have a new()/tie() time option that can be set to indicate that the caller wants an alpha-sort order used rather than a numeric and then use cmp rather than <=> internally to the module rather than requiring the user to supply a comparator function that does this.

The main reason is that callbacks can be expensive. The best demonstration of this is the GRT. Even though performing a numeric sort using a GRT requires stringifying the numeric (part of the) values to be sorted, into numerically orderable strings (ie. fixed width with leading zeros), this pre-sort overhead is more than compensated for by performance gained by avoiding the need to callback into user code for the comparisons during the sort. I think that the overhead of an if or trinary conditional in the module code to determine which operator to use when inserting and/or balancing (is that the right term with a Treap?) would be easily offset by the benefit of not invoking a callback for the common cases of a 'trivial comparator', ie. numeric or alpha.

As an aside. I've long wished that perl had a numeric sort option built in (would one extra built-in called sortn be any great burden?), so that many cases where sorting numerically currently requires the overhead of a trivial callback or user block, or resorting to the GRT (or tyes recently posted variation on the theme) would become unnecessary.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.

Replies are listed 'Best First'.
Sorting blocks and efficiency (in Re to Re: Algorithm::Treap)
by bart (Canon) on Sep 08, 2003 at 09:23 UTC
    As an aside. I've long wished that perl had a numeric sort option built in (would one extra built-in called sortn be any great burden?), so that many cases where sorting numerically currently requires the overhead of a trivial callback or user block ...
    Not true. Perl does optimize some trivial sort code blocks, so if it sees something that looks like
    @sorted = sort { $a <=> $b } @list;
    it will turn it into a native numerical sort. Benchmarks will show that there is no significant speed difference between native sorting as string, and sorting numerically with this block: there simply is no callback overhead.

    I don't know the exact list, but it looks like at least normal and reversed numerical and string comparisons are in the list of blocks that get optimized away.

    Here are some benchmarks. See how close the results are, numerical sorting is typically even faster than string sortings due to a faster comparison operator. reversed_string and native are even as good as equally fast!

    For comparison I've also added a "messed up" callback block which doesn't use one of the idiomatic callback blocks, and doesn't get optimized away. See how efficiency falls back to one fifth of the optimized numerical sort, from 375 iterations / second, to a mere 75, and a third of the built-in native sort. The latter is the result you were expecting for the general case.

    #!/usr/bin/perl -w use strict; use Benchmark; my(@sorted, @unsorted); push @unsorted, rand 10000 for 1 .. 1000; timethese(-3, { native => sub { @sorted = sort @unsorted }, string => sub { @sorted = sort { $a cmp $b } @unsorted }, reversed_string => sub { @sorted = sort { $b cmp $a } @unsorted }, numerical => sub { @sorted = sort { $a <=> $b } @unsorted }, reversed_numerical => sub { @sorted = sort { $b <=> $a } @unsorted + }, messed_up=> sub { @sorted = sort { my $cmp = $a <=> $b; $cmp } @un +sorted }, });
    Results with a Perl 5.6.1:
    Benchmark: running messed_up, native, numerical, reversed_numerical, r +eversed_string, string, each for at least 3 CPU seconds... messed_up: 3 wallclock secs ( 3.29 usr + 0.00 sys = 3.29 CPU) @ 75 +.99/s (n=250) native: 3 wallclock secs ( 3.13 usr + 0.00 sys = 3.13 CPU) @ 24 +0.89/s (n=754) numerical: 3 wallclock secs ( 3.18 usr + 0.00 sys = 3.18 CPU) @ 37 +5.16/s (n=1193) reversed_numerical: 2 wallclock secs ( 3.13 usr + 0.00 sys = 3.13 C +PU) @ 379.87/s (n=1189) reversed_string: 4 wallclock secs ( 3.62 usr + 0.00 sys = 3.62 CPU) + @ 239.78/s (n=868) string: 3 wallclock secs ( 3.24 usr + 0.00 sys = 3.24 CPU) @ 22 +6.54/s (n=734)
      $ perl -MO=Deparse -e 'my @a = 0 .. 10; sort @a;' my(@a) = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10); sort @a; $ perl -MO=Deparse -e 'my @a = 0 .. 10; sort { $a cmp $b } @a;' my(@a) = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10); sort @a;

      So sort { $a cmp $b } @array is optimized away.

      ----
      I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
      -- Schemer

      Note: All code is untested, unless otherwise stated