Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister

Datastructures and compare functions

by Abigail-II (Bishop)
on Apr 08, 2003 at 10:11 UTC ( #248871=perlmeditation: print w/replies, xml ) Need Help??

One problem with making datastructure in Perl that are a little bit more advanced than searching for exact queries, is that it's hard to parameterize the compare function. Suppose you want to create a search tree. What kind of keys are you going to store? Numbers? Strings? But to compare them, you need different operators. And when dealing with data structures, you'll do lots of comparisons.

I know of three solutions to solve this problem:

  • Use a code ref that does the compare. The drawback is that for each compare that's been made, a function call needs to be made, and that's not exactly where Perl speed shines.
  • Use a single compare function in the code. If you are going to deal with different data, store objects that overload the compare function. This even has more overhead as described above.
  • Use eval to create customized functions. Here you pay a penalty of having to do some extra compiling, but you only pay that once.

Below is a benchmark that compares the first and third options, for heapsort. It shows the eval option to be a winner.


#!/usr/bin/perl use strict; use warnings; use Benchmark; my @heap; my $M = @ARGV ? shift : 1000; sub heapify; sub heapify { my ($idx, $cmp) = splice @_ => 0, 2; my $max = $idx; for my $try (2 * $idx + 1, 2 * $idx + 2) { $max = $try if $try < @heap && $cmp -> ($heap [$try], $heap [$ +max]); } return if $max == $idx; @heap [$idx, $max] = @heap [$max, $idx]; heapify $max, $cmp; } sub extract { return unless @heap; my $cmp = shift; my $min = $heap [0]; my $tmp = pop @heap; if (@heap) { $heap [0] = $tmp; heapify 0, $cmp; } return $min; } sub heapsort_cmp (&@) { (my ($cmp), @heap) = @_; for (my $i = int (@heap / 2); $i --;) { heapify $i => $cmp; } my @result; push @result => extract $cmp while @heap; @result; } my %cache; sub heapsort_eval ($@) { (my ($cmp), @heap) = @_; unless ($cache {$cmp}) { $cache {$cmp} = 1 + keys %cache; my $sub_heapify = "heapify_$cache{$cmp}"; my $sub_extract = "extract_$cache{$cmp}"; eval <<" --"; sub $sub_heapify; sub $sub_heapify { my (\$idx) = shift; my \$max = \$idx; for my \$try (2 * \$idx + 1, 2 * \$idx + 2) { \$max = \$try if \$try < \@heap && \$heap [\$try] $cmp \$heap [\$max]; } return if \$max == \$idx; \@heap [\$idx, \$max] = \@heap [\$max, \$idx]; $sub_heapify \$max; } sub $sub_extract { return unless \@heap; my \$min = \$heap [0]; my \$tmp = pop \@heap; if (\@heap) { \$heap [0] = \$tmp; $sub_heapify 0; } return \$min; } -- if ($@) {die "eval failed: $@\n"} } no strict 'refs'; my $sub_heapify = "heapify_$cache{$cmp}"; my $sub_extract = "extract_$cache{$cmp}"; for (my $i = int (@heap / 2); $i --;) { &$sub_heapify ($i); } my @result; push @result => &$sub_extract ($cmp) while @heap; @result; } our @data = 1 .. $M; for (my $i = @data; $i --;) { my $r = rand @data; @data [$r, $i] = @data [$i, $r]; } timethese -10 => { cmp => 'heapsort_cmp {$_ [0] < $_ [1]} @::data', eval => 'heapsort_eval "<", @::data' }; __END__ Benchmark: running cmp, eval for at least 10 CPU seconds... cmp: 11 wallclock secs (10.20 usr + 0.01 sys = 10.21 CPU) @ 3.23/ +s (n=33) eval: 11 wallclock secs (10.19 usr + 0.00 sys = 10.19 CPU) @ 4.61/ +s (n=47)

Replies are listed 'Best First'.
Re: Datastructures and compare functions
by thor (Priest) on Apr 08, 2003 at 12:42 UTC
    Is this the same Abigail that always says that you should run a benchmark for different values to see who the true winner is? No flame, just saying for saying...


      Oh, but I did. I just didn't include a whole set of results in the posts. On my machine, the 'eval' method starts winning with 3 values in the set. The bigger the set, the larger the "win" of the 'eval'.


Re: Datastructures and compare functions
by RMGir (Prior) on Apr 08, 2003 at 18:47 UTC
    Interesting. But you said:

    What kind of keys are you going to store? Numbers? Strings? But to compare them, you need different operators. And when dealing with data structures, you'll do lots of comparisons.

    Another alternative is to store numeric values so cmp always works, 0 padding numbers if necessary. How would that compare (no pun intended) for performance?

    It feels like this is a generalization of the type of problem that the GRT and ST sorting approaches solve.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://248871]
Approved by broquaint
Front-paged by valdez
[Corion]: :-D
Corion discovers a new shiny toy to try out over the (longish) weekend. Since I've done some more with websockets, maybe I'll try writing a webserver that implements hot-reloading of HTML(+CSS, +Javascript) in the browser. Edit the local file and ...
[Corion]: ... the browser(s) get a ping to a) refresh the page or b) reload "just" the changed parts, keeping the scroll position etc.
[Corion]: But I also have to look at how I can make WWW::Mechanize:: RemoteBrowser a reality, and how to make it safe from malicious content ;)
[Corion]: Part of wanting hot-reloading is that I think I've stumbled on a very simple set of CSS that I maybe want to use for a blog, but I want to try that out on mobile too, and I also want to add/modify it slightly so it has a header too...
[Corion]: ... and having hot reloading would make it easy to view the changes in multiple browser windows of different sizes, and on Android, simultaneously

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (8)
As of 2018-04-26 10:45 GMT
Find Nodes?
    Voting Booth?