RFC extending Benchmark.pm to facilitate CODEHASHREF

I never liked the usage of the CODEHASHREF in Benchmark.

Often people start writing things like


sub name1 { 
   ...
}

sub name2 { 
   ...
}

cmpthese (
     -5, 
     {
        'name1' => \&name1,
        'name2' => \&name2,
     }
)

which isn't very DRY and makes experimenting with optimizations really cumbersome! (I hate it, every new function name has to be repeated 3 times in different locations...)

See also "Re: Best method to diff very large array efficiently" for another example.

My idea is to put all subs into a dedicated package (default name "CMP" or so) and to automatically filter out the necessary names and coderefs.

{
  package CMP;
  sub name1 { 
     ...
  }

  sub name2 { 
     ...
  }
}

cmpthese (-5, pckg_subs("CMP") )

ATM I'm using a function pckg_subs() for this, not sure if it makes sense to extend the interface of cmpthese and timethese to directly accept a stash-ref like \%CMP::. ¹

What follows is a proof of concept; comments are welcome.

use strict;
use warnings;
use Benchmark qw/cmpthese/;
use Data::Dump qw/pp/;



{
  package CMP;
   

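  # NOTE: map over the single-element list (8000) yields a one-element array;
  # write 1 .. 8000 (and 1 .. 6000) to benchmark full-size arrays.
  my @arr_1 = map {rand 1e6} 8000;
  my @arr_2 = map {rand 1e6} 6000;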
  
  sub hash_values_diff {
    my %diff3;
    @diff3{@arr_1} = @arr_1;
    delete @diff3{@arr_2};
    values %diff3 ;
  }
  

  sub hash_key_diff {
    my %diff3;
    @diff3{@arr_1} = ();
    delete @diff3{@arr_2};
    keys %diff3 ;
  }

  sub using_vec {
    my $vec = '';
    vec( $vec, $_, 1 ) = 1 for @arr_2;
    grep !vec( $vec, $_, 1 ), @arr_1;
  }

  sub hash_grep {
    my %arr_2_hash;
    undef @arr_2_hash{@arr_2};
    grep !exists $arr_2_hash{$_}, @arr_1;
  }

}




cmpthese(-5, pckg_subs() );


sub pckg_subs {
  my $pckg_name= shift // "CMP";

  my $stash = do {
    no strict 'refs';
    \ %{ "${pckg_name}::" };
  };
  
  # filter all subs from package
  my $codehashref;
  while (my ($name,$glob)= each %$stash) {
    if ( my $cref = *{$glob}{CODE} ) {
      print "$name:\t$glob\n";
      $codehashref->{$name}=$cref;
    }
  }
  return $codehashref; 
}

OUTPUT

/usr/bin/perl -w /tmp/diff.pl 
hash_grep:    *CMP::hash_grep
hash_key_diff:    *CMP::hash_key_diff
hash_values_diff:    *CMP::hash_values_diff
using_vec:    *CMP::using_vec
                     Rate   using_vec hash_values_diff hash_key_diff hash_grep
using_vec         22269/s          --             -85%          -90%      -91%
hash_values_diff 148869/s        568%               --          -31%      -40%
hash_key_diff    215078/s        866%              44%            --      -13%
hash_grep        247455/s       1011%              66%           15%        --

The idea could be extended with a sub-attribute ':compare' or ':nocompare' to additionally mark which functions are supposed to be compared and which are not.
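
A minimal sketch of how such a marker attribute might look, using Perl's MODIFY_CODE_ATTRIBUTES hook. Everything here is an assumption for illustration: the %MARKED hash, the marked_subs() helper and the example subs are hypothetical, and the attribute is spelled ':Compare' with a capital letter because Perl warns that all-lowercase package attributes may clash with future reserved words.

```perl
use strict;
use warnings;
use Scalar::Util qw/refaddr/;

{
  package CMP;

  our %MARKED;    # hypothetical registry: refaddr of every sub declared with :Compare

  # Perl calls this hook at compile time for each sub in CMP that carries attributes.
  sub MODIFY_CODE_ATTRIBUTES {
    my ($pkg, $cref, @attrs) = @_;
    my @unknown;
    for my $attr (@attrs) {
      if ($attr eq 'Compare') { $MARKED{ Scalar::Util::refaddr($cref) } = 1 }
      else                    { push @unknown, $attr }
    }
    return @unknown;    # anything returned here is reported as an invalid attribute
  }

  sub helper             { 42 }              # unmarked, so not benchmarked
  sub variant_a :Compare { helper() + 1 }
  sub variant_b :Compare { 1 + helper() }
}

# Like pckg_subs(), but keeps only the subs registered in %MARKED.
sub marked_subs {
  my $pckg_name = shift // "CMP";
  no strict 'refs';
  my $stash  = \%{"${pckg_name}::"};
  my $marked = \%{"${pckg_name}::MARKED"};
  my %code;
  for my $name (keys %$stash) {
    my $cref = *{"${pckg_name}::$name"}{CODE} or next;
    $code{$name} = $cref if $marked->{ refaddr $cref };
  }
  return \%code;
}

print join(",", sort keys %{ marked_subs() }), "\n";    # prints "variant_a,variant_b"
```

The resulting hashref could be handed to cmpthese in place of pckg_subs(). The handler has to be defined before the subs that use the attribute, since it runs while they are being compiled.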

Cheers Rolf

( addicted to the Perl Programming Language)

update

¹) is it possible to tell if a hashref belongs to a stash?

... well at least I could parse %main or pass the packagename directly =)
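
One possible answer to the footnote, sketched with the core B module: a hash that is a symbol table carries its package name, which B exposes through the NAME method of the B::HV object. The helper name stash_name() is hypothetical, and the sketch assumes that NAME yields nothing useful for an ordinary hash.

```perl
use strict;
use warnings;
use B ();

{
  package CMP;
  sub name1 { 1 }
}

# Returns the package name if $href is a symbol table hash, undef otherwise.
sub stash_name {
  my $href = shift;
  return undef unless ref $href eq 'HASH';
  my $name = B::svref_2object($href)->NAME;
  return (defined $name && length $name) ? $name : undef;
}

print stash_name(\%CMP::)     // 'not a stash', "\n";
print stash_name({ a => 1 })  // 'not a stash', "\n";
```

With a check like this, a wrapper around cmpthese could accept either a plain CODEHASHREF or a stash-ref such as \%CMP:: and branch accordingly.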


Replies are listed 'Best First'.
Re: RFC extending Benchmark.pm to facilitate CODEHASHREF by BrowserUk (Patriarch) on Nov 26, 2013 at 09:36 UTC
If you recognise a few realities, benchmarks get a lot easier and far more accurate:

Perl is a dynamic language; eval is a distinguishing characteristic and a useful tool.

Benchmarks are not production code; failure to adhere to any theoretical best practices or production code standards does not prevent them from serving their purpose.

The perl feature that allows the non-block forms of map & grep is very powerful.

The benchmark module wraps the code(ref) supplied in an extra layer of subroutine. It does this in an attempt to allow it to subtract the overhead added by the benchmarking process from the timing of the code being tested.

All of this good work is completely negated when you write a benchmark that first wraps the code being tested in a named subroutine, then wraps a call to that named subroutine in an anonymous subroutine in order to pass a reference to the benchmark module. (Have they never heard of the `\&subname` syntax?)

Thus, the timing produced by the module incorporates three levels of subroutine call, only one of which has been partially negated. This demonstrates the naivety of the author(s).

If you write the benchmark in 1064279 like this:

[code hidden in a spoiler in the original thread]

Not only do you avoid the three repetitions of which you complain, you also avoid adding the overhead of three levels of function call to each piece of code being tested, and thus produce a far more accurate timing of the important code.

It is simple, clear, concise and DRY; and works now, just as it always has.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
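
The spoilered code is not reproduced in this thread, so the following is not that code. It is only a sketch of the string form that Benchmark documents (a CODE entry may be a string to be eval'd instead of a coderef), which fits the remark about eval. The array sizes, the fixed iteration count of 200 and the two snippet names are assumptions for illustration. Code strings cannot see the caller's lexicals, hence the fully qualified package variables.

```perl
use strict;
use warnings;
use Benchmark qw/cmpthese/;

# Package variables, because eval'd code strings cannot see "my" variables of this file.
our @arr_1 = map { int rand 1e6 } 1 .. 8000;
our @arr_2 = map { int rand 1e6 } 1 .. 6000;

# Each name appears exactly once; the code under test is passed as a string.
my $rows = cmpthese(200, {
  hash_grep => q{
    my %seen;
    undef @seen{@main::arr_2};
    my @diff = grep !exists $seen{$_}, @main::arr_1;
  },
  hash_key_diff => q{
    my %diff;
    @diff{@main::arr_1} = ();
    delete @diff{@main::arr_2};
    my @diff = keys %diff;
  },
});
```

cmpthese prints the comparison chart and also returns its rows as an array reference, which is what $rows holds here. A negative count such as -5 would run each snippet for at least that many CPU seconds instead of a fixed number of iterations.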
Re^2: RFC extending Benchmark.pm to facilitate CODEHASHREF by LanX (Saint) on Nov 27, 2013 at 02:13 UTC
> Thus, the timing produced by the module incorporates three levels of subroutine call, only one of which has been partially negated. This demonstrates the naivety of the author(s).

WOW really? Oh man, thank you for telling me!

Well, unfortunately I've never done this.

> If you write the benchmark in 1064279 like this:

Which is Kenosis' code, not the code I wrote. (playing tricks again you funny little bastard? Ha ha ha ... yawn)

The approach of hashes with

{
  name1 => sub { },
  name2 => sub { },
}

is well known, but it's sufficiently different to easily introduce errors, especially when copying existing code: things like comma separation, indentation, positions of names, ... (see also Eily's comment).

Secondly, and more importantly, the need to benchmark often comes up when code has already been written, and experimenting starts with cloned forms of

sub do_something {
  # not fast enough
}

then called

sub do_something_old {
  ...
}

or

sub do_something_1 {
  ...
}

And I don't think I'm alone; two other monks already asked about filtering some subs out of a package and/or ignoring imported subs.

Now if you don't like a convenience module to facilitate this kind of benchmarking, simply don't use it.

Cheers Rolf
Re^3: RFC extending Benchmark.pm to facilitate CODEHASHREF by BrowserUk (Patriarch) on Nov 27, 2013 at 02:25 UTC
> Which is Kenosis code, not the code I wrote. (playing tricks again you funny little bastard ... yawn)

As clearly identified in the post to which I linked (so no "trick").

Instead you write crap like this.
Re^4: RFC extending Benchmark.pm to facilitate CODEHASHREF by LanX (Saint) on Nov 27, 2013 at 02:48 UTC
Re: RFC extending Benchmark.pm to facilitate CODEHASHREF by Eily (Monsignor) on Nov 26, 2013 at 00:27 UTC
To get a lighter syntax you can already do something like:

my @arr_1 = map {rand 1e6} 8000;
my @arr_2 = map {rand 1e6} 6000;

my $cmp = {
  hash_values_diff => sub {
    my %diff3;
    @diff3{@arr_1} = @arr_1;
    delete @diff3{@arr_2};
    values %diff3 ;
  },
  using_vec => sub {
    my $vec = '';
    vec( $vec, $_, 1 ) = 1 for @arr_2;
    grep !vec( $vec, $_, 1 ), @arr_1;
  },
  hash_grep => sub {
    my %arr_2_hash;
    undef @arr_2_hash{@arr_2};
    grep !exists $arr_2_hash{$_}, @arr_1;
  }
};

cmpthese(-3, $cmp);

I do like the package approach more though, because there's no reference to a hash of references to subs, which is probably quite unsettling for beginners. And even if they don't understand how it works, it's surely better if they at least understand the syntax needed to use it, instead of just copying an example with even weirder concepts than those they already have trouble understanding.

If you go for the package solution, you might as well have it use parent YourBenchmark, so that you can call `CMP->benchmark(-5);`. And you may want to add filtering of some sort, because one might want to call a function imported from another package inside the functions that should be benchmarked. I'm sure someone could have come up with a function using something out of List::AllUtils for the example you gave.

Edit: and you already said that, with the attributes as the tool for selecting the benchmarked subs, my bad.
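
The `CMP->benchmark(-5)` suggestion could be sketched roughly as follows. YourBenchmark is the hypothetical class name from the reply above; the candidates() method and the two example subs are likewise assumptions made for illustration, not an existing module.

```perl
use strict;
use warnings;

{
  package YourBenchmark;    # hypothetical base class
  use Benchmark ();

  # Collect the subs defined directly in the calling class's stash.
  # Inherited methods such as benchmark() do not show up as code in that stash.
  sub candidates {
    my $class = shift;
    no strict 'refs';
    my %code;
    for my $name (keys %{"${class}::"}) {
      my $cref = *{"${class}::$name"}{CODE} or next;
      $code{$name} = $cref;
    }
    return \%code;
  }

  sub benchmark {
    my ($class, $count) = @_;
    return Benchmark::cmpthese($count, $class->candidates);
  }
}

{
  package CMP;
  our @ISA = ('YourBenchmark');    # in a separate file: use parent 'YourBenchmark';

  sub concat { my $s = ''; $s .= $_ for 1 .. 1000; $s }
  sub joined { join '', 1 .. 1000 }
}

CMP->benchmark(-1);
```

Because the method lives in the parent class, the package holding the candidates stays free of anything but the subs to compare.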
Re^2: RFC extending Benchmark.pm to facilitate CODEHASHREF by LanX (Saint) on Nov 26, 2013 at 14:33 UTC
Hi

Thanks for the input, pointing me to new use cases! :)

I'm pretty sure it's possible to exclude imported subs.

And passing optional filter conditions (like a regex) operating on sub names is a good idea.

Combined with optional attributes, all cases should be covered.

Will show a proof of concept soon. :)

Cheers Rolf
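
A rough sketch of both filters, under stated assumptions: the function name pckg_subs_filtered() and its optional regex parameter are hypothetical, and imported subs are detected by asking the core B module which package a coderef was originally compiled in.

```perl
use strict;
use warnings;
use B ();

{
  package CMP;
  use List::Util qw/sum/;    # imported helper that should not be benchmarked itself

  sub loop_sum { my $t = 0; $t += $_ for 1 .. 100; $t }
  sub util_sum { sum(1 .. 100) }
  sub _setup   { 1 }
}

# Returns { name => coderef } for subs defined in the package itself
# whose names match the optional regex.
sub pckg_subs_filtered {
  my ($pckg_name, $filter) = @_;
  $pckg_name //= "CMP";
  $filter    //= qr/./;
  no strict 'refs';
  my %code;
  for my $name (keys %{"${pckg_name}::"}) {
    my $cref = *{"${pckg_name}::$name"}{CODE} or next;
    # Package the sub was compiled in; differs from $pckg_name for imports.
    my $home = B::svref_2object($cref)->GV->STASH->NAME;
    next unless $home eq $pckg_name;
    next unless $name =~ $filter;
    $code{$name} = $cref;
  }
  return \%code;
}

print join(",", sort keys %{ pckg_subs_filtered("CMP", qr/sum/) }), "\n";    # prints "loop_sum,util_sum"
```

Here the imported sum() matches the regex but is dropped because its home package is List::Util, while _setup is dropped by the regex alone.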
Re: RFC extending Benchmark.pm to facilitate CODEHASHREF by tobyink (Canon) on Nov 26, 2013 at 08:54 UTC
use Attribute::Benchmark;

sub name1 :Benchmark {
   ...
}

sub name2 :Benchmark {
   ...
}

That is all.

use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name
Re^2: RFC extending Benchmark.pm to facilitate CODEHASHREF by LanX (Saint) on Nov 26, 2013 at 13:30 UTC
So you liked the attribute idea? :)

Attribute-Benchmark
===================
Created:      2013-11-26

"Publish or perish"? ;)

Cheers Rolf
Re^3: RFC extending Benchmark.pm to facilitate CODEHASHREF by tobyink (Canon) on Nov 26, 2013 at 21:01 UTC
To be honest, I can't remember reading that paragraph in your OP, though I may have skimmed it and absorbed it subconsciously.

I was also thinking along the lines of Test::Class::MOP's `is testcase` trait and how mop's method traits are basically a variation on sub attributes.


	PerlMonks