
RFC extending Benchmark.pm to facilitate CODEHASHREF

by LanX (Saint)
on Nov 25, 2013 at 22:30 UTC

Hi

I never liked the usage of the CODEHASHREF in Benchmark.

Often people start writing things like

sub name1 { ... }
sub name2 { ... }

cmpthese(
    -5,
    {
        'name1' => \&name1,
        'name2' => \&name2,
    }
);

which isn't very DRY and makes experimenting with optimizations really cumbersome! (I hate it, every new function name has to be repeated 3 times in different locations...)

See also Re: Best method to diff very large array efficiently for another example.

My idea is to put all subs into a dedicated package (defaultname "CMP" or so) and to automatically filter necessary name and coderefs.

{
    package CMP;
    sub name1 { ... }
    sub name2 { ... }
}

cmpthese( -5, pckg_subs("CMP") );

ATM I'm using a function pckg_subs() for this, not sure if it makes sense to extend the interface of cmpthese and timethese to directly accept a stash-ref like \%CMP::. ¹

What follows is a proof of concept; comments are welcome.

use strict;
use warnings;
use Benchmark qw/cmpthese/;
use Data::Dump qw/pp/;

{
    package CMP;

    # map needs a list to iterate over: "map {rand 1e6} 8000" builds a
    # single element, so 1 .. N is used here to get N random values
    # (the very high rates in the output below suggest it was produced
    # with the one-element version)
    my @arr_1 = map { rand 1e6 } 1 .. 8000;
    my @arr_2 = map { rand 1e6 } 1 .. 6000;

    sub hash_values_diff {
        my %diff3;
        @diff3{@arr_1} = @arr_1;
        delete @diff3{@arr_2};
        values %diff3;
    }

    sub hash_key_diff {
        my %diff3;
        @diff3{@arr_1} = ();
        delete @diff3{@arr_2};
        keys %diff3;
    }

    sub using_vec {
        my $vec = '';
        vec( $vec, $_, 1 ) = 1 for @arr_2;
        grep !vec( $vec, $_, 1 ), @arr_1;
    }

    sub hash_grep {
        my %arr_2_hash;
        undef @arr_2_hash{@arr_2};
        grep !exists $arr_2_hash{$_}, @arr_1;
    }
}

cmpthese( -5, pckg_subs() );

sub pckg_subs {
    my $pckg_name = shift // "CMP";

    my $stash = do {
        no strict 'refs';
        \%{"${pckg_name}::"};
    };

    # filter all subs from package
    my $codehashref;
    for my $name ( sort keys %$stash ) {

        # look the glob up by name: a stash value is not guaranteed to
        # be a glob (constants, for instance, are stored as scalar refs)
        my $glob = do { no strict 'refs'; *{"${pckg_name}::$name"} };

        if ( my $cref = *{$glob}{CODE} ) {
            print "$name:\t$glob\n";
            $codehashref->{$name} = $cref;
        }
    }
    return $codehashref;
}
OUTPUT
/usr/bin/perl -w /tmp/diff.pl
hash_grep:	*CMP::hash_grep
hash_key_diff:	*CMP::hash_key_diff
hash_values_diff:	*CMP::hash_values_diff
using_vec:	*CMP::using_vec
                     Rate using_vec hash_values_diff hash_key_diff hash_grep
using_vec         22269/s        --             -85%          -90%      -91%
hash_values_diff 148869/s      568%               --          -31%      -40%
hash_key_diff    215078/s      866%              44%            --      -13%
hash_grep        247455/s     1011%              66%           15%        --

The idea could be extended with a sub attribute such as ':compare' or ':nocompare' to mark explicitly which functions should or should not be compared.
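
A minimal sketch of how the attribute idea could work, assuming perl's built-in MODIFY_CODE_ATTRIBUTES hook rather than any existing module. The attribute is spelled :Compare here because perl warns that all-lowercase attribute names may clash with future reserved words. The names %Marked, marked_subs, candidate_a, candidate_b and helper are made up for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw/refaddr/;

{
    package CMP;

    our %Marked;    # addresses of all subs tagged :Compare

    # perl calls this hook at compile time for each sub in this
    # package that is declared with attributes
    sub MODIFY_CODE_ATTRIBUTES {
        my ( $pkg, $cref, @attrs ) = @_;
        my @unknown;
        for my $attr (@attrs) {
            if ( $attr eq 'Compare' ) {
                $Marked{ Scalar::Util::refaddr($cref) } = 1;
            }
            else {
                push @unknown, $attr;
            }
        }
        return @unknown;    # anything returned is rejected as invalid
    }

    sub candidate_a :Compare { join '', map { chr($_) } 65 .. 90 }
    sub candidate_b :Compare { my $s = ''; $s .= chr($_) for 65 .. 90; $s }
    sub helper { 42 }       # untagged, so it is never picked up
}

# Build a { name => coderef } hash from the tagged subs of a package.
sub marked_subs {
    my ($pkg) = @_;
    no strict 'refs';
    my $marked = \%{"${pkg}::Marked"};
    my %subs;
    for my $name ( keys %{"${pkg}::"} ) {
        next unless defined &{"${pkg}::$name"};
        my $cref = \&{"${pkg}::$name"};
        $subs{$name} = $cref if $marked->{ refaddr($cref) };
    }
    return \%subs;
}

print join( ', ', sort keys %{ marked_subs('CMP') } ), "\n";
```

The resulting hash reference has the shape cmpthese expects, so something like cmpthese( -5, marked_subs('CMP') ) should work with it.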

Cheers Rolf

( addicted to the Perl Programming Language)

update

¹) is it possible to tell if a hashref belongs to a stash?

... well at least I could parse %main or pass the packagename directly =)
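
One possible answer to the footnote, as a sketch only: the core B module exposes the name that perl attaches to a symbol-table hash, and an ordinary hash has no such name. The helper stash_name is hypothetical, and the approach assumes that B's NAME method returns undef for plain hashes.

```perl
use strict;
use warnings;
use B ();

{
    package CMP;
    sub name1 { 1 }
}

# Returns the package name if $href is a symbol table (stash),
# and undef for an ordinary hash or a non-hash argument.
sub stash_name {
    my ($href) = @_;
    return undef unless ref $href eq 'HASH';
    return B::svref_2object($href)->NAME;
}

for my $candidate ( \%CMP::, { name1 => sub { 1 } } ) {
    my $name = stash_name($candidate);
    print defined $name ? "stash of package $name\n" : "plain hash\n";
}
```

With a check like this, an interface that accepts either \%CMP:: or a hand-written CODEHASHREF could tell the two apart before deciding whether to filter out non-code entries.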

Replies are listed 'Best First'.
Re: RFC extending Benchmark.pm to facilitate CODEHASHREF
by BrowserUk (Patriarch) on Nov 26, 2013 at 09:36 UTC

    If you recognise a few realities, benchmarks get a lot easier and far more accurate:

    1. Perl is a dynamic language;

      eval is a distinguishing characteristic and a useful tool.

    2. Benchmarks are not production code;

      Failure to adhere to any theoretical best practices or production code standards, does not prevent them from serving their purpose.

    3. The perl feature that allows the non-block forms of map & grep is very powerful.
    4. The benchmark module wraps the code(ref) supplied, in an extra layer of subroutine.

      It does this in an attempt to allow it to subtract the overhead added by the benchmarking process, from the timing of the code being tested.

      All of this good work is completely negated when:

      • you write a benchmark that first wraps the code being tested, in a named subroutine;
      • then wraps a call to that named subroutine, in an anonymous subroutine in order to pass a reference to the benchmark module.

        (Have they never heard of the \&subname syntax?)

      Thus, the timings produced by the module incorporate three levels of subroutine call, only one of which has been partially negated. This demonstrates the naivety of the author(s).
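
To make the layers described above concrete, here is a small sketch that hands the same work to cmpthese in three ways: a named sub wrapped in an anonymous sub, the named sub passed directly as \&total, and a code string. The names @data, total and %variants are illustrative only, and the sketch makes no claim about how large the differences are on any given machine.

```perl
use strict;
use warnings;
use Benchmark qw/cmpthese/;

our @data = 1 .. 1000;    # package variable, so the string form can see it

sub total {
    my $sum = 0;
    $sum += $_ for @data;
    return $sum;
}

my %variants = (

    # anonymous sub around a named sub: two call levels of our own
    wrapped => sub { total() },

    # named sub passed directly: one call level of our own
    coderef => \&total,

    # code string: Benchmark compiles it itself, no sub of our own
    string => q[ my $sum = 0; $sum += $_ for @data; $sum ],
);

cmpthese( -1, \%variants );
```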

    If you write the benchmark in 1064279 like this:

    use strict;
    use warnings;
    use Benchmark qw/cmpthese/;

    our @arr_1 = 0 .. 8e3;
    our @arr_2 = 2e3 .. 1e4;

    cmpthese -5, {
        OPdiff => q[
            my %diff3;
            @diff3{@arr_1} = @arr_1;
            delete @diff3{@arr_2};
            my @diff = ( keys %diff3 );
        ],
        OPdiffModified => q[
            my %diff3;
            @diff3{@arr_1} = ();
            delete @diff3{@arr_2};
            my @diff = ( keys %diff3 );
        ],
        OPdiff_undef => q[
            my %diff3;
            undef @diff3{@arr_1};
            delete @diff3{@arr_2};
            my @diff = ( keys %diff3 );
        ],
        using_vec => q[
            my $vec = '';
            vec( $vec, $_, 1 ) = 1 for @arr_2;
            my @diff = grep !vec( $vec, $_, 1 ), @arr_1;
        ],
        hash_grep => q[
            my %arr_2_hash;
            undef @arr_2_hash{@arr_2};
            my @diff = grep !exists $arr_2_hash{$_}, @arr_1;
        ],
    };

    __END__
    C:\test>1064178-b.pl
                     Rate OPdiff hash_grep OPdiffModified OPdiff_undef using_vec
    OPdiff         95.3/s     --      -48%           -52%         -54%      -65%
    hash_grep       184/s    93%        --            -8%         -10%      -32%
    OPdiffModified  200/s   109%        8%             --          -3%      -26%
    OPdiff_undef    205/s   115%       11%             3%           --      -24%
    using_vec       271/s   184%       47%            36%          32%        --

    Not only do you avoid the three repetitions of which you complain, you also avoid adding the overhead of three levels of function call to each piece of code being tested, and thus produce a far more accurate timing of the important code.

    It is simple, clear, concise and DRY; and works *now*, just as it always has.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      > Thus, the timings produced by the module incorporate three levels of subroutine call, only one of which has been partially negated. This demonstrates the naivety of the author(s).

      WOW really? Oh man, thank you for telling me!

      Well, unfortunately I've never done this.

      > If you write the benchmark in 1064279 like this:

      Which is Kenosis code, not the code I wrote.

      (playing tricks again you funny little bastard? Ha ha ha ... yawn)

      The approach of hashes with

      {
          name1 => sub { },
          name2 => sub { },
      }

      is well known, but it is sufficiently different from ordinary sub definitions to easily introduce errors.

      Especially when copying existing code, things like comma-separation, indentation, positions of names, ... (see also Eily's comment).

      Secondly, and more importantly, the need to benchmark often arises once code has already been written, and experimenting starts with cloned forms of

      sub do_something {
          # not fast enough
      }

      then called

      sub do_something_old { ... }

      or

      sub do_something_1 { ... }

      And I don't think I'm alone, two other monks already asked about filtering some subs out of a package and/or ignoring imported subs.

      Now if you don't like a convenience module that facilitates this kind of benchmarking, simply don't use it.

      Cheers Rolf


        > Which is Kenosis code, not the code I wrote. (playing tricks again you funny little bastard ... yawn)

        As clearly identified in the post to which I linked, (so no "trick"). Instead you write crap like this.


Re: RFC extending Benchmark.pm to facilitate CODEHASHREF
by Eily (Monsignor) on Nov 26, 2013 at 00:27 UTC

    To get a lighter syntax you can already do something like :

    # 1 .. N so that map really builds N elements
    my @arr_1 = map { rand 1e6 } 1 .. 8000;
    my @arr_2 = map { rand 1e6 } 1 .. 6000;

    my $cmp = {
        hash_values_diff => sub {
            my %diff3;
            @diff3{@arr_1} = @arr_1;
            delete @diff3{@arr_2};
            values %diff3;
        },
        using_vec => sub {
            my $vec = '';
            vec( $vec, $_, 1 ) = 1 for @arr_2;
            grep !vec( $vec, $_, 1 ), @arr_1;
        },
        hash_grep => sub {
            my %arr_2_hash;
            undef @arr_2_hash{@arr_2};
            grep !exists $arr_2_hash{$_}, @arr_1;
        },
    };

    cmpthese( -3, $cmp );

    I do like the package approach more though, because there's no reference to a hash of references to subs, which is probably quite unsettling for beginners. And even if they don't understand how it works, it's surely better if they at least understand the syntax needed to use it, instead of just copying an example with even weirder concepts than those they already have trouble understanding.

    If you go for the package solution, you might as well have it use parent YourBenchmark, so that you can call CMP->benchmark(-5);. And you may want to add filtering of some sort, because one might want to call a function imported from another package inside the functions that should be benchmarked. I'm sure someone could have come up with a function using something out of List::AllUtils for the example you gave. Edit: and you already said that, with the attributes as the tool for selecting the benchmarked subs, my bad.
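
A rough sketch of the filtering discussed here, assuming the core B module is used to find out where each sub was originally defined, so that imported subs can be skipped. The function local_subs, its optional name filter, and the example subs with_sum and with_loop are hypothetical names for illustration.

```perl
use strict;
use warnings;
use B ();

{
    package CMP;
    use List::Util qw/sum/;    # imported helper, should not be benchmarked

    sub with_sum  { sum( 1 .. 100 ) }
    sub with_loop { my $t = 0; $t += $_ for 1 .. 100; $t }
}

# Collect { name => coderef } for subs defined in $pkg itself,
# optionally restricted to names matching $filter.
sub local_subs {
    my ( $pkg, $filter ) = @_;
    $filter = qr/./ unless defined $filter;

    my %subs;
    no strict 'refs';
    for my $name ( keys %{"${pkg}::"} ) {
        next unless $name =~ $filter;
        next unless defined &{"${pkg}::$name"};
        my $cref = \&{"${pkg}::$name"};

        # package in which the sub was originally named
        my $home = B::svref_2object($cref)->GV->STASH->NAME;
        $subs{$name} = $cref if $home eq $pkg;
    }
    return \%subs;
}

print join( ', ', sort keys %{ local_subs('CMP') } ), "\n";
print join( ', ', sort keys %{ local_subs( 'CMP', qr/loop/ ) } ), "\n";
```

The imported sum lives in the CMP stash as well, but B reports List::Util as its home package, so it is left out.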

      Hi

      Thanks for the input, pointing me to new use cases ! :)

      I'm pretty sure it's possible to exclude imported subs.

      And passing optional filter conditions (like a regex) that operate on sub names is a good idea.

      Combined with optional attributes all cases should be covered.

      Will show a proof of concept soon. : )

      Cheers Rolf


Re: RFC extending Benchmark.pm to facilitate CODEHASHREF
by tobyink (Canon) on Nov 26, 2013 at 08:54 UTC
    use Attribute::Benchmark;

    sub name1 :Benchmark { ... }
    sub name2 :Benchmark { ... }

    That is all.

    use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name
      So you liked the attribute idea? :)

      Attribute-Benchmark
      ===================

      Created:      2013-11-26

      "Publish or perish"? ;)

      Cheers Rolf


        To be honest, I can't remember reading that paragraph in your OP, though I may have skimmed it and absorbed it subconsciously.

        I was also thinking along the lines of Test::Class::MOP's is testcase trait and how mop's method traits are basically a variation on sub attributes.

