PerlMonks

### Re^2: given-when construct unexpected bahavior wit arrays

by mantager (Sexton) on Jun 07, 2012 at 17:48 UTC (#975005)

Thank you all, monks.
I tested the regexp vs "eq" thing in grep (using Benchmark qw/timethis/) and "eq" is waaaaaaay faster than the regexp :D

As for the smart match thing, I found that out too, and it seems quite broken. What I find strange is that the various comments go back as far as 2010, so I thought it would have been fixed by now.
I'll benchmark the last solution proposed by brx, just out of curiosity.

Thank you again!
Cheers.

### Re^3: given-when construct unexpected bahavior wit arrays

by mantager (Sexton) on Jun 08, 2012 at 06:00 UTC

Ok, this is the last testcase:

```
#!/usr/bin/env perl
# ex: set tabstop=4 noexpandtab:

use v5.14;    # enables say and given/when
use warnings;
use Benchmark qw/timethis/;

# Number of iterations per timing run (first command-line argument).
my $count = shift || 10_000;

# Membership test with grep: always scans the whole array.
sub grep_in_array {
    my ($element, @array) = @_;
    grep { $element eq $_ } @array and return 1;
    return 0;
}

# Membership test with given-when; the "_" prefix is intended to keep the
# smart match from treating the values as numbers.
sub is_in_array {
    my ($element, @array) = @_;
    given ("_$element") {
        when ([map { "_$_" } @array]) { return 1; }
    }
    return 0;
}

my @array = (0..10_000, 'abcd');

for my $element (qw/a ab abc 0 1 10 100 1000 10000 10001 abcd/) {

    say "Test with: $element";

    say "With grep:";
    say sprintf("Element %s %s in array", $element, grep_in_array($element, @array) ? "is" : "is not");

    say "With given-when:";
    say sprintf("Element %s %s in array", $element, is_in_array($element, @array) ? "is" : "is not");

    say "With grep:";
    timethis($count, sub { grep_in_array($element, @array); });

    say "With given-when:";
    timethis($count, sub { is_in_array($element, @array); });

}
```

And the winner is: grep

```
Test with: a
With grep:
Element a is not in array
With given-when:
Element a is not in array
With grep:
timethis 1000:  2 wallclock secs ( 2.17 usr +  0.00 sys =  2.17 CPU) @ 460.83/s (n=1000)
With given-when:
timethis 1000:  8 wallclock secs ( 7.34 usr +  0.00 sys =  7.34 CPU) @ 136.24/s (n=1000)
Test with: ab
With grep:
Element ab is not in array
With given-when:
Element ab is not in array
With grep:
timethis 1000:  2 wallclock secs ( 2.20 usr +  0.00 sys =  2.20 CPU) @ 454.55/s (n=1000)
With given-when:
timethis 1000:  8 wallclock secs ( 7.66 usr +  0.00 sys =  7.66 CPU) @ 130.55/s (n=1000)
Test with: abc
With grep:
Element abc is not in array
With given-when:
Element abc is not in array
With grep:
timethis 1000:  2 wallclock secs ( 2.33 usr +  0.00 sys =  2.33 CPU) @ 429.18/s (n=1000)
With given-when:
timethis 1000:  8 wallclock secs ( 7.62 usr +  0.00 sys =  7.62 CPU) @ 131.23/s (n=1000)
Test with: 0
With grep:
Element 0 is in array
With given-when:
Element 0 is in array
With grep:
timethis 1000:  2 wallclock secs ( 2.17 usr +  0.00 sys =  2.17 CPU) @ 460.83/s (n=1000)
With given-when:
timethis 1000:  8 wallclock secs ( 7.24 usr +  0.00 sys =  7.24 CPU) @ 138.12/s (n=1000)
Test with: 1
With grep:
Element 1 is in array
With given-when:
Element 1 is in array
With grep:
timethis 1000:  2 wallclock secs ( 2.34 usr +  0.00 sys =  2.34 CPU) @ 427.35/s (n=1000)
With given-when:
timethis 1000:  8 wallclock secs ( 7.44 usr +  0.00 sys =  7.44 CPU) @ 134.41/s (n=1000)
Test with: 10
With grep:
Element 10 is in array
With given-when:
Element 10 is in array
With grep:
timethis 1000:  2 wallclock secs ( 2.27 usr +  0.00 sys =  2.27 CPU) @ 440.53/s (n=1000)
With given-when:
timethis 1000:  8 wallclock secs ( 7.57 usr +  0.01 sys =  7.58 CPU) @ 131.93/s (n=1000)
Test with: 100
With grep:
Element 100 is in array
With given-when:
Element 100 is in array
With grep:
timethis 1000:  2 wallclock secs ( 2.13 usr +  0.00 sys =  2.13 CPU) @ 469.48/s (n=1000)
With given-when:
timethis 1000:  8 wallclock secs ( 7.97 usr +  0.02 sys =  7.99 CPU) @ 125.16/s (n=1000)
Test with: 1000
With grep:
Element 1000 is in array
With given-when:
Element 1000 is in array
With grep:
timethis 1000:  2 wallclock secs ( 2.29 usr +  0.00 sys =  2.29 CPU) @ 436.68/s (n=1000)
With given-when:
timethis 1000:  8 wallclock secs ( 7.69 usr +  0.01 sys =  7.70 CPU) @ 129.87/s (n=1000)
Test with: 10000
With grep:
Element 10000 is in array
With given-when:
Element 10000 is in array
With grep:
timethis 1000:  2 wallclock secs ( 2.05 usr +  0.00 sys =  2.05 CPU) @ 487.80/s (n=1000)
With given-when:
timethis 1000: 10 wallclock secs ( 9.23 usr +  0.01 sys =  9.24 CPU) @ 108.23/s (n=1000)
Test with: 10001
With grep:
Element 10001 is not in array
With given-when:
Element 10001 is not in array
With grep:
timethis 1000:  2 wallclock secs ( 2.29 usr +  0.00 sys =  2.29 CPU) @ 436.68/s (n=1000)
With given-when:
timethis 1000:  9 wallclock secs ( 8.54 usr +  0.01 sys =  8.55 CPU) @ 116.96/s (n=1000)
Test with: abcd
With grep:
Element abcd is in array
With given-when:
Element abcd is in array
With grep:
timethis 1000:  2 wallclock secs ( 2.28 usr +  0.00 sys =  2.28 CPU) @ 438.60/s (n=1000)
With given-when:
timethis 1000: 10 wallclock secs ( 9.53 usr +  0.03 sys =  9.56 CPU) @ 104.60/s (n=1000)
```

Bye.

Yep, doing map {"_$_"} @array before each test, in given-when, is not good.

But grep does too much work: if the first array element matches, there is no need to look at the others.

This is probably the fastest solution:

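```
sub is_in_array2 {
    my $element = shift;
    # Stop at the first match instead of scanning the whole list.
    ($element eq $_) && return 1 for @_;
    # Explicit false value: the result of a sub ending in a loop is unspecified.
    return 0;
}
```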

Update: you should also try with a hash %is_in_array.
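A minimal sketch of that hash idea, assuming the same data as the benchmark script above. The variable names and the probe values are only illustrative; the point is that the hash is built once and each lookup is then a single `exists`.

```perl
use strict;
use warnings;

# Same data as in the benchmark script above.
my @array = (0 .. 10_000, 'abcd');

# Build the lookup hash once; the keys are the array elements as strings.
my %is_in_array = map { $_ => 1 } @array;

# Illustrative probes: a mix of hits and misses.
for my $element (qw/a 0 10000 10001 abcd/) {
    printf "Element %s %s in array\n", $element,
        exists $is_in_array{$element} ? "is" : "is not";
}
```

Building the hash costs one pass over the array, so this only pays off when the same array is queried more than a handful of times.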

Oh, don't be silly. Last testcase. Ha! :-)

You're missing a whole slew of tests. And, to make matters worse, you're using timethis instead of cmpthese, which makes it so much harder to compare. So, first I'm going to provide, not necessarily the last test case, but the most recent (at the time of this writing) :-) And then I will comment on it, and then provide the output.

Why 5.16? Because it's the latest. I don't think anything there needs anything more recent than 5.10. So a few things. First off, I'm combining all of the checks into a single run. This basically means we're taking average timings instead of best/worst case. I've also embedded testing the return values here just to make sure we don't end up with a function that is super-fast but wrong. I also capture warnings - since we're trying to do this without provoking warnings, again, testing it helps. This will all slow down each test, but all tests should get the same constant slowdown so the rankings should be the same even if the numbers aren't quite right.
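The correctness and warning checks described above could look roughly like this. It is only a sketch, not the code that was benchmarked: `checked` and `%expected` are hypothetical names, the data set is shortened, and the candidate is an early-exit loop in the style of brx's suggestion.

```perl
use strict;
use warnings;

my @array = (0 .. 10, 'abcd');   # shortened, hypothetical data set

# Candidate under test: early-exit loop with an explicit false return.
sub is_in_array2 {
    my $element = shift;
    $element eq $_ and return 1 for @_;
    return 0;
}

# Hypothetical helper: run a candidate and return its result (as 0 or 1)
# followed by any warnings it emitted.
sub checked {
    my ($code, @args) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $result = $code->(@args) ? 1 : 0;
    return ($result, @warnings);
}

# Expected answers for a few probes.
my %expected = ('a' => 0, '0' => 1, '10' => 1, '11' => 0, 'abcd' => 1);

for my $element (sort keys %expected) {
    my ($got, @warnings) = checked(\&is_in_array2, $element, @array);
    die "wrong answer for $element\n" unless $got == $expected{$element};
    die "unexpected warning: @warnings" if @warnings;
}
print "all checks passed\n";
```

Wrapping every candidate the same way adds the same overhead to each, which is why the ranking should survive even though the absolute rates drop.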

I've moved the array to a global so we also aren't impacted by copying the array around. Again, it's constant, so it's just noise. But this is probably bigger noise than the above :-)
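Besides a global, another way to avoid the copy is to pass a reference. In the earlier script the copy happens at `my ($element, @array) = @_;`, which duplicates every element on each call. A small sketch, where `is_in_array_ref` is a hypothetical name and not one of the benchmarked functions:

```perl
use strict;
use warnings;

my @array = (0 .. 10_000, 'abcd');

# Takes the element and an array reference, so no per-call copy is made.
sub is_in_array_ref {
    my ($element, $array_ref) = @_;
    $element eq $_ and return 1 for @$array_ref;
    return 0;
}

print is_in_array_ref('abcd', \@array) ? "found\n" : "not found\n";
```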

So, some additional tests. grep_in_array is your old one. brx_match is brx's suggestion (pretty good one). sm1 is your is_in_array while sm2 is a slight improvement on it (get rid of the underscore, it doesn't help). sh1/sh2 are the same as sm1/sm2 except using a hash base, with sh1 assuming that we're doing many matches against the array (thus the overhead of creating the hash can be ignored) and sh2 assuming we're doing one/few matches (thus the overhead of creating the hash is important). And any is just using List::MoreUtils' XS-based function. It's basically the same as brx's suggestion, but implemented in XS (aka C) instead. Oh, and it's already implemented and isn't subject to cut&paste errors or anything of the like.
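For readers who want to try something similar, here is a rough cmpthese sketch of the non-smartmatch candidates. It is a reconstruction under assumptions, not the script that produced the numbers in this thread: it uses `any` from core List::Util (1.33 or later) instead of List::MoreUtils, a smaller array, a made-up probe list, and a fixed count of 200 so that it finishes quickly.

```perl
use strict;
use warnings;
use Benchmark qw/cmpthese/;
use List::Util qw/any/;   # assumption: core List::Util instead of List::MoreUtils

our @array = (0 .. 1_000, 'abcd');          # global, so nothing is copied per call
my %lookup = map { $_ => 1 } @array;        # built once, for the "sh1" case
my @probes = qw/a 0 500 1000 1001 abcd/;    # hypothetical mix of hits and misses

# Early-exit loop in the style of brx's suggestion.
sub brx_match {
    my $e = shift;
    $e eq $_ and return 1 for @array;
    return 0;
}

my %candidates = (
    'grep' => sub { my $e = shift; (grep { $e eq $_ } @array) ? 1 : 0 },
    'brx'  => \&brx_match,
    'any'  => sub { my $e = shift; (any { $e eq $_ } @array) ? 1 : 0 },
    'sh1'  => sub { exists $lookup{ $_[0] } ? 1 : 0 },
    'sh2'  => sub { my %h = map { $_ => 1 } @array; exists $h{ $_[0] } ? 1 : 0 },
);

# Each timed sub runs every probe, so the rates average over hits and misses.
my %timed = map {
    my $code = $candidates{$_};
    ($_ => sub { $code->($_) for @probes });
} keys %candidates;

cmpthese(200, \%timed);
```

With a count this small Benchmark will likely complain about too few iterations for the fastest candidate; a negative count such as `cmpthese(-3, \%timed)` runs each candidate for at least that many CPU seconds and gives steadier rates.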

Okay, now for the output:

```
        Rate     sh2     sm1     sm2    grep     brx     any     sh1
sh2   18.3/s      --    -30%    -33%    -87%    -92%    -93%   -100%
sm1   26.3/s     44%      --     -4%    -81%    -88%    -90%   -100%
sm2   27.4/s     49%      4%      --    -80%    -88%    -89%   -100%
grep   138/s    653%    424%    404%      --    -40%    -47%   -100%
brx    228/s   1145%    768%    735%     65%      --    -12%   -100%
any    258/s   1309%    882%    844%     87%     13%      --   -100%
sh1  52512/s 286329% 199445% 191840%  37948%  22899%  20227%      --
```

It should be of no surprise that sh1 completely blew the rest out of the water. The interesting bits are the 4% boost I got from eliminating the unneeded underscore from your original attempt, how much overhead creating the hash has (how slow sh2 is), how much better brx's suggestion is over plain grep (though that's not entirely average - on successful finds, we're weighted toward the first half of the array over the second half), and how much benefit there is (13%) to the XS version of brx's suggestion in List::MoreUtils::any.

Moral of the story: use hashes for lookups if you're doing repeated lookups. And if you're not, use CPAN modules.

Oh, and maybe I'll try opening the bug report for perl to fix this warning. It's still bugging me. :-)
