PerlMonks  

Re^2: given-when construct unexpected behavior with arrays

by mantager (Sexton)
on Jun 07, 2012 at 17:48 UTC


in reply to Re: given-when construct unexpected behavior with arrays
in thread given-when construct unexpected behavior with arrays

Thank you all, monks.
I tested the regexp vs "eq" thing in grep (using Benchmark qw/timethis/) and "eq" is waaaaaaay faster than the regexp :D
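
For anyone who wants to reproduce that comparison, here is a minimal sketch. The exact regexp from the test is not shown in this post, so the anchored pattern is an assumption, and so are the sub names `count_eq` and `count_re`. It uses `cmpthese` instead of `timethis` only so that both rates land in one table.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @array = (0 .. 10_000, 'abcd');

# Plain string equality against every element.
sub count_eq {
    my ($element) = @_;
    return scalar grep { $_ eq $element } @array;
}

# Anchored regexp (assumed form); \Q...\E quotes any metacharacters.
sub count_re {
    my ($element) = @_;
    return scalar grep { /\A\Q$element\E\z/ } @array;
}

# 200 iterations keeps the run short; raise it for steadier numbers.
cmpthese(200, {
    'eq'     => sub { count_eq('abcd') },
    'regexp' => sub { count_re('abcd') },
});
```

Both subs return the number of matching elements, so they can be checked against each other before trusting the timings.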

As for the smart match thing, I found that out too, and it seems quite broken. What I find strange is that the various comments go back as far as 2010, and I would have thought it had been fixed by now.
I'll benchmark the last solution proposed by brx, just out of curiosity.

Thank you again!
Cheers.


Re^3: given-when construct unexpected behavior with arrays
by mantager (Sexton) on Jun 08, 2012 at 06:00 UTC

    Ok, this is the last testcase:

        #!/usr/bin/env perl
        # ex: set tabstop=4 noexpandtab:
        use v5.14;
        use warnings;
        use Benchmark qw/timethis/;

        my $count = shift || 10_000;

        sub grep_in_array {
            my ($element, @array) = @_;
            grep {$element eq $_} @array and return 1;
            return 0;
        }

        sub is_in_array {
            my ($element, @array) = @_;
            given ("_$element") {
                when ([map {"_$_"} @array]) { return 1; }
            }
            return 0;
        }

        my @array = (0..10_000, 'abcd');

        for my $element (qw/a ab abc 0 1 10 100 1000 10000 10001 abcd/) {
            say "Test with: $element";

            say "With grep:";
            say sprintf("Element %s %s in array", $element,
                grep_in_array($element, @array) ? "is" : "is not");

            say "With given-when:";
            say sprintf("Element %s %s in array", $element,
                is_in_array($element, @array) ? "is" : "is not");

            say "With grep:";
            timethis($count, sub { grep_in_array($element, @array); });

            say "With given-when:";
            timethis($count, sub { is_in_array($element, @array); });
        }

    And the winner is: grep

        Test with: a
        With grep:
        Element a is not in array
        With given-when:
        Element a is not in array
        With grep:
        timethis 1000: 2 wallclock secs ( 2.17 usr + 0.00 sys = 2.17 CPU) @ 460.83/s (n=1000)
        With given-when:
        timethis 1000: 8 wallclock secs ( 7.34 usr + 0.00 sys = 7.34 CPU) @ 136.24/s (n=1000)

        Test with: ab
        With grep:
        Element ab is not in array
        With given-when:
        Element ab is not in array
        With grep:
        timethis 1000: 2 wallclock secs ( 2.20 usr + 0.00 sys = 2.20 CPU) @ 454.55/s (n=1000)
        With given-when:
        timethis 1000: 8 wallclock secs ( 7.66 usr + 0.00 sys = 7.66 CPU) @ 130.55/s (n=1000)

        Test with: abc
        With grep:
        Element abc is not in array
        With given-when:
        Element abc is not in array
        With grep:
        timethis 1000: 2 wallclock secs ( 2.33 usr + 0.00 sys = 2.33 CPU) @ 429.18/s (n=1000)
        With given-when:
        timethis 1000: 8 wallclock secs ( 7.62 usr + 0.00 sys = 7.62 CPU) @ 131.23/s (n=1000)

        Test with: 0
        With grep:
        Element 0 is in array
        With given-when:
        Element 0 is in array
        With grep:
        timethis 1000: 2 wallclock secs ( 2.17 usr + 0.00 sys = 2.17 CPU) @ 460.83/s (n=1000)
        With given-when:
        timethis 1000: 8 wallclock secs ( 7.24 usr + 0.00 sys = 7.24 CPU) @ 138.12/s (n=1000)

        Test with: 1
        With grep:
        Element 1 is in array
        With given-when:
        Element 1 is in array
        With grep:
        timethis 1000: 2 wallclock secs ( 2.34 usr + 0.00 sys = 2.34 CPU) @ 427.35/s (n=1000)
        With given-when:
        timethis 1000: 8 wallclock secs ( 7.44 usr + 0.00 sys = 7.44 CPU) @ 134.41/s (n=1000)

        Test with: 10
        With grep:
        Element 10 is in array
        With given-when:
        Element 10 is in array
        With grep:
        timethis 1000: 2 wallclock secs ( 2.27 usr + 0.00 sys = 2.27 CPU) @ 440.53/s (n=1000)
        With given-when:
        timethis 1000: 8 wallclock secs ( 7.57 usr + 0.01 sys = 7.58 CPU) @ 131.93/s (n=1000)

        Test with: 100
        With grep:
        Element 100 is in array
        With given-when:
        Element 100 is in array
        With grep:
        timethis 1000: 2 wallclock secs ( 2.13 usr + 0.00 sys = 2.13 CPU) @ 469.48/s (n=1000)
        With given-when:
        timethis 1000: 8 wallclock secs ( 7.97 usr + 0.02 sys = 7.99 CPU) @ 125.16/s (n=1000)

        Test with: 1000
        With grep:
        Element 1000 is in array
        With given-when:
        Element 1000 is in array
        With grep:
        timethis 1000: 2 wallclock secs ( 2.29 usr + 0.00 sys = 2.29 CPU) @ 436.68/s (n=1000)
        With given-when:
        timethis 1000: 8 wallclock secs ( 7.69 usr + 0.01 sys = 7.70 CPU) @ 129.87/s (n=1000)

        Test with: 10000
        With grep:
        Element 10000 is in array
        With given-when:
        Element 10000 is in array
        With grep:
        timethis 1000: 2 wallclock secs ( 2.05 usr + 0.00 sys = 2.05 CPU) @ 487.80/s (n=1000)
        With given-when:
        timethis 1000: 10 wallclock secs ( 9.23 usr + 0.01 sys = 9.24 CPU) @ 108.23/s (n=1000)

        Test with: 10001
        With grep:
        Element 10001 is not in array
        With given-when:
        Element 10001 is not in array
        With grep:
        timethis 1000: 2 wallclock secs ( 2.29 usr + 0.00 sys = 2.29 CPU) @ 436.68/s (n=1000)
        With given-when:
        timethis 1000: 9 wallclock secs ( 8.54 usr + 0.01 sys = 8.55 CPU) @ 116.96/s (n=1000)

        Test with: abcd
        With grep:
        Element abcd is in array
        With given-when:
        Element abcd is in array
        With grep:
        timethis 1000: 2 wallclock secs ( 2.28 usr + 0.00 sys = 2.28 CPU) @ 438.60/s (n=1000)
        With given-when:
        timethis 1000: 10 wallclock secs ( 9.53 usr + 0.03 sys = 9.56 CPU) @ 104.60/s (n=1000)

    Bye.

      Yep, doing map {"_$_"} @array before each test, in given-when, is not good.

      But grep does too much work: if the first array element matches, you don't need to look at the others.

      This is probably the fastest solution:

          sub is_in_array2 {
              my $element = shift;
              # Leave as soon as one element matches.
              ($element eq $_) && return 1 for @_;
              return 0;    # explicit false when nothing matched
          }
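
To make the "grep does too much work" point concrete, here is a small sketch that counts comparisons instead of timing them. It uses `first` from the core List::Util module as a stand-in for the hand-written loop, and the sub names are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(first);

# grep visits every element, even after it has found a match.
sub comparisons_with_grep {
    my ($element, @array) = @_;
    my $n = 0;
    my @hits = grep { $n++; $_ eq $element } @array;
    return $n;
}

# first stops at the first element for which the block is true.
sub comparisons_with_first {
    my ($element, @array) = @_;
    my $n = 0;
    my $hit = first { $n++; $_ eq $element } @array;
    return $n;
}

my @array = (0 .. 10_000, 'abcd');
printf "grep:  %d comparisons\n", comparisons_with_grep('0', @array);
printf "first: %d comparisons\n", comparisons_with_first('0', @array);
```

With '0' sitting at the front of the array, grep still makes 10002 comparisons while first makes 1. When the element is missing, both have to walk the whole array.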

      update: You should also try with a hash, %is_in_array
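
A minimal sketch of the hash idea, assuming the array does not change between lookups so the hash can be built once up front. The name `%is_in_array` comes from the update above; `is_in_hash` is a made-up helper.

```perl
use strict;
use warnings;

my @array = (0 .. 10_000, 'abcd');

# Build the lookup table once; every key maps to a true value.
my %is_in_array = map { $_ => 1 } @array;

# Hash lookup by key, no scan over the array.
sub is_in_hash {
    my ($element) = @_;
    return exists $is_in_array{$element} ? 1 : 0;
}

for my $element (qw/a 0 10000 10001 abcd/) {
    printf "Element %s %s in array\n", $element,
        is_in_hash($element) ? "is" : "is not";
}
```

Building the hash costs a full pass over the array, so this only pays off when the same array is searched more than a few times.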

      Oh, don't be silly. Last testcase. Ha! :-)

      You're missing a whole slew of tests. And, to make matters worse, you're using timethis instead of cmpthese, which makes it so much harder to compare. So, first I'm going to provide, not necessarily the last test case, but the most recent (at the time of this writing) :-) And then I will comment on it, and then provide the output.

      Why 5.16? Because it's the latest. I don't think anything there needs anything more recent than 5.10. So a few things. First off, I'm combining all of the checks into a single run. This basically means we're taking average timings instead of best/worst case. I've also embedded testing the return values here just to make sure we don't end up with a function that is super-fast but wrong. I also capture warnings - since we're trying to do this without provoking warnings, again, testing it helps. This will all slow down each test, but all tests should get the same constant slowdown so the rankings should be the same even if the numbers aren't quite right.
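
One common way to capture warnings inside a test is a local `__WARN__` handler. This is only a sketch of that idea, not necessarily how the benchmark did it, and `warnings_from` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Runs $code and returns the warnings it raised instead of printing them.
sub warnings_from {
    my ($code) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    $code->();
    return @warnings;
}

# Interpolating an undefined value warns; a defined value does not.
my @noisy = warnings_from(sub { my $x; my $s = "value: $x"; });
my @quiet = warnings_from(sub { my $x = 1; my $s = "value: $x"; });

printf "noisy: %d warning(s), quiet: %d warning(s)\n",
    scalar @noisy, scalar @quiet;
```

A candidate can then be failed if it returns the right answer but leaves anything in the captured list.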

      I've moved the array to a global so we also aren't impacted by copying the array around. Again, it's constant, so it's just noise. But this is probably bigger noise than the above :-)

      So, some additional tests. grep_in_array is your old one. brx_match is brx's suggestion (a pretty good one). sm1 is your is_in_array while sm2 is a slight improvement on it (get rid of the underscore, it doesn't help). sh1/sh2 are the same as sm1/sm2 except using a hash base, with sh1 assuming that we're doing many matches against the array (thus the overhead of creating the hash can be ignored) and sh2 assuming we're doing one/few matches (thus the overhead of creating the hash is important). And any is just using List::MoreUtils' XS-based function. It's basically the same as brx's suggestion, but implemented in XS (aka C) instead. Oh, and it's already implemented and isn't subject to cut-and-paste errors or the like.
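
A reduced sketch of this kind of harness, for readers who want to try it themselves. It is not the code behind the numbers in this post. It leaves out the smart-match variants, uses `any` from the core List::Util (assuming version 1.33 or later) instead of List::MoreUtils, and names such as `%candidate` and `check_all` are made up.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(any);    # assumption: List::Util 1.33+, not List::MoreUtils

# Global array, so no candidate pays for copying it on each call.
our @array = (0 .. 10_000, 'abcd');
my %lookup = map { $_ => 1 } @array;    # built once, outside the timed code

# Elements to look up, with the answer every candidate must give.
my %expected = ('a' => 0, '0' => 1, '10000' => 1, '10001' => 0, 'abcd' => 1);

my %candidate = (
    'grep' => sub { my $e = shift; (grep { $_ eq $e } @array) ? 1 : 0 },
    'loop' => sub { my $e = shift; $_ eq $e and return 1 for @array; return 0 },
    'any'  => sub { my $e = shift; (any { $_ eq $e } @array) ? 1 : 0 },
    'sh1'  => sub { my $e = shift; exists $lookup{$e} ? 1 : 0 },
    'sh2'  => sub {    # pays for building the hash on every call
        my $e = shift;
        my %h = map { $_ => 1 } @array;
        return exists $h{$e} ? 1 : 0;
    },
);

# Make sure nothing is fast but wrong before timing anything.
sub check_all {
    for my $name (keys %candidate) {
        for my $element (keys %expected) {
            my $got = $candidate{$name}->($element);
            die "$name is wrong for '$element'\n"
                unless $got == $expected{$element};
        }
    }
    return 1;
}
check_all();

# Each timed run does all lookups, so the rates are averages over
# hits and misses. 20 iterations is only a smoke test; raise it.
my %timed;
for my $name (keys %candidate) {
    my $code = $candidate{$name};
    $timed{$name} = sub { $code->($_) for keys %expected };
}
cmpthese(20, \%timed);
```

With so few iterations Benchmark may warn that the count is too low for the fast candidates; a larger count or a negative count (CPU seconds) gives steadier rates.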

      Okay, now for the output:

                  Rate     sh2     sm1     sm2    grep     brx     any     sh1
          sh2   18.3/s      --    -30%    -33%    -87%    -92%    -93%   -100%
          sm1   26.3/s     44%      --     -4%    -81%    -88%    -90%   -100%
          sm2   27.4/s     49%      4%      --    -80%    -88%    -89%   -100%
          grep   138/s    653%    424%    404%      --    -40%    -47%   -100%
          brx    228/s   1145%    768%    735%     65%      --    -12%   -100%
          any    258/s   1309%    882%    844%     87%     13%      --   -100%
          sh1  52512/s 286329% 199445% 191840%  37948%  22899%  20227%      --

      It should be of no surprise that sh1 completely blew the rest out of the water. The interesting bits are the 4% boost I got from eliminating the unneeded underscore from your original attempt, how much overhead creating the hash has (how slow sh2 is), how much better brx's suggestion is over plain grep (though that's not entirely average - on successful finds, we're weighted toward the first half of the array over the second half), and how much benefit there is (13%) to the XS version of brx's suggestion in List::MoreUtils::any.

      Moral of the story: use hashes for lookups if you're doing repeated lookups. And if you're not, use CPAN modules.

      Oh, and maybe I'll try opening the bug report for perl to fix this warning. It's still bugging me. :-)
