http://www.perlmonks.org?node_id=250100


in reply to Re: Making a match
in thread Making a match

Hashes certainly are the best solution, if you can amortize the set-up costs (and only slightly worse than grep even if you have to set up repeatedly -- i.e. if the list keeps changing).

But if ultimate efficiency is the goal, then there is an even more efficient way to use hash lookup:

my %hash; @hash{@list} = (); # Don't bother filling in values foreach my $x (1..1000) { if (exists $hash{$x}) # Just test for existence of key { print "$x is in the list\n"; } }

Replies are listed 'Best First'.
Re: Re: Re: Making a match
by DrManhattan (Chaplain) on Apr 14, 2003 at 10:40 UTC
    ++ to you and perlplexer. I hadn't realized you could do it that way. Benchmark.pm shows defining the hash is over twice as fast as assigning values to it. The benchmark results on the comparison techniques are more interesting:
    if ($hash{$x} == 1) { # always slower than if($hash{$x}) # or if(defined($hash{$x})) } if ($hash{$x}) { # slower than if(defined($hash{$x})) # when $hash{$x} is undefined # faster than if(defined($hash{$x})) # when $hash{$x} is defined }
    Here's the benchmark code I used:
    #!/usr/bin/perl use strict; use Benchmark; my @list = (10,11,12,13,14,15,17,18,22,23,24,25,26,27,28,29,30,31,32,3 +3,34,35 ); my %hash; @hash{@list} = (1) x @list; timethese(1000000, { 'Hash Defined Miss' => sub { if (defined($hash{1})) { #nop } }, 'Hash Defined Hit' => sub { if (defined($hash{10})) { #nop } }, 'Hash Assigned Miss Truth' => sub { if ($hash{1}) { #nop } }, 'Hash Assigned Miss Equality' => sub { if ($hash{1} == 1) { #nop } }, 'Hash Assigned Hit Truth' => sub { if ($hash{10}) { #nop } }, 'Hash Assigned Hit Equality' => sub { if ($hash{10} == 1) { #nop } } });
    Benchmark: timing 1000000 iterations of Hash Assigned Hit Equality, Ha +sh Assigned Hit Truth, Hash Assigned Miss Equality, Hash Assigned Miss Truth, Hash Define +d Hit, Hash Defined Miss... Hash Assigned Hit Equality: 5 wallclock secs ( 1.52 usr + 0.00 sys = + 1.52 CPU) @ 657894.74/s (n=1000000) Hash Assigned Hit Truth: 3 wallclock secs ( 1.29 usr + -0.01 sys = 1 +.28 CPU) @ 781250.00/s (n=1000000) Hash Assigned Miss Equality: 3 wallclock secs ( 1.30 usr + 0.00 sys += 1.30 CPU) @ 769230.77/s (n=1000000) Hash Assigned Miss Truth: 1 wallclock secs ( 1.15 usr + 0.01 sys = +1.16 CPU) @ 862068.97/s (n=1000000) Hash Defined Hit: 1 wallclock secs ( 1.41 usr + 0.00 sys = 1.41 CPU +) @ 709219.86/s (n=1000000) Hash Defined Miss: 2 wallclock secs ( 0.95 usr + 0.00 sys = 0.95 CP +U) @ 1052631.58/s (n=1000000)

    -Matt

      (Sigh. I just can help myself, can I? Always with the advice... ;-)

      Two suggestions for using Benchmark:

      1. If you're using long names for each test, pad the names to the same length with extra spaces on the left. Then your results are much easier to read and compare:
        Hash Exists Hit: 4 wallclock secs @ 1204819.28/s (n=10000 +00) Hash Defined Hit: 3 wallclock secs @ 2173913.04/s (n=10000 +00) Hash Exists Miss: 1 wallclock secs @ 1515151.52/s (n=10000 +00) Hash Defined Miss: 3 wallclock secs @ 1492537.31/s (n=10000 +00) Hash Assigned Miss Truth: 6 wallclock secs @ 1515151.52/s (n=10000 +00) Hash Assigned Hit Equality: 4 wallclock secs @ 1149425.29/s (n=10000 +00) Hash Assigned Hit Truth: 5 wallclock secs @ 970873.79/s (n=100000 +0) Hash Assigned Miss Equality: 6 wallclock secs @ 877192.98/s (n=100000 +0)
      2. Use cmpthese instead of timethese. That adds a handy comparison matrix at the end of your test results (though this suggestion works much better with shorter names):
        Rate HDH HAHE HAME HEH HAMT HDM HAHT HEM HDH 1111111/s -- -0% -4% -21% -24% -28% -30% -39% HAHE 1111111/s 0% -- -4% -21% -24% -28% -30% -39% HAME 1162791/s 5% 5% -- -17% -21% -24% -27% -36% HEH 1408451/s 27% 27% 21% -- -4% -8% -11% -23% HAMT 1470588/s 32% 32% 26% 4% -- -4% -7% -19% HDM 1538462/s 38% 38% 32% 9% 5% -- -3% -15% HAHT 1587302/s 43% 43% 37% 13% 8% 3% -- -13% HEM 1818182/s 64% 64% 56% 29% 24% 18% 15% --
        WRT suggestion 1: just thought I should mention that I was, of course, far too Lazy to actually align them myself:
        use List::Util 'max'; sub align { my ($hash) = @_; my $len = max map length, keys %$hash; $hash->{sprintf "%${len}s", $_} = delete $hash->{$_} for keys %$hash; return $hash; } cmpthese(10000, align { 'Bacon, Lettuce, & Tomato' => sub {...}, 'Ham on Rye' => sub {...}, 'Spam' => sub {...}, # etc. });