XP is just a number PerlMonks

### Re: Making a match

by DrManhattan (Chaplain)
 on Apr 12, 2003 at 13:30 UTC ( #250052=note: print w/replies, xml ) Need Help??

in reply to Making a match

If you're checking \$x aginst your list multiple times, it might be more efficient to use a hash:
```my @list = (10,11,12,13,14,15,17,18,22,23,24,25,26,27,28,29,30,31,32,3
+3,34,35 );
my %hash;
@hash{@list} = (1) x @list;

# This solution only makes sense if you're
# making the comparison multiple times
foreach my \$x (1..1000)
{
if (\$hash{\$x})
{
print "\$x is in the list\n";
}
}
You spend some initial setup time creating the hash, but then all your comparisons are O(1).

-Matt

Replies are listed 'Best First'.
Re: Re: Making a match
by TheDamian (Priest) on Apr 13, 2003 at 00:40 UTC
Hashes certainly are the best solution, if you can amortize the set-up costs (and only slightly worse than grep even if you have to set up repeatedly -- i.e. if the list keeps changing).

But if ultimate efficiency is the goal, then there is an even more efficient way to use hash lookup:

```my %hash;
@hash{@list} = ();          # Don't bother filling in values

foreach my \$x (1..1000)
{
if (exists \$hash{\$x})   # Just test for existence of key
{
print "\$x is in the list\n";
}
}
++ to you and perlplexer. I hadn't realized you could do it that way. Benchmark.pm shows defining the hash is over twice as fast as assigning values to it. The benchmark results on the comparison techniques are more interesting:
```if (\$hash{\$x} == 1)
{
# always slower than if(\$hash{\$x})
# or if(defined(\$hash{\$x}))
}

if (\$hash{\$x})
{
# slower than if(defined(\$hash{\$x}))
# when \$hash{\$x} is undefined

# faster than if(defined(\$hash{\$x}))
# when \$hash{\$x} is defined
}
Here's the benchmark code I used:
```#!/usr/bin/perl

use strict;
use Benchmark;

my @list = (10,11,12,13,14,15,17,18,22,23,24,25,26,27,28,29,30,31,32,3
+3,34,35 );
my %hash;
@hash{@list} = (1) x @list;

timethese(1000000, {
'Hash Defined Miss' => sub {
if (defined(\$hash{1})) {
#nop
}
},
'Hash Defined Hit' => sub {
if (defined(\$hash{10})) {
#nop
}
},
'Hash Assigned Miss Truth' => sub {
if (\$hash{1}) {
#nop
}
},
'Hash Assigned Miss Equality' => sub {
if (\$hash{1} == 1) {
#nop
}
},
'Hash Assigned Hit Truth' => sub {
if (\$hash{10}) {
#nop
}
},
'Hash Assigned Hit Equality' => sub {
if (\$hash{10} == 1) {
#nop
}
}
});
```Benchmark: timing 1000000 iterations of Hash Assigned Hit Equality, Ha
+sh Assigned Hit Truth,
Hash Assigned Miss Equality, Hash Assigned Miss Truth, Hash Define
+d Hit, Hash Defined Miss...
Hash Assigned Hit Equality:  5 wallclock secs ( 1.52 usr +  0.00 sys =
+  1.52 CPU) @ 657894.74/s (n=1000000)
Hash Assigned Hit Truth:  3 wallclock secs ( 1.29 usr + -0.01 sys =  1
+.28 CPU) @ 781250.00/s (n=1000000)
Hash Assigned Miss Equality:  3 wallclock secs ( 1.30 usr +  0.00 sys
+=  1.30 CPU) @ 769230.77/s (n=1000000)
Hash Assigned Miss Truth:  1 wallclock secs ( 1.15 usr +  0.01 sys =
+1.16 CPU) @ 862068.97/s (n=1000000)
Hash Defined Hit:  1 wallclock secs ( 1.41 usr +  0.00 sys =  1.41 CPU
+) @ 709219.86/s (n=1000000)
Hash Defined Miss:  2 wallclock secs ( 0.95 usr +  0.00 sys =  0.95 CP
+U) @ 1052631.58/s (n=1000000)

-Matt

(Sigh. I just can help myself, can I? Always with the advice... ;-)

Two suggestions for using Benchmark:

1. If you're using long names for each test, pad the names to the same length with extra spaces on the left. Then your results are much easier to read and compare:
```            Hash Exists Hit:  4 wallclock secs @ 1204819.28/s (n=10000
+00)
Hash Defined Hit:  3 wallclock secs @ 2173913.04/s (n=10000
+00)
Hash Exists Miss:  1 wallclock secs @ 1515151.52/s (n=10000
+00)
Hash Defined Miss:  3 wallclock secs @ 1492537.31/s (n=10000
+00)
Hash Assigned Miss Truth:  6 wallclock secs @ 1515151.52/s (n=10000
+00)
Hash Assigned Hit Equality:  4 wallclock secs @ 1149425.29/s (n=10000
+00)
Hash Assigned Hit Truth:  5 wallclock secs @ 970873.79/s (n=100000
+0)
Hash Assigned Miss Equality:  6 wallclock secs @ 877192.98/s (n=100000
+0)
2. Use cmpthese instead of timethese. That adds a handy comparison matrix at the end of your test results (though this suggestion works much better with shorter names):
```          Rate  HDH HAHE HAME  HEH HAMT  HDM HAHT  HEM
HDH  1111111/s   --  -0%  -4% -21% -24% -28% -30% -39%
HAHE 1111111/s   0%   --  -4% -21% -24% -28% -30% -39%
HAME 1162791/s   5%   5%   -- -17% -21% -24% -27% -36%
HEH  1408451/s  27%  27%  21%   --  -4%  -8% -11% -23%
HAMT 1470588/s  32%  32%  26%   4%   --  -4%  -7% -19%
HDM  1538462/s  38%  38%  32%   9%   5%   --  -3% -15%
HAHT 1587302/s  43%  43%  37%  13%   8%   3%   -- -13%
HEM  1818182/s  64%  64%  56%  29%  24%  18%  15%   --

Create A New User
Node Status?
node history
Node Type: note [id://250052]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (7)
As of 2021-01-20 23:58 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
The STEM quote I most wish I'd made is:

Results (226 votes). Check out past polls.

Notices?