Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Hash versus chain of elsifs

by mldvx4 (Monk)
on Nov 22, 2021 at 07:32 UTC ( #11139003=perlquestion: print w/replies, xml ) Need Help??

mldvx4 has asked for the wisdom of the Perl Monks concerning the following question:

I have a question about what is more efficient for a small lookup function with only a few thousand items. Specifically I have a function in a module and that module has a function which gets passed a string. The function needs to determine if that string is in a set of a few thousand strings or not. How is that done most efficiently, and does the "best" approach vary by scale?

One way might be to load a hash with the strings as keys, and then see if defined($hash{$string}) or if defined($hash{$string}) has a value set.

Another would be to have a chain of elsif statements, and have the function cascade through those until either the end is reached or a match has been found.

Which approach is more appropriate or efficient? Or are the better approaches which can be used?

Replies are listed 'Best First'.
Re: Hash versus chain of elsifs
by dsheroh (Monsignor) on Nov 22, 2021 at 08:43 UTC
    Between the two approaches you mentioned, I would be extremely surprised if the hash did not turn out to be more efficient, as it doesn't have to step through a list of thousands of elsifs. It also has the advantage of taking constant time regardless of whether the string is found or not, rather than being very quick if the string is at the start of the elsif chain and being slower for a non-matching string, which would have to individually check every item in the list before it can be known that the string isn't there.

    More importantly, though, the hash approach is much more readable than an endless list of elsifs, and it also opens up the possibility of storing your list of items to match against in a config file or a database (since a hash's contents are just data, not code) which will make for easier long-term maintenance.

    BTW, the syntax you probably want for checking whether a hash key exists is exists($hash{$string}), not defined (which will return false if the key is there but has no value).

      Thanks. I'll go with the hash method then and use exists to check it. Does the hash get built each time the function is called or does it persist in some way? I'm guessing the former, from some tests I've tried.

        Depends on how and where you're building it; as was mentioned give us a sample and someone can comment on specifics. That being said though here's a way to (for example) initialize your hash from a file with one item per list lazily and only one time (unless you explicitly clear it):

        if( exists _get_cache()->{ $candidate } ) { say qq{IT DOES}; } else { say qq{No such luck . . .}; } { ## Block to scope our cache to just these subs my $lookup_cache = undef; sub _reset_cache { $lookup_cache = undef; } sub _get_cache { $lookup_cache //= _load_cache(); } sub _load_cache { ## presuming you've declared file var somewhere above . . . open( my $fh, q{<}, $CACHE_FILE_NAME ) or die qq{Can't open '$CACHE_FILE_NAME': $!\n}; $lookup_cache = {}; while( <$fh> ) { chomp; $lookup_cache->{ $_ } = 1; } close( $fh ); return $lookup_cache; } } ## End of limited scope block.

        The cake is a lie.
        The cake is a lie.
        The cake is a lie.

        You should make it persist. That will have a big impact on performance. If you need further help, show us some sample code and we'll be able to show various ways to make it persist.

Re: Hash versus chain of elsifs
by eyepopslikeamosquito (Bishop) on Nov 22, 2021 at 09:32 UTC

      The squirrel PDF of best practices you point to is quite interesting. Some I already practice, some are new, others I need more clarification on. Is there a page with more detail on some of the enumerated items, maybe with simple examples for some of them?

        You're meant to buy the book Perl Best Practices from O'Reilly ... though you could try clicking on "Start your free trial". It's considered unethical to search the internet for free pirated PDFs. :)

        TheDamian is a brilliant writer and it's a fantastic book IMHO - where it was described by Yanick as "worth it's weight in depleted uranium bars, and twice as entertaining".

Re: Hash versus chain of elsifs
by kikuchiyo (Friar) on Nov 22, 2021 at 09:28 UTC

    Is the set of strings fixed and known in advance?

    If yes, look up minimal perfect hashing. One has to be generated specifically for your set of strings, and the generation part is usually expensive, but once it's done it can tell whether an input string is in the set with just one hash lookup and one string comparison.

    Compilers and similar programs use this technique when they want to match tokens from the input stream against a small, known set of keywords.

    I don't know if it's worth the effort, though, if you're working in Perl - just use a regular hash and be done with it. You may want to look at the problem again when you're optimizing your program, but you'll likely find that hash access is not your main bottleneck.

Re: Hash versus chain of elsifs
by LanX (Sage) on Nov 22, 2021 at 10:32 UTC
    The only thing in Perl which might be able to rival a hash look-up is a or-regex using the Tree° Trie optimization, especially if you have repeated patterns.

    $in =~ m/^($STR1|$STR2|$STR3|...)$/

    You should provide more details, for a definitive answer.

    But a trie will always beat if-else with eq comparisons.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    updates

    °) that's not the name... how was it called again... ah ... s/Tree/Trie/

    > and does the "best" approach vary by scale?

    yes. see ${^RE_TRIE_MAXBUF}

      On the regex suggestion, once upon a time, I did a project which used Regexp::Assemble to take a long list of rexeges and combine them all into a single, monstrously unreadable, regex that would tell me in one shot whether any matched. I seem to recall hearing at some point that similar functionality had been added to the Perl core, but don't recall details.

      In any case, if you're looking for fixed strings rather than regex patterns, substring matches, etc., I would still expect hashes to be faster, but something based on Regexp::Assemble or a similar technique could be close enough to warrant a benchmark. And you might still want to go that route even if it's a little slower in order to gain the extra flexibility in how you match.

Re: Hash versus chain of elsifs
by perlfan (Vicar) on Nov 25, 2021 at 14:42 UTC
    In terms of complexity, a hash key look up is always the most efficient being O(1). The drawback, as has been mentioned, is that this requires a key. If you're dealing with a known set of strings, this is as efficient as possible. So to be clear, if you're going to do a cascade of if ( $str eq 'somestring' )... then use a hash look up.

    If dealing with a set of conditions (e.g., a range of values), then you need to use a conditional somewhere. The question becomes, how to compute a static hash key from a value (be it a range, regex, mathematical function, etc).

    E.g.,

    use strict; use warnings; my $value_buckets = { 'first half' => sub { print qq{first half!\n} }, 'second half' => sub { print qq{second half!\n} }, 'third half' => sub { print qq{third half!\n} }, 'not found' => sub { print qq{not found!\n} }, }; my $key = compute_key(42); $value_buckets->{$key}->(); # just an example, could be anything to derive a hash # key from $value sub compute_key { my $value = shift; if ( $value > 0 and $value < 33 ) { return 'first half'; } elsif ( $value >= 33 and $value <= 66 ) { return 'second half'; } elsif ( $value >= 67 and $value <= 100 ) { return 'third half'; } # last resort return 'not found'; }
    Output:
    # perl test.pl second half!

    A related question is, can you write a sophisticated perl program without conditionals that also minimizes the computational complexity? The answer is, "YES"; and if that's your interest I think your exploration may be added by looking at the functional side of Perl. And there's no better place to look than Higher Order Perl for that.

    Final note, even though an series of ifels.. statements will terminate when the condition is found; there is nothing to guarantee how far down the litany of conditionals you'll fall. So in terms of computational complexity, it's always going to add a constant factor equivalent to the worst case scenario of having to check all conditions.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11139003]
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (4)
As of 2021-12-04 18:10 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    R or B?



    Results (30 votes). Check out past polls.

    Notices?