in reply to Check word presence WITHOUT hashes or grep

Kudos for mentioning up front that this is a learning exercise. That makes the somewhat odd requirements less of a distraction. It also means that my suggestion will apply to your current approach rather than trying to change it.

Since you've got the dictionary words alphabetized in an array, you could use a binary search to determine if a word in your corpus is in the array. You should be able to find some things around here on that pretty easily if you use Super Search and sprinkle with key words liberally.

Good luck

  • Comment on Re: Check word presence WITHOUT hashes or grep

Replies are listed 'Best First'.
Re^2: Check word presence WITHOUT hashes or grep
by Your Mother (Archbishop) on Apr 30, 2008 at 05:31 UTC

    ++ IIRC from a math class back in the Cambrian, binary search is unbeatable on sorted data. Given a 500,000 word dictionary you will never have to do more than, let's see...

    my $lookups = 0; my $words = 500_000; while ( $words > 1 ) { $words /= 2; $lookups++; } print $lookups, "\n";

    ...19 lookups -- meaning comparisons with some combination of lt, gt, ge, le, eq.

      Thank you all for your help. You aren't perl monks, but perl gods !

      Just a question to my mother, what does the
      $words /= 2; part mean ? Divide $words into 2, to get 250 000 ?

        :) That's right. /= is the operator for "divide yourself" or "divide in place." There's probably a better or more technical term. There are plenty more where that came from. See perldoc perlop for more fun.

Re^2: Check word presence WITHOUT hashes or grep
by gojippo (Novice) on Apr 30, 2008 at 06:13 UTC
    Thank you for your help, bobf. I looked up for binary search on the Super Search, but didn't find anything easy enough for me to understand.
    How could I use binary search to determine if a word in my corpus is there ?
      Binary search, and algorithms in general, is a field you must try to get a grasp on no matter what programming languages you work with. Often in a high level programming language you don't need to implement them yourself, but understanding these issues will have profound impact on how well you can learn and exploit a particular programming language. The more algorithms and datastructures you are familiar with, the easier it will be to see good simple solutions to what seems a hard task without such knowledge.

      Binary search is the process where you halve the remaining search space for each test. So, lets say you have your dictionary of say 100.000 words in a SORTED array. Now you want to test if the word Trzagrat is in there somewhere. First you look on the position in the middle of your dictionary array. Either you find the word there, or if not, you will know which half of the dictionary must hold the word if it's there at all, because the dictionary is sorted, and the word you're looking up either sorts before or after the word you found at this position. So you contiune...

      A better explanation on Wikipedia, binary search

        Thank you for your explanation, stiller. I updated just when you posted your message. I managed to understand the logic of binary search, but I don't understand the how-to-do-it part. Do you have an easy-to-understand example ? It would really help.
      Ok, sorry for the question, I just found a nice thread.