Re: Dreams of Probability
by jynx (Priest) on Dec 17, 2002 at 21:21 UTC
Some more metrics,
Atcroft started to touch on this, but didn't actually elaborate. The list of words can be shortened in more than one direction. The first is word length. No one in their right mind is going to go through the trouble of saying pneumonoultramicroscopicsilicovolcanoconiosis or supercalifragilisticexpialidocious (i may have spelled one or both of those wrong) every time they want to pass one door. Words that are too short (like 'a', 'oh', 'it', etc.) might also be disallowed, so you can probably skip those as well.
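A minimal sketch of the length cut described above, in Python. The function name and the bounds of 3 and 12 letters are assumptions picked for illustration; nothing in the thread fixes actual limits.

```python
def filter_by_length(words, min_len=3, max_len=12):
    """Keep only words whose length lies in [min_len, max_len].

    The default bounds are illustrative guesses, not values from the thread.
    """
    return [w for w in words if min_len <= len(w) <= max_len]


candidates = ["a", "oh", "it", "door", "castle",
              "supercalifragilisticexpialidocious"]
print(filter_by_length(candidates))  # ['door', 'castle']
```

Tightening either bound shrinks the list further, at the risk of throwing out the real word if the guess about its length is wrong.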
The second is the word's esoteric factor, although how much this helps depends on whether they used a dictionary to pick the word or just used a word from memory. That is to say, if they opened the dictionary to a random point and chose that word, then you cannot shorten the word list on the grounds that esoteric words are less likely to be lodged in the average human's brain. But if it's chosen from memory...
Personally i'm an esoteric sesquipedalian alien, and so using obscure or arcane wording would be fine, but most people haven't spent a lifetime trying to learn such strange words, so the word list could be shortened to those words within an epsilon of, say, the fourth grade reading level (the average reading level of most Americans). If you trust that these people are a bit more intelligent, raise the cutoff to a 7th grade reading level. That's roughly the highest reading level that general publications are written at (IIRC). That's still far fewer words than the average dictionary.
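One way to approximate that cut without a graded word list is to keep only the dictionary words that show up most often in some everyday text. This is a rough sketch under that assumption: word frequency stands in for reading level, which is not the same thing, and the function name, the corpus, and `top_n` are all hypothetical.

```python
from collections import Counter


def common_vocabulary(corpus_text, dictionary, top_n):
    """Return up to top_n dictionary words, most frequent in corpus_text first.

    Frequency is only a stand-in for "reading level" (an assumption).
    """
    counts = Counter(corpus_text.lower().split())
    allowed = set(dictionary)
    # most_common() yields words from most to least frequent.
    ranked = [word for word, _ in counts.most_common() if word in allowed]
    return ranked[:top_n]


corpus = "the door the key the door castle"
dictionary = ["door", "key", "castle", "caddywhompus"]
print(common_vocabulary(corpus, dictionary, 1))  # ['door']
```

A word like caddywhompus never appears in the corpus, so it drops out of the shortened list entirely.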
On the other hand, there is also the possibility that the word is not in the dictionary. Personally, i think caddywhompus would be a fantastic word to use, especially since one will never find it in a dictionary (or at least, not one that i know of). According to your post this is not the case, so i won't attempt any thoughts toward this end :-)
So other than those dictionary reductions, you'll still have to go through about half the dictionary that you end up with? Hmm, maybe not. First of all, because we always count numbers starting with 1, and because nice, round numbers appeal to the human intellect so much, if you look through a book and tally the leading digit of each number in it, you'll come up with statistically more 1's than any other digit (i forget the actual spread; but 1 is very common).
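The digit effect described above is usually called Benford's law, and it applies to the leading digit of each number, so 0 never takes part. The standard formula gives digit d a probability of log10(1 + 1/d). The function name in this small sketch is made up for illustration.

```python
import math


def benford_distribution():
    """Probability of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


for digit, p in benford_distribution().items():
    print(f"{digit}: {p:.1%}")
```

Under this formula a leading 1 turns up about 30% of the time, and the share falls steadily toward 9.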
This is untested, but if the same works for the alphabet, then you'll have a greater chance of finding your word near the beginning of the alphabet. Or more accurately, you'll have a greater chance of finding the word starting with a letter that more commonly starts words than other letters. The most common letter is e, but it doesn't start as many words as s or t, so you could start there. Like i said, it's not proven, but i'm hypothesizing that it would reduce the amount of time used to find the word.
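To make that ordering concrete, here is a hedged sketch that counts how many words in the list start with each letter and then tries the words with the busiest initials first. The function name is hypothetical, and the counts come from whatever word list you feed it rather than from any fixed table of English letter frequencies.

```python
from collections import Counter


def order_by_initial_frequency(words):
    """Sort words so those sharing the most common first letter come first.

    Ties are broken alphabetically; empty strings are skipped.
    """
    words = [w for w in words if w]
    initial_counts = Counter(w[0].lower() for w in words)
    return sorted(words, key=lambda w: (-initial_counts[w[0].lower()], w.lower()))


sample = ["apple", "stone", "sand", "tree", "salt", "tower"]
print(order_by_initial_frequency(sample))
# ['salt', 'sand', 'stone', 'tower', 'tree', 'apple']
```

Whether trying words in this order actually shortens the search is the untested hypothesis above; the sketch only shows how to produce the ordering.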
Just my $0.02,