PerlMonks  

Re: Dreams of Probability

by jynx (Priest)
on Dec 17, 2002 at 21:21 UTC ( #220666=note )


in reply to Dreams of Probability


Some more metrics,

Atcroft started to touch on this, but didn't actually extrapolate. The list of words can be shortened in more than one direction. The first is word length. No one in their right mind is going to go through the trouble of saying pneumonoultramicroscopicsilicovolcanoconiosis or supercalifragilisticexpialidocious (i may have spelled one or both of those wrong) every time they want to pass one door. Also, words that are too short might be disallowed (like 'a', 'oh', 'it', etc.), so you can probably skip those as well.
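
A minimal sketch of the length filter described above. The cutoffs (`min_len=3`, `max_len=12`) and the sample word list are assumptions picked for illustration; the post does not name specific limits.

```python
def filter_by_length(words, min_len=3, max_len=12):
    """Keep only words whose length falls within [min_len, max_len]."""
    return [w for w in words if min_len <= len(w) <= max_len]


# Hypothetical sample; a real run would read a dictionary file instead.
candidates = ["a", "oh", "it", "door", "camel",
              "supercalifragilisticexpialidocious"]
shortlist = filter_by_length(candidates)
print(shortlist)
```

Both cutoffs are guesses about what a person would tolerate saying at every door, so they are worth tuning against whatever is known about the lock's owner.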

The second is the word's esoteric factor, although how much this helps depends on whether they used a dictionary to pick the word or just used a word from memory. That is to say, if they opened the dictionary to a random point and chose that word, then you cannot shorten the word list on the grounds that esoteric words are less likely to be lodged in the average human's brain. But if it's chosen from memory...

Personally i'm an esoteric sesquipedalian alien, and so using obscure or arcane wording would be fine. But most people haven't spent a lifetime trying to learn such strange words, so the word list could be shortened to those words within an epsilon of, say, the fourth grade reading level (the average reading level of most Americans). If you credit these people with a bit more intelligence, raise the cutoff to a 7th grade reading level. That's roughly the highest reading level in publication (IIRC). That's still far fewer words than the average dictionary.

On the other hand, there is also the possibility that the word is not in the dictionary. Personally, i think caddywhompus would be a fantastic word to use, especially since one will never find it in a dictionary (or at least, not one that i know of). According to your post this is not the case, so i won't attempt any thoughts toward this end :-)

So other than those dictionary reductions, you'll still have to go through about half the dictionary that you end up with? Hmm, maybe not. First of all, because we always count numbers starting with 1, and because nice, round numbers appeal to the human intellect so much, if you look through a book and count the leading digit of each number in it, you'll come up with statistically more 1's than any other digit (i forget the actual spread; but 1 is very common).
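
The digit pattern described above is usually stated as Benford's law. It concerns the leading digit of numbers in many real-world data sets and gives the probability of leading digit d as log10(1 + 1/d). A small sketch of the spread follows; the function name is made up for illustration.

```python
import math


def benford_probability(digit):
    """Probability that the leading digit equals `digit` (1-9) under Benford's law."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Print the spread for every possible leading digit.
for d in range(1, 10):
    print(d, round(benford_probability(d), 3))
```

Under this law a leading 1 turns up about 30% of the time and a leading 9 under 5% of the time. Not every data set follows it, so treat it as a tendency rather than a guarantee.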

This is untested, but if the same works for the alphabet, then you'll have a greater chance of finding your word near the beginning of the alphabet. Or more accurately, you'll have a greater chance of finding the word starting with a letter that more commonly starts words than other letters. The most common letter is e, but it doesn't start as many words as s or t, so you could start there. Like i said, it's not proven, but i'm hypothesizing that it would reduce the amount of time used to find the word.
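
One way to check which letters most often start words is to tally first letters over a word list. This is only a sketch: the `sample` list is invented, and a real check would feed in the dictionary actually being searched.

```python
from collections import Counter


def first_letter_counts(words):
    """Count how many words start with each letter, ignoring case."""
    return Counter(w[0].lower() for w in words if w and w[0].isalpha())


# Hypothetical sample list, not real dictionary data.
sample = ["stone", "Sand", "tree", "egg", "table", "sun"]
for letter, count in first_letter_counts(sample).most_common():
    print(letter, count)
```

The tally only shows how the dictionary is distributed across initial letters. Whether searching the larger buckets first saves any time is a separate question.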

Just my $0.02,
jynx


Re: Re: Dreams of Probability
by toma (Vicar) on Dec 18, 2002 at 06:28 UTC
    Great idea! Your posting, as do many at the Monastery, exceeded seventh grade reading level quite handily:
    Fog            15.2   difficult
    Grade Level    13     (Flesch-Kincaid)
    Flesch         50.6
    Complex words  10.5%
    I measured this with my Style and Spelling Checker in Perl. (I'll spare you the results of the spell check :-).

    If I had to attack the lock with perl, I would read the words from a book, using a hash to skip all the words I had read before.
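
The post proposes doing this in Perl with a hash. The sketch below shows the same idea in Python, where a set plays the role of the hash. The function name, the simple `[a-z]+` tokenizer, and the sample text are assumptions for illustration.

```python
import re


def unique_words(text):
    """Yield each distinct word in `text` once, in order of first appearance."""
    seen = set()  # stands in for the hash of words already read
    # Simplistic tokenizer: lowercases and keeps runs of a-z only,
    # so a word like "don't" is split into "don" and "t".
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in seen:
            seen.add(word)
            yield word


# Made-up sample text; a real run would read a whole book from a file.
book = "Call me Ishmael. Call me early. Call me a cab."
print(list(unique_words(book)))
```

Reading words in book order tends to try common words early, since they are likely to appear in the first few pages.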

    I can imagine it now: "Call me Ishmael." - click.

    It should work perfectly the first time! - toma

Re: Re: Dreams of Probability
by abell (Chaplain) on Dec 18, 2002 at 08:46 UTC
    Or more accurately, you'll have a greater chance of finding the word starting with a letter that more commonly starts words than other letters. The most common letter is e, but it doesn't start as many words as s or t, so you could start there. Like i said, it's not proven, but i'm hypothesizing that it would reduce the amount of time used to find the word.

    I doubt it. If you consider the words to be equiprobable, then the probability for the code to lie in some subset of the dictionary is directly proportional to the size of the subset. So the probability of a hit and the average time needed to search the subset grow together, and the strategy is of no help.

    If, instead, the words were not equiprobable, then the most advisable course of action would be to try them in order of decreasing probability (divided by a cost function if one wants to take length or other parameters into account).
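
A small sketch of that ordering, assuming each candidate comes with an estimated probability and a cost per attempt. The tuple layout, the function names, and the numbers are all invented for illustration.

```python
def order_guesses(candidates):
    """Sort (word, probability, cost) tuples by probability per unit cost, highest first."""
    return sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)


def expected_cost(ordered):
    """Expected total cost spent up to and including the correct guess."""
    total = 0.0
    spent = 0.0
    for _word, prob, cost in ordered:
        spent += cost          # cost accumulated once this word has been tried
        total += prob * spent  # weighted by the chance this word is the answer
    return total


# Hypothetical candidates: (word, probability, cost of one attempt).
candidates = [
    ("zephyr", 0.1, 1.0),
    ("door", 0.6, 1.0),
    ("open", 0.3, 1.0),
]
ordered = order_guesses(candidates)
print([word for word, _, _ in ordered])
print(expected_cost(candidates), expected_cost(ordered))
```

With these made-up numbers, trying the likeliest word first lowers the expected cost from about 2.2 attempts to about 1.5. If all probabilities and costs are equal, every ordering gives the same expected cost.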

    Cheers

    Antonio

    The blackness of the night echoes my soul - A. Tucket
Re^2: Dreams of Probability
by Aristotle (Chancellor) on Dec 18, 2002 at 12:32 UTC
    To expand on abell's excellent point in simpler words:
    you'll have a greater chance of finding the word starting with a letter that more commonly starts words than other letters.

    That won't reduce the length of your search, because the fact that it is more probable that a word starts with a certain letter means there are more words starting with that letter. So you are more likely to find the password in that subset, but to exhaust that subset you will have to make more guesses. The expected number of guesses therefore stays the same regardless of which way you choose to order them.

    In effect, any approach of changing the order of guesses can be mapped to linear scanning of entries in an accordingly shuffled list of words. Once you realize that, it is easy to see that, as long as every word is equally likely, there is absolutely no way to guess the right word in fewer than about (dictionary size / 2) attempts on average.
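
This can be checked numerically. The sketch below assumes every word is equally likely to be the password and averages the position of the correct guess over all possible passwords, for two different guessing orders. The word list is synthetic and the function name is made up.

```python
import random


def average_guesses(order, passwords):
    """Average number of guesses needed when each entry in `passwords` is equally likely."""
    position = {word: i + 1 for i, word in enumerate(order)}
    return sum(position[p] for p in passwords) / len(passwords)


# Synthetic dictionary of 1000 placeholder words.
words = [f"word{i}" for i in range(1000)]
alphabetical = sorted(words)
shuffled = words[:]
random.shuffle(shuffled)

print(average_guesses(alphabetical, words))
print(average_guesses(shuffled, words))
```

Both orders give the same average, (N + 1) / 2 for a list of N words, which is roughly half the dictionary.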

    Makeshifts last the longest.
