We all know the right way to select a password, right?
Mix upper and lower case, pick random letters, mix in some
numbers and punctuation, and try to remember the final
result for longer than 30 seconds. If you assign passwords
like this to users, it's a matter of minutes before the
religious radio station emails you with a request that his/her
password be changed to "Jesus".
So I tried to grok the problem and come up with a solution.
Why not use random passwords that are actually based on real language?
I mean, people are used to being able to pronounce things, more
or less, and certain patterns are more comfortable to remember.
It's like phone numbers. String 8562621937 together, and it's
tough to remember. Format it as 856-262-1937 and most people
can recall it without too much trouble. (The area code is South
Jersey, but I don't know whose number that is...I wouldn't
suggest you dial it ;)
The algorithm I proposed to some security-minded friends
is as follows:
- Parse a dictionary file (say, /usr/share/dict/words)
into consonant-vowel patterns, and weigh each pattern
by the number of times it appears in the dictionary.
- Throw away the less frequently used combinations, and
any combinations too short or too long to be used as
a password.
- While we're at it with the dictionary file, find out
how frequently each letter is used, too.
- Randomly select a pattern from the dictionary parse.
- Randomly replace consonant-placeholders with consonants,
and vowel placeholders with vowels. However, weight
the selection of
each letter based on the frequency with which it appeared
in the dictionary.
- Randomly pick a position to capitalize. Do so, but only
if using the capital letter won't confuse things. (i.e.
no capital 'O's and zeroes in the same password.)
- Tack on a couple of numbers or punctuation marks.
They decided it was secure enough...not as good as random,
much better than most. (I haven't calculated the number of
possible outputs from this program. It's a lot.)
No, it's not as secure as randomly picking all characters
in any order. But it is a lot easier to remember the result,
since you can usually pronounce it. Often it sounds like a
real word. (Sometimes this is a problem...users probably
don't want a password that has "feces" in it ;)
Here are some of the passwords this program generated for
me:
- 9Vapadvi0
- 8meleRe$
- 2dudeRanl!
- ?neroteTep!
- 5revU?
- 5teEcon$
- 5Llanse6
- !hAer0
- 2rutRoar7
- 8Tirceneri6
- 6rescAdoler@
- 0tinenEuh2
- ?lavapBi%
- @socozEh0
- 8edRetnan8
- 8Tenporor6
- %tunrohcero%
- $fepegRe0
Update 2 aug '01: Forgot my -w on the
shebang line. Mea culpa. Applied said -w and
was pleased there was nothing else to fix.
#!/usr/bin/perl -w
use strict;
# some constants useful for changing the configuration
use constant MIN_LENGTH => 6;
use constant MAX_LENGTH => 12;
use constant MIN_SAMPLES => 750; # min samples is the minimum number o
+f times
# a vowel-consonant pattern appears i
+n the dictionary
sub parsedict {
# this sub parses a dictionary file (specified at the command line
# into a series of vowel-consonant patterns, weighted by the number
# of times each pattern appears in the dictionary. It writes the
# hash of patterns and weights to a file called "lingua". It also
# tracks the frequency of use for each letter of the alphabet, and
# stores that information in a file called "letters".
my @consonants = split //, 'bcdfghjklmnpqrstvwxz';
my @vowels = split //, 'aeiouy';
my (%letters, %letterdist, %result, %stats);
foreach (@consonants) { $letters{$_} = "c"; }
foreach (@vowels) { $letters{$_} = "v"; }
while (<>) {
chomp;
my @chars = split //, lc($_);
my $mapped;
foreach (@chars) { $mapped .= $letters{$_}; $letterdist{$_}++; }
$result{$mapped}++;
$stats{"words"}++;
}
open LINGUA, ">lingua";
foreach (sort { $result{$b} <=> $result{$a} } keys %result) {
(length($_) >= MIN_LENGTH - 2 && length($_) <= MAX_LENGTH - 2 && $
+result{$_} >= MIN_SAMPLES) and do {
print LINGUA "$_\t$result{$_}\n";
$stats{"patterns"}++;
}
}
close LINGUA;
open LETTERS, ">letters";
foreach (sort { $letterdist{$b} <=> $letterdist{$a} } keys %letterdi
+st) { print LETTERS "$_\t$letterdist{$_}\n"; }
close LETTERS;
return "Parsed $stats{'words'} words into $stats{'patterns'} pattern
+s within criteria.\n";
}
sub genpass {
# this sub chooses a pattern at random from the lingua file, and exc
+hanges
# 'c's and 'v's in the pattern with consonants and vowels, respectiv
+ely,
# based on a random letter selection weighted by the frequency of ea
+ch
# letter in the dictionary file.
# first, choose a pattern from the lingua file
srand; # not strictly necessary as current versions of perl do this
+automatically
my @pattern;
open (LINGUA, "<lingua") or die "Could not open lingua file: $!";
rand($.) < 1 && (@pattern = (split /\t/)) while (<LINGUA>);
close LINGUA;
# second, parse the letters file and build a hash of letters and wei
+ghts
my (%cons, %vowels, $constotal, $voweltotal);
open (LETTERS, "letters") or die "Could not open letter file: $!";
while (<LETTERS>) {
chomp;
my ($key, $value) = split /\t/;
if ($key =~ /[aeiouy]/) { $voweltotal += $value; $vowels{$key} = $
+voweltotal; }
else { $constotal += $value; $cons{$key} = $
+constotal; }
}
# build a couple of routines for randomly selecting vowels and conso
+nants
# these two routines could be combined into one, but i was too lazy
+to do it
# the most elegant way...so it's like this.
my $randomvowel = sub {
my $index = rand($voweltotal); my $choice;
foreach (sort { $vowels{$b} <=> $vowels{$a}
+} keys %vowels) {
$choice = $_;
if ($vowels{$_} < $index) {
last;
}
}
return $choice;
};
my $randomcons = sub {
my $index = rand($constotal); my $choice;
foreach (sort { $cons{$b} <=> $cons{$a} } ke
+ys %cons) {
$choice = $_;
if ($cons{$_} < $index) {
last;
}
}
return $choice;
};
# here's where we actually map random characters into the pattern
my @tomap;
my @orig = split //, $pattern[0];
foreach (@orig) { push @tomap, ($_ eq 'c') ? &$randomcons : &$random
+vowel; }
# good passwords will have at least one letter capitalized. choose o
+ne here.
# note that not all letters are given capital equivalents, making it
+ easier
# to identify "confusing" letters. There are no capital O's, only ze
+ros,
# for example.
my @case = split //, 'ABCDEFGHiJKLMNoPQRSTUVWXYZ';
my $ucpos = int (rand(@orig));
$tomap[$ucpos] = $case[ord($tomap[$ucpos]) - 97];
# good passwords also use some non-alpha characters, interspersed. t
+his
# algorithm tacks one on the front, and one on the back of the passw
+ord
# it just generated. not the most secure way to do it, but better th
+an
# not doing it (and still easy for the user to work with.)
my @puncs = split //, '!?@#$%&0123456789';
my $mapped = $puncs[rand(@puncs)] . (join '', @tomap) . $puncs[rand(
+@puncs)];
# finally, return the generated password.
return $mapped . "\n";
}
# simple enough main...
# if an argument is given, parse it as the dictionary. if not,
# generate a password.
print @ARGV ? parsedict : genpass;
<CODE>
Re: Password generator using a linguistic rule base
by astaines (Curate) on Aug 02, 2001 at 01:40 UTC
|
Hi
Very cute.
There are some nice extensions to this idea, at least one of which
gives better passwords using first, second and third order
approximations to English. See an especially nice example at
Tom Van Vleck's site.
A nice feature of this type of program is that you can
develop memorable, but hard to crack passwords for languages besides
English, by simply stuffing in an appropriate word list.
Thanks again
--
Anthony Staines
| [reply] |
|
That was the intention, though I didn't comment on it, due
to the usual US-centric egotism ;) and the fact that the
vowel & consonant arrays aren't broken out to be easily
configured. But internationalization was on my mind when I
wrote it. (In fact, most of my passwords are based on
Hangeul - Korean - words, since that language and culture
is of interest to me.)
I've briefly reviewed the third-order code on
Tom Van Vleck's site - it's definitely an improvement. If I
had bothered to research this at all before implementing it,
I probably would have either integrated that idea, or
decided the problem had already been solved and was becoming
more complicated than I felt like dealing with ;)
Actually, while I did not know whether it had been done
before, I was not so bold as to suppose noone had ever had
the idea. (I should have mentioned that in my post. "While
I dreamed this program up without knowingly copying someone
else's idea, it may have been done before, somewhere.")
I just didn't research it at all. (Possibly I was
afraid I'd find that some monopoly had a current patent
on the method, and I'd be opening myself up to a lawsuit
or worse, a DMCA violation and imprisonment, by posting it
here ;)
Thanks for the link. Maybe some day I'll modify the
program to pay attention to frequent neighbors for each
letter. Definitely a good idea.
| [reply] |
Re: Password generator using a linguistic rule base
by bastard (Hermit) on Aug 02, 2001 at 02:53 UTC
|
Congratulations, definitely a step above for the
generation of passwords that are easy to remember, but
hard to guess.
One thing to keep in mind. While it is hard for a person
to guess, the same concept can be used to narrow the
initial selection set of passwords needed to guess.
Implemented in a program (a simple modification of your
code) it would have a much better success ratio of guessing
the passwords. In the end though they are definitely
tougher to guess than standard dictionary words, so it
is definitely an improvement.
I used to pick passwords of random characters that felt
natural to type. Some combinations of characters are just
awkward to enter quickly. Knowing that a password cracker
to begin its search based on netural key combinations. This
would likely shorten the search time.
| [reply] |
|
Knowing that letter patterns and
character distribution are based on language should make the
resultant passwords a little easier to guess. Obviously,
knowing that a host under attack requires passwords created
with this code would narrow down the universe of possible
combinations. But how much would that help? I'm not good at probabilities,
but it appears to me that a program guessing passwords
generated by code like this would still have a huge job
ahead of it. Maybe this is the wrong forum to ask for help
with such analysis. I think it's a good algorithm.
Ultimately, I am hoping for a routine where an attacker,
knowing that the passwords were created with this code, would
not get significant advantage from that knowledge. It's
probably not there, yet, and might not get there if I don't
bone up on probability.
Thanks for your positive comments.
| [reply] |
|
Don't get me wrong, I think this is a great tool. I was just
mentioning the pitfalls of something that excludes a set
of passwords for the available selection set. (like my choosing
of passwords that are comfortable to type quickly) When taken
in the context of the entire net it will still be generating
passwords that are probably an order of mangitude (or more),
more difficult to crack than the average password out there.
(on the other hand it is also probably an order of magnitude
or more easier to guess than a truly random password).
The only people who would really be able to take advantage
of such a technique are those with some level of cryptanalytic
ability. Who know a thing or two about character frequencies
and the human element. Heck real cryptanalysts can take
advantage of a faulty random number generator.
back during wwii the germans broke the codes on a number
of british one-time-pads. (Theoretically unbreakable).
It happened like this.
To create the one time pads someone would take balls with
letters on them out of a spherical cage. After each ball
was selected they would spin the cage (after closing the
hatch). They were not supposed to be looking at the letters
during the selection process. After a while they did indeed
start looking at the balls. Sub-conciously they would pick
letter combinations that they felt were random, but actually
were not. Speculating that this was the case the germans did
a bit of research, and discovered the the frequencies of
combinations and were ultimately able to crack a number of
the brittish one-time-pads.
| [reply] |
Re: Password generator using a linguistic rule base
by Chrisf (Friar) on Aug 01, 2001 at 18:16 UTC
|
Hey, that's pretty cool. Nicely done :) | [reply] |
Re: Password generator using a linguistic rule base
by I0 (Priest) on Aug 02, 2001 at 04:16 UTC
|
The generated password can be guessed by guessing the srand seed | [reply] |
|
This is true, but you would also need a copy of the script to know how the passwords are initially generated and have a password to verify the seed against, anything else would be so close to brute cracking that the time investment would be about the same, don't you think... and at least it encourages users not to pick bad passwords.
Just Another Perl Backpacker
| [reply] |
|
|