
We all know the right way to select a password, right? Mix upper and lower case, pick random letters, mix in some numbers and punctuation, and try to remember the final result for longer than 30 seconds. If you assign passwords like this to users, it's a matter of minutes before someone at the religious radio station emails you asking to have their password changed to "Jesus".

So I tried to grok the problem and come up with a solution. Why not use random passwords that are actually based on real language? I mean, people are used to being able to pronounce things, more or less, and certain patterns are easier to remember. It's like phone numbers: run the digits together as 8562621937 and they're tough to remember, but format them as 856-262-1937 and most people can recall them without much trouble. (The area code is South Jersey, but I don't know whose number that is...I wouldn't suggest you dial it ;)

The algorithm I proposed to some security-minded friends is as follows:

1. Parse a dictionary file (say, /usr/share/dict/words) into consonant-vowel patterns, and weight each pattern by the number of times it appears in the dictionary.
2. Throw away the less frequently used combinations, and any combinations too short or too long to be used as a password.
3. While we're at it with the dictionary file, find out how frequently each letter is used, too.
4. Randomly select a pattern from the dictionary parse.
5. Randomly replace consonant placeholders with consonants, and vowel placeholders with vowels. However, weight the selection of each letter based on the frequency with which it appeared in the dictionary.
6. Randomly pick a position to capitalize, but only where the capital letter won't confuse things. (For example, no capital 'O's that could be mistaken for zeroes.)
7. Tack a number or punctuation mark onto each end.
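The first two steps are easy to try on a handful of words before pointing anything at a real dictionary. This is a minimal sketch assuming the script's own letter classes (twenty consonants; a, e, i, o, u and y as vowels); the helper names `cv_pattern` and `frequent_patterns`, the sample words and the tiny thresholds are all made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same letter classes as the script: 'y' counts as a vowel.
my %class;
$class{$_} = 'c' for split //, 'bcdfghjklmnpqrstvwxz';
$class{$_} = 'v' for split //, 'aeiouy';

# Hypothetical helper: map a word to its consonant/vowel pattern, or
# return undef if it contains anything outside a-z (e.g. an apostrophe).
sub cv_pattern {
    my $word = lc shift;
    return unless $word =~ /^[a-z]+$/;
    return join '', map { $class{$_} } split //, $word;
}

# Hypothetical helper: count each pattern and keep those whose length and
# count fall inside the limits, much as parsedict does with MIN_LENGTH - 2,
# MAX_LENGTH - 2 and MIN_SAMPLES. Returns a list of pattern => count pairs.
sub frequent_patterns {
    my ($words, $min_len, $max_len, $min_samples) = @_;
    my %count;
    for my $word (@$words) {
        my $p = cv_pattern($word);
        $count{$p}++ if defined $p;
    }
    return map  { ($_ => $count{$_}) }
           grep { length($_) >= $min_len && length($_) <= $max_len
                  && $count{$_} >= $min_samples } keys %count;
}

my %kept = frequent_patterns([qw(banana cabana salami tomato it's Rhythm)], 4, 10, 2);
print "$_\t$kept{$_}\n" for sort keys %kept;
```

With a minimum count of 2, only `cvcvcv` (from banana, cabana, salami and tomato) survives: `it's` is skipped outright, and the `ccvccc` from `Rhythm` appears just once.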

They decided it was secure enough: not as good as truly random passwords, but much better than most. (I haven't calculated the number of possible outputs from this program. It's a lot.)
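For a rough idea of "a lot", the outputs can be bounded from above by counting. This sketch assumes the script's alphabets (20 consonants, 6 vowels counting y, 17 symbols at each end) and at most one capital per word, so a pattern of length L with c consonant slots and v vowel slots yields at most 17^2 * L * 20^c * 6^v passwords. `keyspace` is a hypothetical helper, and the bound ignores the letter weighting, which makes common letters far likelier, so the effective strength is lower than the bit count it prints.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: upper bound on the number of distinct passwords the
# script can produce from a list of c/v patterns. Assumes 20 consonants,
# 6 vowels (y included), 17 symbols (!?@#$%&0-9) on each end, and at most
# L distinct capitalizations of an L-letter word.
sub keyspace {
    my $total = 0;
    for my $p (@_) {
        my $c = () = $p =~ /c/g;    # consonant slots
        my $v = () = $p =~ /v/g;    # vowel slots
        $total += 17**2 * length($p) * 20**$c * 6**$v;
    }
    return $total;
}

my $n = keyspace('cvcv');
printf "%d possible passwords, about %.1f bits\n", $n, log($n) / log(2);
```

A single four-letter pattern such as `cvcv` already gives 16,646,400, just under 2^24. For a real estimate, pass in the first column of the lingua file; the longest patterns dominate the sum.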

No, it's not as secure as randomly picking all characters in any order. But it is a lot easier to remember the result, since you can usually pronounce it. Often it sounds like a real word. (Sometimes this is a problem...users probably don't want a password that has "feces" in it ;)

Here are some of the passwords this program generated for me:

9Vapadvi0
8meleRe$
2dudeRanl!
?neroteTep!
5revU?
5teEcon$
5Llanse6
!hAer0
2rutRoar7
8Tirceneri6
6rescAdoler@
0tinenEuh2
?lavapBi%
@socozEh0
8edRetnan8
8Tenporor6
%tunrohcero%
$fepegRe0

Update 2 aug '01: Forgot my -w on the shebang line. Mea culpa. Applied said -w and was pleased there was nothing else to fix.

#!/usr/bin/perl -w
use strict;

# some constants useful for changing the configuration
use constant MIN_LENGTH  => 6;   # shortest password, counting the two symbols
                                 # genpass adds to the ends
use constant MAX_LENGTH  => 12;  # longest password, counted the same way
use constant MIN_SAMPLES => 750; # min samples is the minimum number of times
                                 # a vowel-consonant pattern appears in the dictionary

  
sub parsedict {

  # this sub parses a dictionary file (specified at the command line)
  # into a series of vowel-consonant patterns, weighted by the number
  # of times each pattern appears in the dictionary. It writes the
  # hash of patterns and weights to a file called "lingua". It also
  # tracks the frequency of use for each letter of the alphabet, and
  # stores that information in a file called "letters".

  my @consonants = split //, 'bcdfghjklmnpqrstvwxz';
  my @vowels =     split //, 'aeiouy';
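  my (%letters, %letterdist, %result);
  my %stats = (words => 0, patterns => 0);

  foreach (@consonants) { $letters{$_} = "c"; }
  foreach (@vowels)     { $letters{$_} = "v"; }

  while (<>) {
    chomp;
    my $word = lc $_;
    # skip entries with apostrophes, digits or accented letters: characters
    # outside a-z have no c/v class and would end up in the letter weights
    next unless $word =~ /^[a-z]+$/;
    my $mapped = '';
    foreach (split //, $word) { $mapped .= $letters{$_}; $letterdist{$_}++; }
    $result{$mapped}++;
    $stats{"words"}++;
  }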
 
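  open (LINGUA, ">lingua") or die "Could not write lingua file: $!";
  foreach (sort { $result{$b} <=> $result{$a} } keys %result) {
    # MIN_LENGTH - 2 and MAX_LENGTH - 2 leave room for the two symbols
    # genpass adds to the ends of the word
    if (length($_) >= MIN_LENGTH - 2 && length($_) <= MAX_LENGTH - 2
        && $result{$_} >= MIN_SAMPLES) {
      print LINGUA "$_\t$result{$_}\n";
      $stats{"patterns"}++;
    }
  }
  close LINGUA;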
 
  open (LETTERS, ">letters") or die "Could not write letters file: $!";
  foreach (sort { $letterdist{$b} <=> $letterdist{$a} } keys %letterdist) {
    print LETTERS "$_\t$letterdist{$_}\n";
  }
  close LETTERS;

  return "Parsed $stats{'words'} words into $stats{'patterns'} patterns within criteria.\n";
}

sub genpass {

  # this sub chooses a pattern at random from the lingua file, and exchanges
  # 'c's and 'v's in the pattern with consonants and vowels, respectively,
  # based on a random letter selection weighted by the frequency of each
  # letter in the dictionary file.

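  # first, choose a pattern from the lingua file. line N replaces the
  # current pick with probability 1/N, which leaves every line equally
  # likely to be the final choice (the counts in the file aren't used here).
  srand; # not strictly necessary as current versions of perl do this automatically
  my @pattern;
  open (LINGUA, "<lingua") or die "Could not open lingua file: $!";
  rand($.) < 1 && (@pattern = (split /\t/)) while (<LINGUA>);
  close LINGUA;
  die "No patterns in lingua file; parse a dictionary first.\n" unless @pattern;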

  # second, parse the letters file and build a hash of letters and
  # cumulative weights: each letter maps to the running total of the
  # frequencies read so far, itself included
  my (%cons, %vowels);
  my ($constotal, $voweltotal) = (0, 0);
  open (LETTERS, "<letters") or die "Could not open letter file: $!";
  while (<LETTERS>) {
    chomp;
    my ($key, $value) = split /\t/;
    if ($key =~ /[aeiouy]/) { $voweltotal += $value; $vowels{$key} = $voweltotal; }
    else                    { $constotal  += $value; $cons{$key}   = $constotal;  }
  }
  close LETTERS;
  
  # build a routine that picks a letter with probability proportional to
  # its dictionary frequency: draw a number in [0, total) and return the
  # first letter, in ascending order of running total, whose total exceeds
  # the draw. it is wrapped once for vowels and once for consonants.
  my $weighted = sub {
    my ($cumulative, $total) = @_;
    my $index = rand($total);
    foreach (sort { $cumulative->{$a} <=> $cumulative->{$b} } keys %$cumulative) {
      return $_ if $index < $cumulative->{$_};
    }
    return;
  };
  my $randomvowel = sub { $weighted->(\%vowels, $voweltotal) };
  my $randomcons  = sub { $weighted->(\%cons,   $constotal)  };

  # here's where we actually map random characters into the pattern
  my @tomap;
  my @orig = split //, $pattern[0];
  foreach (@orig) { push @tomap, ($_ eq 'c') ? &$randomcons : &$randomvowel; }

  # good passwords will have at least one letter capitalized. choose one here.
  # 'i' and 'o' are never capitalized, so there are no capital O's to mix up
  # with zeros (or capital I's to mix up with ones). only positions holding
  # other letters are candidates, so a capital still appears whenever the
  # word has any such letter.
  my @capable = grep { $tomap[$_] !~ /^[io]$/ } 0 .. $#tomap;
  if (@capable) {
    my $ucpos = $capable[rand @capable];
    $tomap[$ucpos] = uc $tomap[$ucpos];
  }

  # good passwords also use some non-alpha characters, interspersed. this
  # algorithm tacks one on the front, and one on the back of the password
  # it just generated. not the most secure way to do it, but better than
  # not doing it (and still easy for the user to work with.)
  my @puncs = split //,  '!?@#$%&0123456789';
  my $mapped = $puncs[rand(@puncs)] . (join '', @tomap) . $puncs[rand(@puncs)];

  # finally, return the generated password.
  return $mapped . "\n";
}

# simple enough main...
# if an argument is given, parse it as the dictionary. if not,
# generate a password.
print @ARGV ? parsedict : genpass;
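genpass picks a pattern uniformly, even though parsedict stores each pattern's count in the lingua file. If you'd rather have the common shapes come up more often, the counts could drive the choice instead. What follows is a sketch of that alternative, not what the script above does; `pick_pattern` and the sample counts are made up. Note the trade-off: a uniform pick over the surviving patterns carries the most entropy, so weighting by frequency makes the output somewhat easier to guess.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given "pattern<TAB>count" lines in the format
# parsedict writes to the lingua file, return the pattern whose slice of
# the running total contains $draw (a number in [0, total count)).
sub pick_pattern {
    my ($lines, $draw) = @_;
    my $running = 0;
    for my $line (@$lines) {
        my ($pattern, $count) = split /\t/, $line;
        $running += $count;
        return $pattern if $draw < $running;
    }
    return;    # only reached if $draw is not below the total
}

# Made-up counts standing in for a real lingua file.
my @lines = ("cvcvcv\t3000", "cvccvc\t1000");

my $total = 0;
$total += (split /\t/, $_)[1] for @lines;
print pick_pattern(\@lines, rand $total), "\n";
```

Inside genpass, the lingua lines would be read into an array and `rand` of their summed counts passed as the draw; here `cvcvcv` comes back three times as often as `cvccvc`.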

In reply to Password generator using a linguistic rule base by ginseng
