I was looking for an Anagram module on CPAN: Something that
could read in a dictionary and use that to create simple
anagrams of words with an
anagram($word) function. I
couldn't find anything, so I wrote my own module. I wanted
to submit it to CPAN, but I've never done that before. My
questions are two-fold:
#1: What should I do before submitting the module? I'd like
to set up something that works with
MakeMaker (someone
suggested
h2xs). What about general documentation?
Clearly there is no end to the amount of support you could
give a module, but what's a reasonable lower bound for
supporting this hack?
#2: I've attached the module code and a short script to run
it below. What is stylistically expected from a CPAN
module? How could this code get cleaned up? Are there any
algorithmic changes that would make the code run faster
(or support more than 1 destination word?) (If you're curious, the code parses a
dictionary into a standard tree format and then when anagraming
the words, walks through the tree on a prefix-by-prefix basis.
A few of my variable names suck. Let me know if you have
better names than
$sub_table and
$sub_prefix.)
anagram:
#!/usr/bin/perl -w
use strict;
use Anagram;
while (<>) {
chomp;
my $word = $_;
s/\W//g;
my $base = lc;
my @words = grep { $base ne $_ } Anagram::anagram($base);
next unless @words;
print "\nAnagrams of $_:\n";
print " $_\n" foreach @words;
}
Anagram.pm:
use strict;
use Carp;
package Anagram;
@Anagram::EXPORT = qw(anagram is_word starts_word);
BEGIN {
$Anagram::version = "1.0";
$Anagram::dict = {};
open DICT, "</usr/dict/words" or croak("Could not open dictionary:
+ $!");
foreach (<DICT>) {
s/\W//g;
my @letters = split //, lc;
# Insert word into sorted dictionary tree
my $sub_table = $Anagram::dict;
foreach (@letters) {
# Create sub-table if it's missing and then recurse
$sub_table->{$_} = {} if ref($sub_table->{$_}) ne 'HASH';
$sub_table = $sub_table->{$_};
}
# Flag this entry as a valid word
$sub_table->{word} = 1;
}
close DICT;
}
sub version {
my $version = shift;
warn "Version $version is later than $Anagram::version."
if defined($version) && ($version > $Anagram::version);
return $Anagram::version;
}
sub anagram {
my $word = shift;
# Create table of letters. This is better than a pure table
# because we avoid extra work for duplicated letters-- we can
# anagram "aaaaa" in one cycle instead of 5! = 120 cycles.
my $letters = {};
foreach (split //, $word) {
$letters->{$_}++;
}
return permute_recurse($letters, "");
}
sub permute_recurse {
my ($letters, $prefix) = @_;
my @words = ();
foreach (keys %$letters) {
# Test if any words start with this new prefix
my $sub_prefix = $prefix.$_;
my $sub_table = starts_word($sub_prefix);
next unless $sub_table;
# Use this letter
if ($letters->{$_} <= 1) {
delete $letters->{$_};
} else {
$letters->{$_}--;
}
# Test for recurse case
if (scalar keys %$letters) {
push @words, permute_recurse($letters, $sub_prefix);
}
# Test for base case
elsif ($sub_table->{word}) {
push @words, $sub_prefix;
}
# Restore letter
$letters->{$_}++;
}
return @words;
}
sub is_word {
my $sub_table = walk_dict(@_) or return undef;
return $sub_table->{word};
}
# Returns a sub-table entry on true and undef on failutre
sub starts_word {
my ($word) = @_;
my $sub_table = $Anagram::dict;
foreach (split //, $word) {
$sub_table = $sub_table->{$_};
return undef unless ref($sub_table) eq 'HASH';
}
return $sub_table;
}
1;
-Ted