It's interesting to follow the development of a hard problem like this, to see what works and what doesn't, and more importantly to understand why.
This particular case appears to be a less strict version of the Scrabble problem (adding your letters to the existing board, instead of just forming words). I've looked at that space a bit myself.
A Word Twister solution might be
(1) create a sorted anagram lookup for the dictionary,
(2) join the anagram keys into a string with an appropriate separator,
(3) determine the letter counts in the given letters,
(4) create regex snippet strings for each letter count (e.g., "a{0,3}"),
(5) permute the regex snippets into larger regex strings,
(6) match the anagram key string against the larger regex strings,
(7) print the resulting words
Maybe something like this:
eval '(exit $?0)' && eval 'exec perl -w -S $0 ${1+"$@"}'
&& eval 'exec perl -w -S $0 $argv:q'
if 0;
# The above invocation finds perl in the path, wherever it may be
# find words in dictionary containing only letters given
# usage: $0 given_letters word_list_file_name[s]
use strict;
use warnings;
our $min_word_length = 3;
our $given = shift;
our $given_length = length( $given );
our @factorial;
$factorial[0] = 1;
# read in dictionary, removing invalid words, and words with letters n
+ot in the given list
warn "Reading in dictionary\n";
our @word_list;
while (<>)
{
chomp;
next unless /^[a-z]{$min_word_length,$given_length}$/;
my $temp;
($temp = $_) =~ s/[$given]//g;
next if length($temp);
push @word_list, $_;
}
warn "\t", scalar @word_list, " words in dictionary (after filtering)\
+n";
warn "Creating dictionary anagrams\n";
our %word_list_sorted_anagrams;
for my $word (@word_list)
{
my $key = join('', sort split '', $word);
# some anagrams will not be unique
push @{$word_list_sorted_anagrams{$key}}, $word;
}
# match this against a regex later
my $word_list_sorted_anagrams_key_string = join ',', keys %word_list_s
+orted_anagrams;
warn "\t", scalar keys %word_list_sorted_anagrams, " anagram keys\n";
warn "Creating given regexes\n";
# convert given to regex strings of the form "a{0,3}b{0,2}..."
my @given = split '', $given;
my %given_counts;
for my $g (@given)
{
$given_counts{$g}++;
}
my @given_regexes;
for my $g (keys %given_counts)
{
push @given_regexes, sprintf "%s{0,%d}", $g,$given_counts{$g},$giv
+en_counts{$g};
}
# permute the regex strings
warn "Permuting given regexes\n";
my %regex_permutations;
my $n = 0;
for my $n (0..factorial(scalar @given_regexes)-1)
{
$regex_permutations{permutation_n($n,@given_regexes)} = 1;
$n++;
}
warn "\t", scalar keys %regex_permutations, " regex permutations\n";
# find all matching anagram keys
warn "Matching anagram keys\n";
# number of permutations can be large, use a loop instead of a monster
+ regex
my @key_match_results;
my $counter; # for progress indicator, this loop can take a while!
for my $regex_permutation (keys %regex_permutations)
{
my @results = grep {defined($_) and length($_)} $word_list_sorted_
+anagrams_key_string =~ m/\b$regex_permutation\b/g;
push @key_match_results, grep {defined($_) and length($_)} @result
+s;
$counter++;
# print STDERR "." unless $counter % 100; # progress indicator for
+the impatient
}
warn "\n\t", scalar @key_match_results, " matching anagram keys\n";
# lookup words from keys
warn "Looking up matching words\n";
my @words_matched;
for my $k (@key_match_results)
{
push @words_matched, @{$word_list_sorted_anagrams{$k}};
}
print "@words_matched\n";
warn "\t", scalar @words_matched, " words matched\n";
exit;
###########################################3
# Find and return the $n'th permutation
# of the remaining arguments in some canonical order
# (modified from QOTW solution)
sub permutation_n
{
my $n = shift;
my $result = '';
while (@_)
{
($n, my $r) = (int($n/@_), $n % @_);
$result .= splice @_, $r, 1;
}
return $result;
}
###########################
# we might do this a lot, so cache the results
sub factorial
{
my $n = shift;
# if we already know it, return it
return $factorial[$n] if defined $factorial[$n];
# else compute it from the largest known result
my $result = $factorial[$#factorial];
for my $k ( $#factorial+1..$n )
{
$result *= $k;
}
return $result;
}
Using the http://perl.plover.com/qotw/words/Web2.gz, and posterboy as the given:
> time word_twister.pl posterboy web2* >! posterboy.txt
Reading in dictionary
729 words in dictionary (after filtering)
Creating dictionary anagrams
512 anagram keys
Creating given regexes
Permuting given regexes
40320 regex permutations
Matching anagram keys
460888 matching anagram keys
Looking up matching words
763232 words matched
137.060u 0.290s 2:20.51 97.7% 0+0k 0+0io 2871pf+0w
Now this is pretty slow at filtering the dictionary against the given, and even slower for longer given strings. Improvements welcome.
For your particular case, you might want it to loop, asking for a new given, and not reading in the dictionary every time. (But that doesn't save much work, as the dictionary filtering against the given is expensive the way I've done it here.)
-QM
--
Quantum Mechanics: The dreams stuff is made of
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
Outside of code tags, you may need to use entities for some characters:
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.
|
|