http://www.perlmonks.org?node_id=790480


in reply to Turning A Problem Upside Down

It's interesting to follow the development of a hard problem like this, to see what works and what doesn't, and more importantly to understand why.

This particular case appears to be a less strict version of the Scrabble problem (adding your letters to the existing board, instead of just forming words). I've looked at that space a bit myself.

A Word Twister solution might be
(1) create a sorted anagram lookup for the dictionary,
(2) join the anagram keys into a string with an appropriate separator,
(3) determine the letter counts in the given letters,
(4) create regex snippet strings for each letter count (e.g., "a{0,3}"),
(5) permute the regex snippets into larger regex strings,
(6) match the anagram key string against the larger regex strings,
(7) print the resulting words

Maybe something like this:

eval '(exit $?0)' && eval 'exec perl -w -S $0 ${1+"$@"}' && eval 'exec perl -w -S $0 $argv:q' if 0; # The above invocation finds perl in the path, wherever it may be # find words in dictionary containing only letters given # usage: $0 given_letters word_list_file_name[s] use strict; use warnings; our $min_word_length = 3; our $given = shift; our $given_length = length( $given ); our @factorial; $factorial[0] = 1; # read in dictionary, removing invalid words, and words with letters n +ot in the given list warn "Reading in dictionary\n"; our @word_list; while (<>) { chomp; next unless /^[a-z]{$min_word_length,$given_length}$/; my $temp; ($temp = $_) =~ s/[$given]//g; next if length($temp); push @word_list, $_; } warn "\t", scalar @word_list, " words in dictionary (after filtering)\ +n"; warn "Creating dictionary anagrams\n"; our %word_list_sorted_anagrams; for my $word (@word_list) { my $key = join('', sort split '', $word); # some anagrams will not be unique push @{$word_list_sorted_anagrams{$key}}, $word; } # match this against a regex later my $word_list_sorted_anagrams_key_string = join ',', keys %word_list_s +orted_anagrams; warn "\t", scalar keys %word_list_sorted_anagrams, " anagram keys\n"; warn "Creating given regexes\n"; # convert given to regex strings of the form "a{0,3}b{0,2}..." my @given = split '', $given; my %given_counts; for my $g (@given) { $given_counts{$g}++; } my @given_regexes; for my $g (keys %given_counts) { push @given_regexes, sprintf "%s{0,%d}", $g,$given_counts{$g},$giv +en_counts{$g}; } # permute the regex strings warn "Permuting given regexes\n"; my %regex_permutations; my $n = 0; for my $n (0..factorial(scalar @given_regexes)-1) { $regex_permutations{permutation_n($n,@given_regexes)} = 1; $n++; } warn "\t", scalar keys %regex_permutations, " regex permutations\n"; # find all matching anagram keys warn "Matching anagram keys\n"; # number of permutations can be large, use a loop instead of a monster + regex my @key_match_results; my $counter; # for progress indicator, this loop can take a while! for my $regex_permutation (keys %regex_permutations) { my @results = grep {defined($_) and length($_)} $word_list_sorted_ +anagrams_key_string =~ m/\b$regex_permutation\b/g; push @key_match_results, grep {defined($_) and length($_)} @result +s; $counter++; # print STDERR "." unless $counter % 100; # progress indicator for +the impatient } warn "\n\t", scalar @key_match_results, " matching anagram keys\n"; # lookup words from keys warn "Looking up matching words\n"; my @words_matched; for my $k (@key_match_results) { push @words_matched, @{$word_list_sorted_anagrams{$k}}; } print "@words_matched\n"; warn "\t", scalar @words_matched, " words matched\n"; exit; ###########################################3 # Find and return the $n'th permutation # of the remaining arguments in some canonical order # (modified from QOTW solution) sub permutation_n { my $n = shift; my $result = ''; while (@_) { ($n, my $r) = (int($n/@_), $n % @_); $result .= splice @_, $r, 1; } return $result; } ########################### # we might do this a lot, so cache the results sub factorial { my $n = shift; # if we already know it, return it return $factorial[$n] if defined $factorial[$n]; # else compute it from the largest known result my $result = $factorial[$#factorial]; for my $k ( $#factorial+1..$n ) { $result *= $k; } return $result; }

Using the http://perl.plover.com/qotw/words/Web2.gz, and posterboy as the given:

> time word_twister.pl posterboy web2* >! posterboy.txt Reading in dictionary 729 words in dictionary (after filtering) Creating dictionary anagrams 512 anagram keys Creating given regexes Permuting given regexes 40320 regex permutations Matching anagram keys 460888 matching anagram keys Looking up matching words 763232 words matched 137.060u 0.290s 2:20.51 97.7% 0+0k 0+0io 2871pf+0w

Now this is pretty slow at filtering the dictionary against the given, and even slower for longer given strings. Improvements welcome.

For your particular case, you might want it to loop, asking for a new given, and not reading in the dictionary every time. (But that doesn't save much work, as the dictionary filtering against the given is expensive the way I've done it here.)

-QM
--
Quantum Mechanics: The dreams stuff is made of