sbrothy has asked for the wisdom of the Perl Monks concerning the following question:
I'm sorry to be back with yet another "minor" problem, but this is slowly driving me up the wall.
All the more so because I had this working before, and for some reason it no longer does. The problem is with the regex wildcard ".".
#!/usr/bin/perl
#########################################
# The purpose of this program is to help
# cheat at Scrabble. Ultimately though it's more
# about the intellectual exercise of writing the
# actual program. I've tried using it (on http://isc.ro)
# and found that it's more work than it's worth. In fact,
# once I tried it (I *nearly* (*cough*) always informed
# my opponents beforehand) my rating quickly deteriorated.
#
# Also, I don't have the program on my phone so when I'm
# out and about I show my true colors anyway.
#
# There's actually a slightly embarrassing anecdote here:
#
# I started a game (20 minutes, challenge: DOUBLE) at
# home, full of confidence that I would beat my opponent
# with the help of my little program. Unfortunately, my
# opponent wanted to adjourn due to some emergency. What
# happened then was, of course, that we continued the game
# while I was driving a bus to work. Needless to say, my
# performance was noticeably sub-stellar. So bad, in fact,
# that I had to come clean. Luckily for me, he/she took it in
# full stride and happily served me my head on a platter. :)
#
# sbrothy 21:26 9/9/2021
#
#########################################
use autodie;
use diagnostics;
use strict;
use warnings;
use 5.010;
use File::Fetch;
my $ff = File::Fetch->new(uri =>
'https://www.wordgamedictionary.com/sowpods/download/sowpods.txt'
);
my $csw = $ff->fetch(to => '/tmp');
my @words;
open my $fh, '<', $csw;
foreach (<$fh>) {
tr/\r\n//d;
tr/\[A-Z]/[a-z]/;
next if /\s+/; # dirty way of removing comments
push @words, $_;
}
close $fh;
############################################
my $tiles = 'a.';
print "MATCHES FOR: $tiles\n";
my @m = find_matches($tiles);
print "\n@m\n";
############################################
$tiles = 's.';
print "MATCHES FOR: $tiles\n";
@m = find_matches($tiles);
print "\n@m\n";
############################################
sub find_matches {
my $letters = shift;
my $regex = join '', map "$_?", sort split //, $letters;
my @matches;
foreach my $word (@words) {
if(join('', sort split //, $word) =~ /^$regex$/) {
push @matches, $word;
}
}
return @matches;
}
As you can see, the wildcard behaves erratically. I had this working, but I'm pretty sure I didn't do anything special.
Will you humor me one last time?
Regards, sbrothy
Sorry, forgot to provide the output:
MATCHES FOR: a.
aa
MATCHES FOR: s.
as es is os sh si so
And just for the record, changing the order of the letters doesn't change anything, as they're sorted. Obviously.
Re: Problem with regex wildcard operator (.)...?
by kcott (Archbishop) on Sep 06, 2021 at 09:21 UTC
G'day sbrothy,
I suggest you take a look at Regexp::Debugger.
That will allow you to step through the matching process and see exactly what is going on.
Re: Problem with regex wildcard operator (.)...?
by AnomalousMonk (Archbishop) on Sep 06, 2021 at 06:31 UTC
A note on usage. The tr/// operator does not have [...] character classes like regexes (as might be used in s///), but only ranges, e.g., a-z, so the \[ [ ] sequences in the tr/\[A-Z]/[a-z]/ expression in the OP represent literal [ ] characters (see tr/// in Quote-Like Operators in perlop).
So '[' is being translated to '[' and ']' to ']'. This does no harm (indeed, the tr/// compiler may optimize this away), but suggests a misunderstanding of tr///.
And since the tr/// is just translating upper case to lower, a simple
$_ = lc;
statement might be clearer, hence better (see lc).
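A quick sketch (the sample string '[QuIz]' is made up for illustration) confirming that the OP's tr/\[A-Z]/[a-z]/, a plain tr/A-Z/a-z/ and lc all give the same result on ASCII input. In the first form the bracket entries simply map '[' to '[' and ']' to ']':

```perl
use strict;
use warnings;

my $s = '[QuIz]';    # hypothetical sample containing literal brackets

# The OP's form: the search list is '[', A..Z, ']' and the
# replacement list is '[', a..z, ']', so the brackets map to themselves.
(my $op_form = $s) =~ tr/\[A-Z]/[a-z]/;

# A plain range is all tr/// needs for ASCII case folding.
(my $range_form = $s) =~ tr/A-Z/a-z/;

# lc does the same job and says what it means.
my $lc_form = lc $s;

print "$op_form\n$range_form\n$lc_form\n";    # each line prints [quiz]
```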
Update: I see now that Marshall has already made this point at the end of this post.
Give a man a fish: <%-{-{-{-<
Re: Problem with regex wildcard operator (.)...?
by AnomalousMonk (Archbishop) on Sep 07, 2021 at 11:21 UTC
I'm sure there's a better Perl scrabble cheater already in existence, but I enjoyed the exercise. The "requirement" I sought to satisfy is that given a word already on the scrabble board and a tray of letters one can use to augment the word, return a list of all words from a standard dictionary that can be formed from the given word using only one or more letters from the tray added to the beginning and/or end of the given word.
This code is minimally tested, but looks good to me. It also seems a good candidate for encapsulation in a module with a thorough test file. There are also a few enhancements one can imagine, e.g., listing the candidate words found along with their scrabble scores and in order sorted by score/alpha. I think this code will work under Perl version 5.8.9, but I haven't tested it with this version. The dictionary I used for testing is not an "official" scrabble dictionary, but a general dictionary I had on hand.
scrabble_cheater_3.pl:
Output:
Win8 Strawberry 5.30.3.1 (64) Tue 09/07/2021 6:24:31
C:\@Work\Perl\monks\Marshall
>perl scrabble_cheater_3.pl
given: 'led'
tray: 'ooidgkle'
deLED
doiLED
doLED
dolLED
gelLED
gilLED
gLED
gLEDe
idLED
kilLED
LEDe
LEDge
LEDged
LEDol
ogLED
oiLED
Win8 Strawberry 5.30.3.1 (64) Tue 09/07/2021 7:22:34
C:\@Work\Perl\monks\Marshall
>perl scrabble_cheater_3.pl -g no -t wk
given: 'no'
tray: 'wk'
kNOw
NOw
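The source of scrabble_cheater_3.pl is not shown above, so what follows is not that script. It is a minimal, hypothetical sketch of the stated requirement only: for each dictionary word, find every place the given word occurs inside it, and accept the word if the letters added before and after that occurrence can all be drawn from the tray. The sub names extensions and can_draw and the five-word dictionary are made up for illustration, and no scoring or board constraints are modelled.

```perl
use strict;
use warnings;

# True if every letter of $need can be taken from $tray,
# using each tray tile at most once.
sub can_draw {
    my ($need, $tray) = @_;
    my %count;
    $count{$_}++ for split //, $tray;
    for my $ch (split //, $need) {
        return 0 unless $count{$ch};
        $count{$ch}--;
    }
    return 1;
}

# Hypothetical helper: words from @dict formed by adding at least one
# tray letter before and/or after $given.
sub extensions {
    my ($given, $tray, @dict) = @_;
    my @found;
    WORD: for my $word (@dict) {
        next WORD if length($word) <= length($given);
        # Try every position at which $given occurs inside $word.
        my $pos = -1;
        while (($pos = index($word, $given, $pos + 1)) >= 0) {
            my $added = substr($word, 0, $pos)
                      . substr($word, $pos + length($given));
            if (can_draw($added, $tray)) {
                push @found, $word;
                next WORD;
            }
        }
    }
    return @found;
}

my @dict  = qw(know knot no now snow);    # tiny made-up word list
my @found = extensions('no', 'wk', @dict);
print "@found\n";
```

With the 'no'/'wk' inputs from the second run above and this tiny list, it returns know and now; knot and snow are rejected because t and s are not on the tray.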
Re: Problem with regex wildcard operator (.)...?
by LanX (Saint) on Sep 05, 2021 at 20:26 UTC
> As you can see the wildcard behaves erratically.
Sorry, I don't see your problem. Could you please describe it better and provide an SSCCE?
> Will you humor me one last time?
We are happy to help, as long as you don't make it unnecessarily difficult for us. :)
OK, I shortened it; I hope this is more acceptable. In the process I discovered, as I suspected, that it has something to do with the order of the regex, but exactly what is still pretty opaque to me.
#!/usr/bin/perl
use strict;
use warnings;
my @words = ('aa', 'as', 'es', 'is', 'os', 'sh', 'si', 'so');
print "WORDS: @words\n";
print "----------------------------\n";
my $regex = ".?a?";
print "Matches for regex pattern : $regex\n\n";
foreach (@words) {
if(join('', sort split //, $_) =~ /^$regex$/) {
print "$_\n";
}
}
print "----------------------------\n";
$regex = "a?.?";
print "Matches for regex pattern : $regex\n\n";
foreach (@words) {
if(join('', sort split //, $_) =~ /^$regex$/) {
print "$_\n";
}
}
print "----------------------------\n";
$regex = "s?.?";
print "Matches for regex pattern : $regex\n\n";
foreach (@words) {
if(join('', sort split //, $_) =~ /^$regex$/) {
print "$_\n";
}
}
print "----------------------------\n";
$regex = ".?s?";
print "Matches for regex pattern : $regex\n\n";
foreach (@words) {
if(join('', sort split //, $_) =~ /^$regex$/) {
print "$_\n";
}
}
print "----------------------------\n";
OUTPUT
WORDS: aa as es is os sh si so
----------------------------
Matches for regex pattern : .?a?
aa
----------------------------
Matches for regex pattern : a?.?
aa
as
----------------------------
Matches for regex pattern : s?.?
----------------------------
Matches for regex pattern : .?s?
as
es
is
os
sh
si
so
----------------------------
Regards.
Re: Problem with regex wildcard operator (.)...?
by AnomalousMonk (Archbishop) on Sep 06, 2021 at 06:06 UTC
BTW: If you are going to do matching against a word list after the words in the list have had their letters sorted into alpha order and if you will do many matches against the same word list, it will be (possibly much) more efficient to first build a hash of all sorted versus native words.
Win8 Strawberry 5.8.9.5 (32) Mon 09/06/2021 1:53:18
C:\@Work\Perl\monks
>perl -Mstrict -Mwarnings
use Data::Dump qw(dd); # for debug
my @words = ( # added a few extra 'words'
'', 'x', 'a', 's', 'aa', 'as', 'es', 'is', 'os', 'sh', 'si', 'so'
);
dd '@words:', \@words; # for debug
# map words to sorted words.
my %sorted = map { $_ => join('', sort split //) } @words;
dd 'sorted words hash:', \%sorted; # for debug
print "----------------------------\n";
for my $regex (qw(.?a? a?.? s?.? .?s?)) {
my $rx_full = qr{ ^ $regex $ }x; # faster if used often
# my $rx_full = qr{ \A $regex \z }xms; # i prefer \A \z \Z and /xms tail
print "Matches for FULL regex pattern: $rx_full \n";
for my $word (@words) {
print "'$sorted{$word}' -> '$word' " if $sorted{$word} =~ $rx_full;
}
print "\n----------------------------\n";
}
^Z
(
"\@words:",
["", "x", "a", "s", "aa", "as", "es", "is", "os", "sh", "si", "so"],
)
(
"sorted words hash:",
{
"" => "",
"a" => "a",
"aa" => "aa",
"as" => "as",
"es" => "es",
"is" => "is",
"os" => "os",
"s" => "s",
"sh" => "hs",
"si" => "is",
"so" => "os",
"x" => "x",
},
)
----------------------------
Matches for FULL regex pattern: (?x-ism: ^ .?a? $ )
'' -> '' 'x' -> 'x' 'a' -> 'a' 's' -> 's' 'aa' -> 'aa'
----------------------------
Matches for FULL regex pattern: (?x-ism: ^ a?.? $ )
'' -> '' 'x' -> 'x' 'a' -> 'a' 's' -> 's' 'aa' -> 'aa' 'as' -> 'as'
----------------------------
Matches for FULL regex pattern: (?x-ism: ^ s?.? $ )
'' -> '' 'x' -> 'x' 'a' -> 'a' 's' -> 's'
----------------------------
Matches for FULL regex pattern: (?x-ism: ^ .?s? $ )
'' -> '' 'x' -> 'x' 'a' -> 'a' 's' -> 's' 'as' -> 'as' 'es' -> 'es' 'is' -> 'is' 'os' -> 'os' 'hs' -> 'sh' 'is' -> 'si' 'os' -> 'so'
----------------------------
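Taking the hash idea one step further, here is a sketch (not from the thread) that keys the hash the other way round, by sorted letters. Anagrams such as 'is' and 'si' then share one entry, so each distinct letter set is sorted once and tested against the regex once, however many words map to it. The names %words_for and matches_for are made up for illustration.

```perl
use strict;
use warnings;

my @words = qw(aa as es is os sh si so);

# Group the original words under their sorted-letter key, so anagrams
# ('is'/'si', 'os'/'so') share a single entry.
my %words_for;
for my $word (@words) {
    my $key = join '', sort { $a cmp $b } split //, $word;
    push @{ $words_for{$key} }, $word;
}

# Match each distinct key once; return all words filed under matching keys.
sub matches_for {
    my ($regex) = @_;
    my $rx = qr/\A$regex\z/;
    my @hits;
    for my $key (sort keys %words_for) {
        push @hits, @{ $words_for{$key} } if $key =~ $rx;
    }
    return @hits;
}

for my $regex ('.?s?', 's?.?') {
    my @hits = matches_for($regex);
    print "$regex: @hits\n";
}
```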
Re: Problem with regex wildcard operator (.)...?
by karlgoethebier (Abbot) on Sep 06, 2021 at 10:55 UTC
«…cheat at Scrabble…»
See Games::Literati. I don’t know if it’s good, but it seems to be maintained. Best regards, Karl
«The Crux of the Biscuit is the Apostrophe»
Re: Problem with regex wildcard operator (.)...?
by BillKSmith (Monsignor) on Sep 06, 2021 at 03:37 UTC
Your misunderstanding has nothing to do with the metacharacter '.', but rather with the quantifier '?'. When a character (or metacharacter) in your regex matches zero times, you will reach the end of the regex before you have matched all the characters in the string. Your 'end-of-string' anchor '$' fails.
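Combined with the sort in the OP's find_matches, this is what makes the results depend on letter order. The sketch below reuses the OP's regex construction (build_regex and the sample words are made up for illustration). Because '.' (0x2E) sorts before every lowercase letter, 'a.' becomes '.?a?' and 's.' becomes '.?s?'. The wildcard therefore sits first and can only stand for a letter that sorts at or before the real tile. A letter that sorts after it is left over when the regex runs out, and '$' fails.

```perl
use strict;
use warnings;

# Same construction as the OP's find_matches: sort the tiles and
# make each one optional.
sub build_regex {
    my ($letters) = @_;
    return join '', map "$_?", sort split //, $letters;
}

for my $tiles ('a.', 's.') {
    my $regex = build_regex($tiles);
    print "tiles '$tiles' -> regex '$regex'\n";
    for my $word (qw(aa as es st us)) {
        my $sorted = join '', sort split //, $word;
        my $result = $sorted =~ /^$regex$/ ? 'match' : 'no match';
        print "  $word (sorted '$sorted'): $result\n";
    }
}
```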
Re: Problem with regex wildcard operator (.)...?
by Marshall (Canon) on Sep 06, 2021 at 07:25 UTC
I really don't know the requirements for your application.
I have taken an imperfect attempt at it below.
I did come close to your output for "s." but not exactly.
My output for "a." is wildly off from what you show.
I downloaded your DB file to a local file to make my testing faster.
I don't think that makes any difference.
I would like more text to describe what you want to have happen.
#!/usr/bin/perl
use strict;
use warnings;
my @words;
open my $fh, '<', 'EuroScrabbleWordList.txt' or die "can't open word list $!";
foreach (<$fh>)
{
tr/\r\n//d;
tr/A-Z/a-z/;
next if /\s/; # dirty way of removing comments
push @words, $_;
}
close $fh;
############################################
my $tiles = 'a.';
# print all words with 2 letters that contain "a"
print "\nMATCHES FOR: $tiles\n";
my @m = find_matches($tiles);
print "\n@m\n";
$tiles = 's.';
# print all words with 2 letters that contain "s"
print "\nMATCHES FOR: $tiles\n";
@m = find_matches($tiles);
print "\n@m\n";
$tiles = 'a.a';
#print all words with letters that contain 2 a's
print "\nMATCHES FOR: $tiles\n";
@m = find_matches($tiles);
print "\n@m\n";
############################################
sub find_matches
{
my $pattern = shift;
my $max_chars = length $pattern;
my @matches;
$pattern =~ s/\W+//g; #delete the dots
my $raw_letters = $pattern;
my $regex = "";
$regex .= "[$raw_letters].?" for (1..length $raw_letters);
print "$regex\n";
foreach my $word (@words)
{
next if (length ($word) > $max_chars);
push (@matches, $word) if $word =~ /$regex/;
}
return @matches;
}
__END__
MATCHES FOR: a.
[a].?
aa ab ad ae ag ah ai al am an ar as at aw ax ay ba da ea fa ha ja ka la ma na pa ta ya za
MATCHES FOR: s.
[s].?
as es is os sh si so st us
MATCHES FOR: a.a
[aa].?[aa].?
aa aah aal aas aba aga aha aia aka ala ama ana aua ava awa baa caa faa maa
UPDATE:
I don't claim that my code is a general solution to your problem.
In fact, I think that many enhancements are necessary! This was just minimal code to answer some simple questions.
In terms of algorithms, let's start with something simple:
for "s.":
You say: as es is os sh si so
I say: as es is os sh si so st us
I am completely unable to understand why st and us should be missing from the output? Please explain.
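Because the OP's find_matches sorts the tiles and '.' sorts before 's', the pattern it builds for 's.' is '.?s?', which rejects any two-letter word whose other letter sorts after 's' (st and us sort to 'st' and 'su'). One way to avoid the ordering problem altogether is to drop the regex and count letters. A minimal sketch, assuming '.' is meant as a Scrabble blank and that a word may use any subset of the tiles (which is what the OP's ? quantifiers suggest); the sub name fits and the word list are made up for illustration:

```perl
use strict;
use warnings;

# Can $word be spelled from $tiles? Each '.' in $tiles is treated as a
# blank that may stand for any one letter; every tile is used at most once.
sub fits {
    my ($word, $tiles) = @_;
    my %have;
    $have{$_}++ for split //, $tiles;
    my $blanks = delete $have{'.'} || 0;
    for my $ch (split //, $word) {
        if    ($have{$ch}) { $have{$ch}-- }
        elsif ($blanks)    { $blanks-- }
        else               { return 0 }
    }
    return 1;
}

my @words = qw(aa as at ba es is os sh si so st us xi);    # sample list
for my $tiles ('a.', 's.') {
    my @m = grep { fits($_, $tiles) } @words;
    print "MATCHES FOR: $tiles\n@m\n";
}
```

On this sample list, 's.' yields as es is os sh si so st us, and the result no longer depends on where '.' falls in the sort order.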