PerlMonks  

Problem with regex wildcard operator (.)...?

by sbrothy (Acolyte)
on Sep 05, 2021 at 20:22 UTC ( #11136476=perlquestion )

sbrothy has asked for the wisdom of the Perl Monks concerning the following question:

I'm sorry to be back with yet another "minor" problem, but this is slowly driving me up the wall.

Even more so because I had this working but for some reason it doesn't now. The problem is with the regex wildcard ".".

    #!/usr/bin/perl
    #########################################
    # The purpose of this program is to help
    # cheat at Scrabble. Ultimately though it's more
    # about the intellectual exercise of writing the
    # actual program. I've tried using it (on http://isc.ro)
    # and found that it's more work than it's worth. In fact,
    # once I tried it (I *nearly* (*cough*) always informed
    # my opponents beforehand) my rating quickly deteriorated.
    #
    # Also, I don't have the program on my phone so when I'm
    # out and about I show my true colors anyway.
    #
    # There's actually a little embarrassing anecdote here:
    #
    # I started a game (20 minutes, challenge: DOUBLE) at
    # home, full of confidence that I would beat my opponent
    # with the help of my little program. Unfortunately, my
    # opponent wanted to adjourn due to some emergency. What
    # happened then was of course that we continued the game
    # while I was driving a bus to work. Needless to say my
    # performance was noticeably sub-stellar. So bad, in fact,
    # that I had to come clean. Luckily for me he/she took it in
    # full stride enjoying serving me my head on a platter. :)
    #
    # sbrothy 21:26 9/9/2021
    #
    #########################################

    use autodie;
    use diagnostics;
    use strict;
    use warnings;
    use 5.010;
    use File::Fetch;

    my $ff = File::Fetch->new(uri => 'https://www.wordgamedictionary.com/sowpods/download/sowpods.txt' );
    my $csw = $ff->fetch(to => '/tmp');

    my @words;
    open my $fh, '<', $csw;
    foreach (<$fh>) {
        tr/\r\n//d;
        tr/\[A-Z]/[a-z]/;
        next if /\s+/; # dirty way of removing comments
        push @words, $_;
    }
    close $fh;

    ############################################
    my $tiles = 'a.';
    print "MATCHES FOR: $tiles\n";
    my @m = find_matches($tiles);
    print "\n@m\n";
    ############################################
    $tiles = 's.';
    print "MATCHES FOR: $tiles\n";
    @m = find_matches($tiles);
    print "\n@m\n";
    ############################################

    sub find_matches {
        my $letters = shift;
        my $regex = join '', map "$_?", sort split //, $letters;
        my @matches;
        foreach my $word (@words) {
            if(join('', sort split //, $word) =~ /^$regex$/) {
                push @matches, $word;
            }
        }
        return @matches;
    }

As you can see the wildcard behaves erratically. I had this working but I'm pretty sure I didn't do anything special.

Will you humor me one last time?

Regards, sbrothy

Sorry, forgot to provide the output:

    MATCHES FOR: a.

    aa
    MATCHES FOR: s.

    as es is os sh si so

And just for the record changing the order of the letters doesn't change anything as they're sorted. Obviously.

Replies are listed 'Best First'.
Re: Problem with regex wildcard operator (.)...?
by kcott (Bishop) on Sep 06, 2021 at 09:21 UTC

    G'day sbrothy,

    I suggest you take a look at Regexp::Debugger. That will allow you to step through the matching process and see exactly what is going on.

    — Ken

Re: Problem with regex wildcard operator (.)...?
by AnomalousMonk (Bishop) on Sep 06, 2021 at 06:31 UTC

    A note on usage. The tr/// operator does not have [...] character classes like regexes (as might be used in s///), but only ranges, e.g., a-z, so the \[ [ ] sequences in the tr/\[A-Z]/[a-z]/ expression in the OP represent literal [ ] characters (see tr/// in Quote-Like Operators in perlop).

    So '[' is being translated to '[' and ']' to ']'. This does no harm (indeed, the tr/// compiler may optimize this away), but suggests a misunderstanding of tr///.

    And since the tr/// is just translating upper case to lower, a simple
        $_ = lc;
    statement might be clearer, hence better (see lc).
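A small sketch of that point, using a made-up test string (`Qu[i]Z` is only an illustration, not from the word list): the bracketed form from the OP, the plain range form, and `lc` should all give the same result for ASCII input, because `tr///` treats the brackets as ordinary characters mapped to themselves.

```perl
use strict;
use warnings;

# Hypothetical sample input containing literal brackets.
my $word = 'Qu[i]Z';

# The OP's form: '[' and ']' are literal characters translated to themselves.
(my $with_brackets = $word) =~ tr/\[A-Z]/[a-z]/;

# The plain range form, and lc, for comparison.
(my $with_range = $word) =~ tr/A-Z/a-z/;
my $with_lc = lc $word;

print "$with_brackets $with_range $with_lc\n";
```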

    Update: I see now that Marshall has already made this point at the end of this post.


    Give a man a fish:  <%-{-{-{-<

Re: Problem with regex wildcard operator (.)...?
by AnomalousMonk (Bishop) on Sep 07, 2021 at 11:21 UTC

    I'm sure there's a better Perl scrabble cheater already in existence, but I enjoyed the exercise. The "requirement" I sought to satisfy is that given a word already on the scrabble board and a tray of letters one can use to augment the word, return a list of all words from a standard dictionary that can be formed from the given word using only one or more letters from the tray added to the beginning and/or end of the given word.

    This code is minimally tested, but looks good to me. It also seems a good candidate for encapsulation in a module with a thorough test file. There are also a few enhancements one can imagine, e.g., listing the candidate words found along with their scrabble scores and in order sorted by score/alpha. I think this code will work under Perl version 5.8.9, but I haven't tested it with this version. The dictionary I used for testing is not an "official" scrabble dictionary, but a general dictionary I had on hand.

    scrabble_cheater_3.pl:

    Output:
        Win8 Strawberry 5.30.3.1 (64) Tue 09/07/2021 6:24:31
        C:\@Work\Perl\monks\Marshall
        >perl scrabble_cheater_3.pl
        given: 'led' tray: 'ooidgkle'
        deLED doiLED doLED dolLED gelLED gilLED gLED gLEDe idLED kilLED LEDe LEDge LEDged LEDol ogLED oiLED

        Win8 Strawberry 5.30.3.1 (64) Tue 09/07/2021 7:22:34
        C:\@Work\Perl\monks\Marshall
        >perl scrabble_cheater_3.pl -g no -t wk
        given: 'no' tray: 'wk'
        kNOw NOw


    Give a man a fish:  <%-{-{-{-<

Re: Problem with regex wildcard operator (.)...?
by LanX (Sage) on Sep 05, 2021 at 20:26 UTC
    > As you can see the wildcard behaves erratically.

    Sorry, I don't see your problem. Could you please describe it better and provide an SSCCE?

    > Will you humor me one last time?

    We are happy to help, as long as you don't make it unnecessarily difficult for us. :)

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      Certainly. I realize it's probably tldr. Let me give it another go before you waste your time...

        OK, I shortened it, hope this is more acceptable. In the process I discovered, as I suspected, that it has something to do with the regex order, but exactly what is still pretty opaque to me.

            #!/usr/bin/perl

            my @words = ('aa', 'as', 'es', 'is', 'os', 'sh', 'si', 'so');

            print "WORDS: @words\n";
            print "----------------------------\n";

            my $regex = ".?a?";
            print "Matches for regex pattern : $regex\n\n";
            foreach (@words) {
                if(join('', sort split //, $_) =~ /^$regex$/) {
                    print "$_\n";
                }
            }
            print "----------------------------\n";

            my $regex = "a?.?";
            print "Matches for regex pattern : $regex\n\n";
            foreach (@words) {
                if(join('', sort split //, $_) =~ /^$regex$/) {
                    print "$_\n";
                }
            }
            print "----------------------------\n";

            $regex = "s?.?";
            print "Matches for regex pattern : $regex\n\n";
            foreach (@words) {
                if(join('', sort split //, $_) =~ /^$regex$/) {
                    print "$_\n";
                }
            }
            print "----------------------------\n";

            $regex = ".?s?";
            print "Matches for regex pattern : $regex\n\n";
            foreach (@words) {
                if(join('', sort split //, $_) =~ /^$regex$/) {
                    print "$_\n";
                }
            }
            print "----------------------------\n";

        OUTPUT

            WORDS: aa as es is os sh si so
            ----------------------------
            Matches for regex pattern : .?a?

            aa
            ----------------------------
            Matches for regex pattern : a?.?

            aa
            as
            ----------------------------
            Matches for regex pattern : s?.?

            ----------------------------
            Matches for regex pattern : .?s?

            as
            es
            is
            os
            sh
            si
            so
            ----------------------------

        Regards.

Re: Problem with regex wildcard operator (.)...?
by AnomalousMonk (Bishop) on Sep 06, 2021 at 06:06 UTC

    BTW: If you are going to do matching against a word list after the words in the list have had their letters sorted into alpha order and if you will do many matches against the same word list, it will be (possibly much) more efficient to first build a hash of all sorted versus native words.

        Win8 Strawberry 5.8.9.5 (32) Mon 09/06/2021 1:53:18
        C:\@Work\Perl\monks
        >perl -Mstrict -Mwarnings
        use Data::Dump qw(dd);  # for debug

        my @words = (  # added a few extra 'words'
            '', 'x', 'a', 's',
            'aa', 'as', 'es', 'is', 'os', 'sh', 'si', 'so'
            );
        dd '@words:', \@words;  # for debug

        # map words to sorted words.
        my %sorted = map { $_ => join('', sort split //) } @words;
        dd 'sorted words hash:', \%sorted;  # for debug

        print "----------------------------\n";

        for my $regex (qw(.?a? a?.? s?.? .?s?)) {
            my $rx_full = qr{ ^ $regex $ }x;  # faster if used often
            # my $rx_full = qr{ \A $regex \z }xms;  # i prefer \A \z \Z and /xms tail

            print "Matches for FULL regex pattern: $rx_full \n";

            for my $word (@words) {
                print "'$sorted{$word}' -> '$word' " if $sorted{$word} =~ $rx_full;
                }

            print "\n----------------------------\n";
            }
        ^Z
        (
          "\@words:",
          ["", "x", "a", "s", "aa", "as", "es", "is", "os", "sh", "si", "so"],
        )
        (
          "sorted words hash:",
          {
            "" => "",
            "a" => "a",
            "aa" => "aa",
            "as" => "as",
            "es" => "es",
            "is" => "is",
            "os" => "os",
            "s" => "s",
            "sh" => "hs",
            "si" => "is",
            "so" => "os",
            "x" => "x",
          },
        )
        ----------------------------
        Matches for FULL regex pattern: (?x-ism: ^ .?a? $ )
        '' -> '' 'x' -> 'x' 'a' -> 'a' 's' -> 's' 'aa' -> 'aa'
        ----------------------------
        Matches for FULL regex pattern: (?x-ism: ^ a?.? $ )
        '' -> '' 'x' -> 'x' 'a' -> 'a' 's' -> 's' 'aa' -> 'aa' 'as' -> 'as'
        ----------------------------
        Matches for FULL regex pattern: (?x-ism: ^ s?.? $ )
        '' -> '' 'x' -> 'x' 'a' -> 'a' 's' -> 's'
        ----------------------------
        Matches for FULL regex pattern: (?x-ism: ^ .?s? $ )
        '' -> '' 'x' -> 'x' 'a' -> 'a' 's' -> 's' 'as' -> 'as' 'es' -> 'es' 'is' -> 'is' 'os' -> 'os' 'hs' -> 'sh' 'is' -> 'si' 'os' -> 'so'
        ----------------------------
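A possible variation on the same idea, offered only as a sketch: instead of mapping each word to its sorted letters, group the words under their sorted letters, so that anagrams share one key and each key is tested against the pattern once. The names `%by_sorted` and `words_matching` are hypothetical, and the short word list is the one from the shortened test script.

```perl
use strict;
use warnings;

my @words = qw(as es is os sh si so);

# Group words by their sorted letters, so each key is computed only once.
my %by_sorted;
push @{ $by_sorted{ join '', sort split //, $_ } }, $_ for @words;

# Return every word whose sorted-letter key matches $rx, in key order.
sub words_matching {
    my ($rx) = @_;
    return map { @{ $by_sorted{$_} } } grep { $_ =~ $rx } sort keys %by_sorted;
}

print join(' ', words_matching(qr/\A.?s?\z/)), "\n";
```

With a full dictionary the index is built once and reused for every tray of tiles, which is where the saving would come from.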


    Give a man a fish:  <%-{-{-{-<

Re: Problem with regex wildcard operator (.)...?
by karlgoethebier (Abbot) on Sep 06, 2021 at 10:55 UTC
    «…cheat at Scrabble…»

    See Games::Literati. I don’t know if it’s good but it seems like it is maintained. Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

Re: Problem with regex wildcard operator (.)...?
by BillKSmith (Monsignor) on Sep 06, 2021 at 03:37 UTC
    Your misunderstanding has nothing to do with the metacharacter '.', but rather with the quantifier '?'. When a character (or metacharacter) in your regex matches zero times, you will reach the end of the regex before you have matched all the characters in the string. Your 'end-of-string' anchor '$' fails.
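To make this concrete, here is a small sketch that rebuilds the pattern the same way the OP's find_matches() does and tries it against a few sorted words. The helper names `sorted_letters` and `tiles_to_regex` are made up for this illustration. Note that '.' sorts before every letter, so the tiles 's.' become the pattern `.?s?`, and a sorted word can only match if 's' is its last letter.

```perl
use strict;
use warnings;

# Sort a word's letters, as the original find_matches() does.
sub sorted_letters {
    my ($word) = @_;
    return join '', sort split //, $word;
}

# Build the pattern the way the original code does: sort the tiles,
# then make each one optional. '.' sorts before all letters.
sub tiles_to_regex {
    my ($tiles) = @_;
    return join '', map { "$_?" } sort split //, $tiles;
}

my $regex = tiles_to_regex('s.');

for my $word (qw(as so us st)) {
    my $key    = sorted_letters($word);
    my $result = $key =~ /^$regex$/ ? 'match' : 'no match';
    print "$word (sorted: $key) vs /^$regex\$/: $result\n";
}
```

For 'us' the sorted form is 'su': whichever of `.?` or `s?` consumes the 's', nothing is left in the pattern to consume the 'u', so `$` cannot match.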
    Bill
Re: Problem with regex wildcard operator (.)...?
by Marshall (Canon) on Sep 06, 2021 at 07:25 UTC
    I really don't know the requirements for your application.
    I have taken an imperfect attempt at it below.

    I did come close to your output for "s." but not exactly.
    My output for "a." is wildly off from what you show.

    I downloaded your DB file to a local file to make my testing faster.
    I don't think that makes any difference.

    I would like more text to describe what you want to have happen.

        #!/usr/bin/perl
        use strict;
        use warnings;

        my @words;
        open my $fh, '<', 'EuroScrabbleWordList.txt' or die "can't open word list $!";
        foreach (<$fh>) {
            tr/\r\n//d;
            tr/A-Z/a-z/;
            next if /\s/; # dirty way of removing comments
            push @words, $_;
        }
        close $fh;

        ############################################
        my $tiles = 'a.'; # print all words with 2 letters that contain "a"
        print "\nMATCHES FOR: $tiles\n";
        my @m = find_matches($tiles);
        print "\n@m\n";

        $tiles = 's.'; # print all words with 2 letters that contain "s"
        print "\nMATCHES FOR: $tiles\n";
        @m = find_matches($tiles);
        print "\n@m\n";

        $tiles = 'a.a'; #print all words with letters that contain 2 a's
        print "\nMATCHES FOR: $tiles\n";
        @m = find_matches($tiles);
        print "\n@m\n";
        ############################################

        sub find_matches {
            my $pattern = shift;
            my $max_chars = length $pattern;
            my @matches;
            $pattern =~ s/\W+//g; #delete the dots
            my $raw_letters = $pattern;
            my $regex = "";
            $regex .= "[$raw_letters].?" for (1..length $raw_letters);
            print "$regex\n";
            foreach my $word (@words) {
                next if (length ($word) > $max_chars);
                push (@matches, $word) if $word =~ /$regex/;
            }
            return @matches;
        }
        __END__

        MATCHES FOR: a.
        [a].?

        aa ab ad ae ag ah ai al am an ar as at aw ax ay ba da ea fa ha ja ka la ma na pa ta ya za

        MATCHES FOR: s.
        [s].?

        as es is os sh si so st us

        MATCHES FOR: a.a
        [aa].?[aa].?

        aa aah aal aas aba aga aha aia aka ala ama ana aua ava awa baa caa faa maa

    UPDATE:
    I don't claim that my code is a general solution to your problem.
    In fact, I think that many enhancements are necessary! This was just minimal code to answer some simple questions.

    In terms of algorithms, let's start with something simple:
    for "s.":

        You say: as es is os sh si so
        I say:   as es is os sh si so st us
    I am completely unable to understand why st and us should be missing from the output. Please explain.
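One way to get 'st' and 'us' included, sketched here as an alternative technique rather than as anyone's posted solution: skip the regex and count letters, letting each '.' tile stand in for any one letter the tray lacks. The function name `can_spell` is hypothetical, and the word list is just the nine two-letter words from the output above.

```perl
use strict;
use warnings;

# Return true if $word can be spelled from $tiles, where each '.' in
# $tiles is a blank that may stand for any single letter.
sub can_spell {
    my ($word, $tiles) = @_;

    my %have;
    $have{$_}++ for split //, $tiles;
    my $blanks = delete $have{'.'} || 0;

    for my $letter (split //, $word) {
        if    ($have{$letter}) { $have{$letter}-- }   # use a real tile
        elsif ($blanks)        { $blanks-- }          # fall back to a blank
        else                   { return 0 }           # nothing left to use
    }
    return 1;
}

my @words = qw(aa as es is os sh si so st us);
print join(' ', grep { can_spell($_, 's.') } @words), "\n";
```

Because the check no longer depends on where the wildcard lands after sorting, the order of the tiles and of the word's letters does not matter.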

Node Type: perlquestion [id://11136476]
Approved by LanX
Front-paged by Corion