PerlMonks  

pat - find words by matching pattern (for crypto)

by merlyn (Sage)
on Aug 21, 2000 at 07:26 UTC (#28764, sourcecode)
Category: Cryptography
Author/Contact Info Randal L. Schwartz, merlyn
Description: Usage: pat ABCABC finds any word made of a three-character sequence repeated twice (such as "murmur" in my dictionary). pat XYYX finds four-character palindromes, such as "deed". Distinct uppercase letters must stand for different characters, so X and Y cannot match the same letter. So pat ABCDEFGHAB finds ten-letter words whose first two and last two characters are identical, with the remaining letters all distinct, such as "thousandth" or "Englishmen" (matching is case-insensitive).

To require literal characters, use lowercase, as in pat fXXd, requiring an f, two identical letters (which may not themselves be f or d), and a d, such as "food" or "feed".

For grins, dumps the regex that the pattern has been transformed into, so you can write your own, or see how much work you're avoiding by using this program.
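As a rough illustration of what "writing your own" looks like, here are regexes for three of the patterns above, worked out by hand from the rules in this description. The hash names and the non-matching sample words ("mummum", "noon" aside from "deed", "fddd") are my own additions for the sketch, not output copied from the program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-written equivalents of three patterns from the description.
# Each new placeholder gets a capture group guarded by a negative
# lookahead so it cannot repeat an earlier placeholder or a literal.
my %by_hand = (
    'ABCABC' => qr/^(.)(?!\1)(.)(?!\1|\2)(.)\1\2\3$/i,
    'XYYX'   => qr/^(.)(?!\1)(.)\2\1$/i,
    'fXXd'   => qr/^f(?![fd])(.)\1d$/i,
);

# Sample words (assumed for illustration) to try against each regex.
my %samples = (
    'ABCABC' => [qw(murmur mummum)],
    'XYYX'   => [qw(deed noon aaaa)],
    'fXXd'   => [qw(food feed fddd)],
);

for my $pattern (sort keys %by_hand) {
    for my $word (@{ $samples{$pattern} }) {
        my $verdict = $word =~ $by_hand{$pattern} ? "matches" : "does not match";
        print "$pattern: $word $verdict\n";
    }
}
```

The lookaheads are what keep "aaaa" out of XYYX and "fddd" out of fXXd; without them the patterns would accept words where two placeholders land on the same letter.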

  "Fun for the entire family!" -- Rolling Stone magazine (but not about this program)

#!/usr/bin/perl -w
use strict;

open WORDS, "/usr/dict/words" or die "no more words: $!";

for (@ARGV) {
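  # Placeholders must not match any literal (lowercase) letter in the
  # pattern, so seed the avoid list with a character class of the literals.
  my @avoid = do {
    my @lits = /[a-z]/g;
    @lits ? "[" . join("", @lits) . "]" : ()
  };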
  my %template;
  my $regex = "^";
  for (split //) {
    if (/[a-z]/) {
      $regex .= "$_";
    } elsif (/[A-Z]/) {
      if (exists $template{$_}) {
        $regex .= $template{$_};
      } else {
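        # First sight of this placeholder: capture one character that
        # differs from every literal and every earlier placeholder, then
        # remember its backreference for later occurrences.
        my $id = 1 + keys %template;
        if (@avoid) {
          $regex .= "(?!" . join("|", @avoid) . ")";
        }
        $regex .= "(.)";
        push @avoid, $template{$_} = "\\$id";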
      }
    } else {
      warn "ignoring $_";
    }
  }
  $regex .= "\$";
  print "$_ => $regex\n";
  seek WORDS, 0, 0;
  while (<WORDS>) {
    next unless /$regex/i;
    print;
  }
}
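
To try the translation without a dictionary file, the loop body can be lifted into a subroutine and run against an in-memory word list. This is only a sketch: the name pattern_to_regex and the word list are hypothetical and not part of the original program, though the translation logic is meant to mirror it step for step.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the same translation as the loop above,
# returning the regex as a string instead of printing it.
sub pattern_to_regex {
    my ($pattern) = @_;
    my @lits  = $pattern =~ /[a-z]/g;
    my @avoid = @lits ? ("[" . join("", @lits) . "]") : ();
    my %template;
    my $regex = "^";
    for my $ch (split //, $pattern) {
        if ($ch =~ /[a-z]/) {
            $regex .= $ch;                      # literal letter
        } elsif ($ch =~ /[A-Z]/) {
            if (exists $template{$ch}) {
                $regex .= $template{$ch};       # repeat: backreference
            } else {
                my $id = 1 + keys %template;    # next capture group number
                $regex .= "(?!" . join("|", @avoid) . ")" if @avoid;
                $regex .= "(.)";
                push @avoid, $template{$ch} = "\\$id";
            }
        } else {
            warn "ignoring $ch\n";
        }
    }
    return $regex . "\$";
}

# Assumed sample list standing in for /usr/dict/words.
my @words = qw(murmur deed food feed fold thousandth Englishmen banana);

for my $pattern (qw(ABCABC XYYX fXXd ABCDEFGHAB)) {
    my $regex = pattern_to_regex($pattern);
    my @hits  = grep { /$regex/i } @words;
    print "$pattern => $regex\n";
    print "  $_\n" for @hits;
}
```

Returning the regex as a string keeps the "for grins" dump available while making the translation easy to check on its own.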