P.S. ultimately, I'm trying to create a pattern that is able to say, "at some character position there either is or is not one of the preceding characters," e.g. "test" =~ /^(.)([^\1])([^\1\2])\1$/
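A quick sketch of why that pattern doesn't do what it says, and what can be used instead. Inside a bracketed character class, \1 is read as an octal escape (the character chr(1)), not as a back-reference, so [^\1] excludes almost nothing. A negative lookahead such as (?!\1) does compare against the captured text. The variable names $class and $look are only for illustration.

```perl
use strict;
use warnings;

# Inside [...] a "\1" is an octal escape for chr(1), not a back-reference,
# so these classes only exclude two control characters.
my $class = qr/^(.)([^\1])([^\1\2])\1$/;

# A negative lookahead compares against what the groups actually captured.
my $look  = qr/^(.)(?!\1)(.)(?!\1|\2)(.)\1$/;

for my $word (qw(test tttt)) {
    printf "%s: class=%s lookahead=%s\n", $word,
        ($word =~ $class ? "match" : "no match"),
        ($word =~ $look  ? "match" : "no match");
}
```

Both forms accept "test", but only the lookahead form rejects "tttt".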
The following script constructs such regexes. It takes a character sequence and a file as input, and prints the constructed pattern followed by all the matching words in the file. Numbering the characters in order of first appearance, these vertically aligned pairs correspond:
otto  letter  character
1221  123324  123431564
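The numbering can be reproduced with a few lines. This is only a sketch: signature() is a hypothetical helper, not part of match.pl, and it assumes Perl 5.10 or later for the //= operator. With more than nine distinct characters the digits run together, so it is meant for short examples like the ones above.

```perl
use strict;
use warnings;

# Hypothetical helper: number each distinct character by order of
# first appearance and return the numbers as one string.
sub signature {
    my ($word) = @_;
    my (%seen, $next);
    return join '', map { $seen{$_} //= ++$next } split //, $word;
}

print join(' ', map { signature($_) } qw(otto letter character)), "\n";
# prints: 1221 123324 123431564
```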
So the constructed regexp for "character" would be
((\w)(?!\2)(\w)(?!\3|\2)(\w)(?!\4|\3|\2)(\w)\4\2(?!\5|\4|\3|\2)(\w)(?!\6|\5|\4|\3|\2)(\w)\5)
Note that the back-references start with 2 because of the outer parens, which enclose $1 (or \1 inside the regexp).
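To see the group numbering on a smaller case, here is the pattern for the sequence "otto" (1221) written out by hand and anchored to whole strings. The sub name like_otto is made up for this illustration.

```perl
use strict;
use warnings;

# Pattern for the sequence "otto": group 1 is the whole word,
# group 2 the first letter, group 3 the second (different) letter.
sub like_otto {
    my ($word) = @_;
    return $word =~ /^((\w)(?!\2)(\w)\3\2)$/ ? 1 : 0;
}

for my $word (qw(otto abba noon aaaa abca)) {
    printf "%-5s %s\n", $word, like_otto($word) ? "matches" : "no match";
}

# The captures show why the back-references start at 2.
if ("abba" =~ /^((\w)(?!\2)(\w)\3\2)$/) {
    print "\$1=$1 \$2=$2 \$3=$3\n";
}
```

"otto", "abba" and "noon" match; "aaaa" fails at the lookahead and "abca" fails at the \3 back-reference.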
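#!/usr/bin/perl
# match.pl
use strict;
use warnings;

my ($pat, $file) = @ARGV;
my $p;
{
    my (%s, %i);          # %s: characters seen so far; %i: group number of each character
    my $d = my $c = 1;    # our regexp will be inside parens, so first backref is 2
    $p = join(
        "",
        map {
            if ($s{$_}++) {
                # repeated character: back-reference to its capture group
                "\\" . $i{$_};
            }
            else {
                # new character: capture it, after ruling out all earlier groups
                $i{$_} = ++$c;
                $c > $d + 1
                    ? '(?!' . join('|', map { "\\" . abs } -$c + 1 .. -$d - 1) . ")(\\w)"
                    : "(\\w)";
            }
        } split //, $pat
    );
}
print '(', $p, ")\n";

open my $fh, '<', $file or die "Cannot open $file: $!";
my %s;    # words already printed
while (<$fh>) {
    my @l = ();
    while (/\b($p)\b/g) {
        push @l, $1 unless $s{$1}++;
    }
    print join(", ", @l), $/ if @l;
}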
Try match.pl fusselkerl /usr/share/dict/words.
update: how would you specify a sequence to match a word composed of 15 different characters, which is 15 characters long? right: "dermatoglyphics". Or "1234567890abcde". ;-)