http://www.perlmonks.org?node_id=747135

larryk has asked for the wisdom of the Perl Monks concerning the following question:

If I want to match a four letter word that starts and ends with the same character, I use: "test" =~ /^(.)..\1$/ - this is successful.

Now I want to match a four letter word that does not start and finish with the same letter, using: "test" =~ /^(.)..[^\1]$/ - this doesn't work; it's still a successful match.

It seems the backreference \1 is turning into an escaped literal within the character class.
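A quick check confirms the behaviour described above. This is only a minimal sketch; the helper name `broken_match` is made up for illustration and simply applies the pattern exactly as written in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the pattern from the question unchanged.
sub broken_match {
    my ($word) = @_;
    return $word =~ /^(.)..[^\1]$/ ? 1 : 0;
}

# Both words match, so the class is not excluding the captured first letter.
for my $word (qw(test tesi)) {
    printf "%s: %s\n", $word, broken_match($word) ? "matched" : "not matched";
}
```

If the character class really honoured the backreference, "test" would be rejected and only "tesi" would match.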

P.S. ultimately, I'm trying to create a pattern that is able to say, "at some character position there either is or is not one of the preceding characters," e.g. "test" =~ /^(.)([^\1])([^\1\2])\1$/

   larryk                                          
perl -le "s,,reverse killer,e,y,rifle,lycra,,print"

Re: How to use a negative backreference in regex?
by zwon (Abbot) on Feb 28, 2009 at 12:57 UTC
    "test" =~ /^(.)..(?!\1).$/

      Indeed. The reason [^\1] doesn't work is that within a character class \1 is not a backreference: it stands for a literal character (Perl reads it as an octal escape, i.e. the character with code 1), not the contents of capture buffer 1. Which is a long way of saying it doesn't work because it doesn't work (however plausible it may look).
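To see the look-ahead version behave as intended, here is a small sketch. The helper name `ends_differ` and the sample words are assumptions for illustration; only the pattern itself comes from the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a four-character word starts and ends
# with different characters, using the negative look-ahead (?!\1).
sub ends_differ {
    my ($word) = @_;
    return $word =~ /^(.)..(?!\1).$/ ? 1 : 0;
}

for my $word (qw(test tesi noon moon)) {
    printf "%s: %s\n", $word, ends_differ($word) ? "matched" : "not matched";
}
```

The look-ahead consumes nothing: it only asserts that the next character is not a copy of group 1, and the following `.` then consumes that character.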

      thanks - it's been a while :)

      I eventually ended up with the following:

      my $unique = qr/^(.)(?!\1) (.)(?!\1|\2) (.)(?!\1|\2|\3) .$/x;
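A short sketch of how that pattern could be exercised, assuming it is wrapped in a helper (the name `all_four_differ` and the word list are made up here):

```perl
use strict;
use warnings;

# Hypothetical helper around the $unique pattern: true only for
# four-character strings in which no character repeats.
sub all_four_differ {
    my ($word) = @_;
    my $unique = qr/^(.)(?!\1) (.)(?!\1|\2) (.)(?!\1|\2|\3) .$/x;
    return $word =~ $unique ? 1 : 0;
}

for my $word (qw(test tesi noon word)) {
    printf "%s: %s\n", $word, all_four_differ($word) ? "unique" : "repeats";
}
```

Each look-ahead lists every group captured so far, so the character that follows it has to differ from all of them.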
      
Re: How to use a negative backreference in regex?
by missingthepoint (Friar) on Feb 28, 2009 at 13:03 UTC
Re: How to use a negative backreference in regex?
by johngg (Canon) on Feb 28, 2009 at 13:06 UTC

    This seems to work, using a negative look-ahead.

    use strict;
    use warnings;

    my @words = qw{ test tesi stoat trout };
    my $regex = qr {(?x) ^ (.) .*? (?!\1) . $ };

    print m{$regex}
        ? qq{ Matched: $_\n}
        : qq{Not matched: $_\n}
        for @words;

    Prints

    Not matched: test
     Matched: tesi
     Matched: stoat
    Not matched: trout

    I hope this is helpful.

    Cheers,

    JohnGG

Re: How to use a negative backreference in regex?
by jettero (Monsignor) on Feb 28, 2009 at 13:18 UTC
    I'm just guessing, but it sounds like you want something other than regular expressions. It feels like eventually you'd like to support nested (un)balanced expressions -- which I don't think you'll get to work with REs. Maybe Text::Balanced or something like it would help.

    LaTeX: $a^n b^n \in L(G)$

    -Paul

Re: How to use a negative backreference in regex?
by shmem (Chancellor) on Mar 02, 2009 at 23:03 UTC
    P.S. ultimately, I'm trying to create a pattern that is able to say, "at some character position there either is or is not one of the preceding characters," e.g. "test" =~ /^(.)([^\1])([^\1\2])\1$/

    The following constructs such regexes. It takes a character sequence and a file as input, and outputs the constructed pattern followed by all the matching words in the file. If each distinct character is given a number, these vertically aligned pairs correspond:

    otto    letter    character
    1221    123324    123431564

    so the constructed regexp for character would be

    ((\w)(?!\2)(\w)(?!\3|\2)(\w)(?!\4|\3|\2)(\w)\4\2(?!\5|\4|\3|\2)(\w)(?!\6|\5|\4|\3|\2)(\w)\5)

    Note that the back-references start with 2 because of the outer parens, which enclose $1 (or \1 inside the regexp).
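To illustrate the group numbering, here is a minimal sketch using the pattern for "otto" (shape 1221). The helper name `shape_captures` is an assumption for illustration; the pattern was derived by hand using the same rules as the regexp for character above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captures ($1, $2, $3) when the word has
# the same shape as "otto", or an empty list otherwise.
sub shape_captures {
    my ($word) = @_;
    my @captures = $word =~ /^((\w)(?!\2)(\w)\3\2)$/;
    return @captures;
}

for my $word (qw(otto abba aaaa)) {
    my @c = shape_captures($word);
    print "$word: ", (@c ? "whole=$c[0] first=$c[1] second=$c[2]" : "no match"), "\n";
}
```

Group 1 is the whole word, so the first distinct character lands in group 2 and the second in group 3.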

    #!/usr/bin/perl
    # match.pl
    use strict;

    my ($pat, $file) = @ARGV;
    my $p;
    {
        my (%s, %i);      # %s: characters seen so far, %i: their group numbers
        my $d = my $c = 1; # our regexp will be inside parens, so first backref is 2
        $p = join ( "", map {
            if($s{$_}++){
                "\\".$i{$_}     # repeated character: back-reference to its group
            }
            else{
                $i{$_}=++$c;    # new character: assign the next group number
                $c>$d+1
                    ? '(?!'.join('|',map{"\\".abs}-$c+1..-$d-1).")(\\w)"
                    : "(\\w)";
            }
        } split//,$pat
        );
    }
    print '(',$p,")\n";

    open my $fh, '<', $file or die "Cannot open $file: $!";
    my %s;                  # words already printed
    while (<$fh>) {
        my @l = ();
        while (/\b($p)\b/g) {
            push @l, $1 unless $s{$1}++;
        }
        print join (", ",@l), $/ if @l;
    }

    Try match.pl fusselkerl /usr/share/dict/words.
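For experimenting without a word list, the pattern construction could be pulled out into a function. This is a sketch under assumptions, not part of match.pl: the name `build_pattern` and its structure are made up here, although it is meant to produce the same pattern text as the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: build the pattern for a character sequence,
# e.g. "otto" gives ((\w)(?!\2)(\w)\3\2)
sub build_pattern {
    my ($seq) = @_;
    my %group;       # character => capture group number
    my $last = 1;    # group 1 is the outer pair of parentheses
    my $body = '';
    for my $ch (split //, $seq) {
        if (exists $group{$ch}) {
            $body .= '\\' . $group{$ch};    # repeat: back-reference
            next;
        }
        # New character: it must differ from every earlier group (newest first).
        my @seen = map { '\\' . $_ } reverse 2 .. $last;
        $body .= '(?!' . join('|', @seen) . ')' if @seen;
        $body .= '(\w)';
        $group{$ch} = ++$last;
    }
    return "($body)";
}

my $p = build_pattern("otto");
print "$p\n";
print "abba has the same shape\n" if "abba" =~ /\A$p\z/;
```

Anchoring with `\A` and `\z` instead of `\b` tests a single word in isolation rather than scanning a line of text.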

    update: how would you specify a sequence to match a word composed of 15 different characters, which is 15 characters long? right: "dermatoglyphics". Or "1234567890abcde".

    ;-)