How can I use backrefs in a lookbehind?

Contributed by Roy Johnson on Dec 02, 2004 at 15:25 UTC


I want to use a regex to capture 3-6 characters not containing a run of three of the same character. So, given AAABCDE, it would match AABCDE; and given ABCDDD, it would match ABCDD.

The most natural solution is to use lookbehind, starting with the third character, to check that the last three characters are not all the same.

The problem with that is that Perl's regex engine assumes that any backreference is variable-length, and variable-length lookbehinds are not allowed.

contributed by Roy Johnson

Use lookbehind to count back as many chars as you want, and at the front of it, put a lookahead to check your pattern:

    /..               # match first two chars
     (?:(.)           # capture next char, then
       (?<=           # looking behind,
         (?!\1\1\1)   # don't allow a run of three
       ...)           # starting three chars back
     ){1,4}/x

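To see the pattern at work on the two strings from the question, here is a minimal sketch. The helper name first_clean_run is made up for illustration, and $& is used only so the pattern itself can stay unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the first
# 3-6 character substring with no run of three identical characters,
# or undef when there is no such substring.
sub first_clean_run {
    my ($text) = @_;
    return $text =~ /..               # match first two chars
                     (?:(.)           # capture next char, then
                       (?<=           # looking behind,
                         (?!\1\1\1)   # don't allow a run of three
                       ...)           # starting three chars back
                     ){1,4}/x ? $& : undef;
}

for my $s (qw(AAABCDE ABCDDD AAAA)) {
    my $m = first_clean_run($s);
    printf "%-8s => %s\n", $s, defined $m ? $m : '(no match)';
}
```

This should report AABCDE for AAABCDE and ABCDD for ABCDDD, as the question asks, and no match for AAAA.
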
This technique can also overcome some other variable-length lookbehind situations. For example, if you want to match "bar" that is preceded by "foo" somewhere in the preceding six characters:

    /(?<=               # looking behind,
       (?=.{0,3}foo)    # look for a foo preceded by up to three chars
       .{6})            # starting six chars back
     bar/x              # then match bar

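A quick way to try the foo/bar pattern is to wrap it in a small test sub. This is only a sketch; the name bar_after_recent_foo and the sample strings are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): true when "bar" is
# found with a complete "foo" inside the six characters just before it.
sub bar_after_recent_foo {
    my ($text) = @_;
    return $text =~ /(?<=               # looking behind,
                       (?=.{0,3}foo)    # look for a foo preceded by up to three chars
                       .{6})            # starting six chars back
                     bar/x ? 1 : 0;
}

for my $s (qw(foo123bar xxxfoobar foo1234bar foobar)) {
    printf "%-11s => %s\n", $s, bar_after_recent_foo($s) ? 'match' : 'no match';
}
```

Note that plain "foobar" does not match: the lookbehind insists on six characters before "bar", so a "foo" that sits closer than six characters to the start of the string is not found by this version.
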
The thing to remember is that the lookahead can see farther than the end of the lookbehind, so you need to explicitly limit it. You could use that feature to get a slightly different solution to the first problem:

    /..               # match first two chars
     (?:
       (?<=           # looking behind,
         (?!(.)\1\1)  # don't allow a run of three
       ..)            # starting only two chars back
       .              # then match the next char
     ){1,4}/x

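Both variants test the same three characters at each step, so they should pick out the same substring. A rough sketch to compare them side by side follows; the variable and sub names are made up for illustration, and the patterns are written on one line without /x.

```perl
use strict;
use warnings;

# The two patterns from this answer, compacted onto one line each.
my $three_back = qr/..(?:(.)(?<=(?!\1\1\1)...)){1,4}/;   # lookbehind starts three chars back
my $two_back   = qr/..(?:(?<=(?!(.)\1\1)..).){1,4}/;     # lookbehind starts two chars back

# Hypothetical helper: returns the matched text, or undef on failure.
sub match_with {
    my ($re, $text) = @_;
    return $text =~ $re ? $& : undef;
}

for my $s (qw(AAABCDE ABCDDD AABBBCCD XXYYZZ)) {
    my ($m1, $m2) = map { match_with($_, $s) } $three_back, $two_back;
    printf "%-9s three-back: %-7s two-back: %s\n", $s, $m1 // '-', $m2 // '-';
}
```
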
contributed by Ieronim

Only a small remark:
The idea of variable-length lookbehind is very good, but the given problem can be solved even without using lookbehinds at all:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $pat = qr{
        (               # 1: capture the whole substring
          (?:
            (.)         # a character
            (?!\2\2)    # NOT repeated three times
          ){1,4}        # one to four such 'good' characters
          ..            # then any two more characters; 4+2 = 6
        )
    }x;

    foreach (qw/AABBCCDD AABBBCCD AAABBCCD AABBCCCD AAABCDE/) {
        print "$1\n" if /$pat/;
    }
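
As a sanity check against the original question, the same lookahead-only pattern can be wrapped in a sub and run on the two strings given there. This is just a sketch; the sub name clean_run is invented for the example.

```perl
use strict;
use warnings;

# Same lookahead-only pattern as above, compacted onto one line.
my $pat = qr{((?:(.)(?!\2\2)){1,4}..)};

# Hypothetical helper: returns the captured substring, or undef.
sub clean_run {
    my ($text) = @_;
    return $text =~ $pat ? $1 : undef;
}

for my $s (qw(AAABCDE ABCDDD)) {
    my $m = clean_run($s);
    printf "%-8s => %s\n", $s, defined $m ? $m : '(no match)';
}
```

For these two inputs it should give AABCDE and ABCDD, the same results the question asks for.
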

