PerlMonks  

How can I use backrefs in a lookbehind?

by Roy Johnson (Monsignor)
on Dec 02, 2004 at 15:25 UTC ( [id://411790]=perlquestion )

Roy Johnson has asked for the wisdom of the Perl Monks concerning the following question: (regular expressions)

I want to use a regex to capture 3-6 characters not containing a run of three of the same character. So, given AAABCDE, it would match AABCDE; and given ABCDDD, it would match ABCDD.

The most natural solution is to use lookbehind, starting with the third character, to check that the last three characters are not all the same:

/..(?:(.)(?<!\1\1\1)){1,4}/
The problem with that is that Perl's regex engine assumes that any backreference is variable-length, and variable-length lookbehinds are not allowed.

Originally posted as a Categorized Question.

Replies are listed 'Best First'.
Re: How can I use backrefs in a lookbehind?
by Roy Johnson (Monsignor) on Dec 02, 2004 at 16:00 UTC
    Use lookbehind to count back as many chars as you want, and at the front of it, put a lookahead to check your pattern:
    /..                 # match first two chars
     (?:(.)             # capture next char, then
        (?<=            # looking behind,
           (?!\1\1\1)   # don't allow a run of three
        ...)            # starting three chars back
     ){1,4}/x
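To try the pattern against the two examples from the question, a small script along these lines can be used. It is only a sketch: the names `$no_triple` and `first_match` are made up for illustration, and the matched text is recovered from `@-` and `@+` so that no extra capture group disturbs the `\1` numbering.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the reply above, compiled once.
my $no_triple = qr/
    ..                  # match first two chars
    (?:(.)              # capture next char, then
       (?<=             # looking behind,
          (?!\1\1\1)    # don't allow a run of three
       ...)             # starting three chars back
    ){1,4}
/x;

# Hypothetical helper: return the matched substring, or undef if no match.
sub first_match {
    my ($s) = @_;
    return $s =~ $no_triple ? substr($s, $-[0], $+[0] - $-[0]) : undef;
}

for my $s (qw(AAABCDE ABCDDD)) {
    my $m = first_match($s);
    print "$s => ", (defined $m ? $m : '(no match)'), "\n";
}
# AAABCDE => AABCDE
# ABCDDD => ABCDD
```

A string such as AAA produces no match at all, because the pattern needs at least three characters and the only three available form a run.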
    This technique can also overcome some other variable-length lookbehind situations. For example, if you want to match "bar" that is preceded by "foo" somewhere in the preceding six characters:
    /(?<=               # looking behind,
       (?=.{0,3}foo)    # look for a foo preceded by up to three chars
     .{6})              # starting six chars back
     bar/x              # then match bar
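The following sketch exercises that pattern on a few made-up strings. `$bounded` is the pattern as given; `$unbounded` is a hypothetical variant, included only for contrast, in which the lookahead is not limited to the six-character window. Note that `.{6}` requires six characters to exist before bar, so a plain "foobar" is not matched by either.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from above: foo must lie inside the six chars before bar.
my $bounded = qr/
    (?<=                # looking behind,
       (?=.{0,3}foo)    # look for a foo preceded by up to three chars
    .{6})               # starting six chars back
    bar                 # then match bar
/x;

# Hypothetical variant for contrast only: the lookahead is not limited,
# so it can find a foo that lies beyond the end of the lookbehind.
my $unbounded = qr/(?<=(?=.*foo).{6})bar/;

for my $s (qw(xxxfoobar fooxxxbar fooxxxxbar foobar xxxxxxbarfoo)) {
    printf "%-13s bounded=%s unbounded=%s\n", $s,
        ($s =~ $bounded   ? 'yes' : 'no'),
        ($s =~ $unbounded ? 'yes' : 'no');
}
```

With these inputs the bounded pattern should match only xxxfoobar and fooxxxbar. The unbounded variant additionally matches xxxxxxbarfoo, where foo comes after bar, which is the case the `{0,3}` limit exists to exclude.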
    The thing to remember is that the lookahead can see farther than the end of the lookbehind, so you need to explicitly limit it. You could use that feature to get a slightly different solution to the first problem:
    /..                 # match first two chars
     (?:
        (?<=            # looking behind,
           (?!(.)\1\1)  # don't allow a run of three
        ..)             # starting only two chars back
        .               # then match the next char
     ){1,4}/x
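As a quick cross-check, the two solutions can be run side by side. This is a sketch under the assumption that both are meant to give the same result; the helper `matched` and the variable names are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# First solution: capture the char, then look three chars back.
my $look_three_back = qr/..(?:(.)(?<=(?!\1\1\1)...)){1,4}/;

# Second solution: look two chars back and let the lookahead
# peek one char past the end of the lookbehind.
my $look_two_back = qr/..(?:(?<=(?!(.)\1\1)..).){1,4}/;

# Hypothetical helper: matched substring for a pattern, or '' if no match.
sub matched {
    my ($re, $s) = @_;
    return $s =~ $re ? substr($s, $-[0], $+[0] - $-[0]) : '';
}

for my $s (qw(AAABCDE ABCDDD AABBBCCD AAA)) {
    my $m1 = matched($look_three_back, $s);
    my $m2 = matched($look_two_back,   $s);
    print "$s: '$m1' '$m2'\n";
}
# AAABCDE: 'AABCDE' 'AABCDE'
# ABCDDD: 'ABCDD' 'ABCDD'
# AABBBCCD: 'AABB' 'AABB'
# AAA: '' ''
```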
Re: How can I use backrefs in a lookbehind?
by Ieronim (Friar) on Jun 28, 2006 at 21:51 UTC
    Only a small remark:
    The idea of variable-length lookbehind is very good, but the given problem can be solved even without using lookbehinds at all:
    #!/usr/bin/perl
    use warnings;
    use strict;

    my $pat = qr{
        (                   # 1: capture the whole substring
            (?:
                (.)         # a character
                (?!\2\2)    # NOT repeated three times
            ){1,4}          # one to four of such 'good' characters
            ..              # two any characters more; 2+4 = 6
        )
    }x;

    foreach (qw/AABBCCDD AABBBCCD AAABBCCD AABBCCCD AAABCDE/) {
        print "$1\n" if /$pat/;
    }
    outputs
    AABBCC
    AABB
    AABBCC
    AABBCC
    AABCDE
