How can I use backrefs in a lookbehind?

Contributed by Roy Johnson on Dec 02, 2004 at 15:25 UTC

Description:

I want to use a regex to capture 3-6 characters not containing a run of three of the same character. So, given AAABCDE, it would match AABCDE; and given ABCDDD, it would match ABCDD.

The most natural solution is to use lookbehind, starting with the third character, to check that the last three characters are not all the same:

    /..(?:(.)(?<!\1\1\1)){1,4}/

The problem is that Perl's regex engine treats every backreference as variable-length, and variable-length lookbehinds are not allowed.

Answer: How can I use backrefs in a lookbehind?
contributed by Roy Johnson

Use lookbehind to count back as many chars as you want, and at the front of it, put a lookahead to check your pattern:

    /..                 # match first two chars
     (?:(.)             # capture next char, then
       (?<=             # looking behind,
         (?!\1\1\1)     # don't allow a run of three
       ...)             # starting three chars back
     ){1,4}/x

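To try the pattern above against the two inputs from the question, something like the following sketch should do. The helper name first_match is made up for this illustration, and $& is used instead of an outer capture group so that \1 keeps pointing at the single-character capture.

```perl
use strict;
use warnings;

# The lookahead-inside-lookbehind pattern from the answer above.
my $no_triple = qr/
    ..                  # match first two chars
    (?: (.)             # capture next char, then
        (?<=            # looking behind,
            (?!\1\1\1)  # don't allow a run of three
            ...         # starting three chars back
        )
    ){1,4}
/x;

# Hypothetical helper (not from the original post): returns the matched
# substring, or undef when nothing matches. Wrapping $no_triple in another
# capture group would renumber \1, so the whole match is read from $&.
sub first_match {
    my ($string) = @_;
    return $string =~ $no_triple ? $& : undef;
}

for my $input (qw(AAABCDE ABCDDD)) {
    my $found = first_match($input);
    print "$input => ", (defined $found ? $found : '(no match)'), "\n";
}
# prints:
# AAABCDE => AABCDE
# ABCDDD => ABCDD
```
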
This technique can also overcome some other variable-length lookbehind situations. For example, if you want to match "bar" that is preceded by "foo" somewhere in the preceding six characters:
    /(?<=               # looking behind,
       (?=.{0,3}foo)    # look for a foo preceded by up to three chars
     .{6})              # starting six chars back
     bar/x              # then match bar

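Here is a rough check of the foo/bar pattern on a few strings. The $unbounded variant is not from the original answer; it is added only for contrast, to show what happens when the lookahead is not limited to the six characters being looked behind. The helper name hits and the test strings are likewise invented for this sketch.

```perl
use strict;
use warnings;

# The pattern from the text: "bar" with "foo" inside the six chars before it.
my $bounded = qr/
    (?<=                # looking behind,
        (?=.{0,3}foo)   # foo must start at offset 0..3 of the window
        .{6}            # the window: six chars back
    )
    bar                 # then match bar
/x;

# Assumed variant for contrast only: the lookahead has no length limit.
my $unbounded = qr/(?<=(?=.*foo).{6})bar/;

# Hypothetical helper: 1 if the pattern matches anywhere in the string.
sub hits {
    my ($re, $string) = @_;
    return $string =~ $re ? 1 : 0;
}

for my $s (qw(fooXYZbar XYZfoobar fooXYZWbar foobar XXXXXXbarfoo)) {
    printf "%-13s bounded=%d unbounded=%d\n",
        $s, hits($bounded, $s), hits($unbounded, $s);
}
# prints:
# fooXYZbar     bounded=1 unbounded=1
# XYZfoobar     bounded=1 unbounded=1
# fooXYZWbar    bounded=0 unbounded=0
# foobar        bounded=0 unbounded=0
# XXXXXXbarfoo  bounded=0 unbounded=1
```

Two things show up here: "foobar" fails because .{6} needs six characters in front of bar, and the unbounded variant accepts "XXXXXXbarfoo" even though the foo comes after the bar.
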
The thing to remember is that the lookahead can see farther than the end of the lookbehind, so you need to explicitly limit it. You could use that feature to get a slightly different solution to the first problem:
    /..                 # match first two chars
     (?:
       (?<=             # looking behind,
         (?!(.)\1\1)    # don't allow a run of three
       ..)              # starting only two chars back
       .                # then match the next char
     ){1,4}/x

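A quick way to see this variant in action is to run it on the question's inputs plus one extra string. The helper name match_of and the third input are assumptions made for this sketch, not part of the original answer.

```perl
use strict;
use warnings;

# Variant where the lookahead reaches one char past the end of the lookbehind.
my $peek_ahead = qr/
    ..                      # match first two chars
    (?:
        (?<=                # looking behind,
            (?!(.)\1\1)     # don't allow a run of three
            ..              # starting only two chars back
        )
        .                   # then match the next char
    ){1,4}
/x;

# Hypothetical helper: the matched substring, or undef when nothing matches.
sub match_of {
    my ($string) = @_;
    return $string =~ $peek_ahead ? $& : undef;
}

for my $input (qw(AAABCDE ABCDDD AABBBCCD)) {
    my $found = match_of($input);
    print "$input => ", (defined $found ? $found : '(no match)'), "\n";
}
# prints:
# AAABCDE => AABCDE
# ABCDDD => ABCDD
# AABBBCCD => AABB
```
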
Answer: How can I use backrefs in a lookbehind?
contributed by Ieronim

Only a small remark:
The idea of variable-length lookbehind is very good, but the given problem can be solved even without using lookbehinds at all:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $pat = qr{
        (               # 1: capture the whole substring
          (?:
            (.)         # a character
            (?!\2\2)    # NOT repeated three times
          ){1,4}        # one to four of such 'good' characters
          ..            # any two more characters; 4+2 = 6
        )
    }x;

    foreach (qw/AABBCCDD AABBBCCD AAABBCCD AABBCCCD AAABCDE/) {
        print "$1\n" if /$pat/;
    }

outputs
    AABBCC
    AABB
    AABBCC
    AABBCC
    AABCDE
