joerg.ludwig has asked for the wisdom of the Perl Monks concerning the following question:

The regex to match a double-quoted string from the perl doc (http://perldoc.perl.org/perlre.html#Quantifiers) does not work for long strings:

# perl -we '(q(").(q(\a)x50000).q(")) =~ /"(?:[^"\\]++|\\.)*+"/'
Complex regular subexpression recursion limit (32766) exceeded at -e line 1.

How can this regex be rewritten to support strings of arbitrary length?

Thx in advance. :)

Re: Complex regular subexpression recursion limit
by ikegami (Pope) on Dec 03, 2009 at 16:35 UTC
    I'm pretty sure this bug has been fixed in the upcoming 5.12. Until then:
    / " (?: (?: [^"\\]++ | \\. ){0,32766}+ ){0,32766}+ " /xs

    Update: Simplified code.
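A quick way to check the workaround is to run it against the string from the original question. This is only a minimal sketch: the variable names `$long`, `$re` and `$matched` are chosen here for illustration, and the `\A`/`\z` anchors are added so the check covers the whole string. The idea is that the two bounded quantifiers multiply, so the pattern allows up to 32766 * 32766 alternations instead of 32766.

```perl
use strict;
use warnings;

# The test string from the question: a quote, 50000 copies of \a, a quote.
my $long = q(") . (q(\a) x 50000) . q(");

# The workaround: two nested, bounded, possessive quantifiers.
my $re = qr/ " (?: (?: [^"\\]++ | \\. ){0,32766}+ ){0,32766}+ " /xs;

# Anchor at both ends so that only a match of the entire string counts.
my $matched = $long =~ /\A$re\z/ ? 1 : 0;
print $matched ? "matched the whole string\n" : "no match\n";
```

Strings needing more than 32766 * 32766 alternations would still hit the limit, so this raises the ceiling rather than removing it.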

      What's the bug? The presence of a recursion limit doesn't seem by itself to be a bug—we've long (always?) had it for subroutines (Deep recursion on subroutine "%s"), and quantifiers {n,m} have also been limited (Quantifiers). It seems that both of these are intentional, trading some power for efficiency and safety. Is there something else that I'm missing?

      UPDATE: Obviously I didn't read my own link very well. I forgot that the Deep recursion message was a warning, not a fatal error.

        It's a bug because the regexp engine isn't recursive anymore. Limiting quantifier size for "efficiency and safety" reasons makes as much sense as stopping the following loop:
        for (1 .. 34000) { print $_ }
        after it printed 32766.

        Note that the "deep recursion" you are referring to is a warning; perl won't stop the recursion. But the Complex regular subexpression recursion limit makes perl just say "oh well, I've had enough - I'll just pretend it doesn't match". That's wrong. It may even be exploitable.

        First of all, there is no limit on recursion depth. There's a suppressible warning when you attain a certain depth, that's all.

        Secondly, there is no efficiency gained by limiting the number of iterations. We're talking about using a 32-bit variable instead of a 16-bit variable on a 32-bit system.

        And it's not just a theoretical bug. There exists desire for the ability to match longer strings.
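For strings of truly arbitrary length, one alternative is to move the repetition out of the regex and into a Perl loop, using `\G` with `/gc` to consume one piece per iteration. This is a different technique from the one posted above, shown only as a sketch; `quoted_length` is a hypothetical helper name, and it assumes the quoted string starts at the beginning of the input.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the length (quotes included) of the
# double-quoted string at the start of $text, or undef if there is none.
sub quoted_length {
    my ($text) = @_;
    pos($text) = 0;

    # Opening quote.
    $text =~ /\G"/gc or return;

    # Each iteration consumes one run of ordinary characters or one
    # backslash escape; /c keeps pos() in place when the match fails.
    1 while $text =~ /\G(?:[^"\\]+|\\.)/gcs;

    # Closing quote.
    $text =~ /\G"/gc or return;
    return pos($text);
}

my $long = q(") . (q(\a) x 50000) . q(");
my $len  = quoted_length($long);
print defined $len ? "matched $len characters\n" : "no match\n";
```

Because the loop count lives in Perl rather than in a quantifier over a group, the only quantifier left in the pattern applies to a plain character class.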