http://www.perlmonks.org?node_id=810857

joerg.ludwig has asked for the wisdom of the Perl Monks concerning the following question:

The regex to match a double-quoted string from the perl doc (http://perldoc.perl.org/perlre.html#Quantifiers) does not work for long strings:

# perl -we '(q(").(q(\a)x50000).q(")) =~ /"(?:[^"\\]++|\\.)*+"/' Complex regular subexpression recursion limit (32766) exceeded at -e l +ine 1.

How can this regex be rewritten to support strings of arbitrary length?

Thx in advance. :)

Replies are listed 'Best First'.
Re: Complex regular subexpression recursion limit
by ikegami (Patriarch) on Dec 03, 2009 at 16:35 UTC
    I'm pretty sure this bug has been fixed in upcoming 5.12. Till then.
    / " (?: (?: [^"\\]++ | \\. ){0,32766}+ ){0,32766}+ " /xs

    Update: Simplified code.

      What's the bug? The presence of a recursion limit doesn't seem by itself to be a bug—we've long (always?) had it for subroutines (Deep recursion on subroutine "%s"), and quantifiers {n,m} have also been limited (Quantifiers). It seems that both of these are intentional, trading some power for efficiency and safety. Is there something else that I'm missing?

      UPDATE: Obviously I didn't read my own link very well. I forgot that the Deep recursion message was a warning, not a fatal error.

        First of all, there is no limit on recursion depth. There's a suppressible warning when you attain a certain depth, that's all.

        Secondly, there is no efficiently gained by limiting the number of iterations. We're talking about using a 32-bit variable instead of a 16-bit variable on a 32-bit system.

        And it's not just a theoretical bug. There exists desire for the ability to match longer strings.

        It's a bug because the regexp engine isn't recursive anymore. Limiting quantifier size for "efficiency and safety" reasons makes as much sense stopping the following loop:
        for (1 .. 34000) { print $_ }
        after it printed 32766.

        Note that the "deep recursion" you are referring to is a warning, perl won't stop the recursion. But the Complex regular subexpression recursion limit makes perl just say "oh well, I had enough - I'll just pretend it doesn't match". That's wrong. It may even be exploitable.

      I spend some time trying to find my mistake while solving that problem:

      I have a text block, which consists of two different types of lines: 1) lines which starts with /[^#\n]*#/, 2) any other lines.

      Then I need to glue all consecutive lines of second type. In other words I need to delete newlines between them.

      Below it is a simplified problem, and me trying to solve it:

      use warnings; use strict; my $m = (1 << 15) + 1; # 32769 my $_="#\n" . "A\n" x $m . "#\n"; s/ ^[^#].*?$ # find a line which haven't "#" at it beginning. (?:\n[^#].*?$)* # find consecutive n (n>=0) such lines / $a=$&, $a=~y{\n}{}d, # delete newlines $a /megx ; ## multiline, eval, global, comments my $n = () = /\n/sg; # calc the number of occurences of "\n" print $n # ANSWER: 4, My expectation was: 3
      Perl v5.16 STDERR says "Complex regular subexpression recursion limit (32766)".

      My questions are: 1) can't we still change this limit by our hands? (Seems that Python allows so). UPD: 2) shouldn't it be an error if the answer which I gain is incorrect, and a knowledge about these limits are specific in my opinion, and me using "strict"?