Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Complex regular subexpression recursion limit

by joerg.ludwig (Initiate)
on Dec 03, 2009 at 15:47 UTC ( #810857=perlquestion: print w/ replies, xml ) Need Help??
joerg.ludwig has asked for the wisdom of the Perl Monks concerning the following question:

The regex to match a double-quoted string from the perl doc (http://perldoc.perl.org/perlre.html#Quantifiers) does not work for long strings:

# perl -we '(q(").(q(\a)x50000).q(")) =~ /"(?:[^"\\]++|\\.)*+"/' Complex regular subexpression recursion limit (32766) exceeded at -e l +ine 1.

How can this regex be rewritten to support strings of arbitrary length?

Thx in advance. :)

Comment on Complex regular subexpression recursion limit
Download Code
Replies are listed 'Best First'.
Re: Complex regular subexpression recursion limit
by ikegami (Pope) on Dec 03, 2009 at 16:35 UTC
    I'm pretty sure this bug has been fixed in upcoming 5.12. Till then.
    / " (?: (?: [^"\\]++ | \\. ){0,32766}+ ){0,32766}+ " /xs

    Update: Simplified code.

      What's the bug? The presence of a recursion limit doesn't seem by itself to be a bug—we've long (always?) had it for subroutines (Deep recursion on subroutine "%s"), and quantifiers {n,m} have also been limited (Quantifiers). It seems that both of these are intentional, trading some power for efficiency and safety. Is there something else that I'm missing?

      UPDATE: Obviously I didn't read my own link very well. I forgot that the Deep recursion message was a warning, not a fatal error.

        First of all, there is no limit on recursion depth. There's a suppressible warning when you attain a certain depth, that's all.

        Secondly, there is no efficiently gained by limiting the number of iterations. We're talking about using a 32-bit variable instead of a 16-bit variable on a 32-bit system.

        And it's not just a theoretical bug. There exists desire for the ability to match longer strings.

        It's a bug because the regexp engine isn't recursive anymore. Limiting quantifier size for "efficiency and safety" reasons makes as much sense stopping the following loop:
        for (1 .. 34000) { print $_ }
        after it printed 32766.

        Note that the "deep recursion" you are referring to is a warning, perl won't stop the recursion. But the Complex regular subexpression recursion limit makes perl just say "oh well, I had enough - I'll just pretend it doesn't match". That's wrong. It may even be exploitable.

      I spend some time trying to find my mistake while solving that problem:

      I have a text block, which consists of two different types of lines: 1) lines which starts with /[^#\n]*#/, 2) any other lines.

      Then I need to glue all consecutive lines of second type. In other words I need to delete newlines between them.

      Below it is a simplified problem, and me trying to solve it:

      use warnings; use strict; my $m = (1 << 15) + 1; # 32769 my $_="#\n" . "A\n" x $m . "#\n"; s/ ^[^#].*?$ # find a line which haven't "#" at it beginning. (?:\n[^#].*?$)* # find consecutive n (n>=0) such lines / $a=$&, $a=~y{\n}{}d, # delete newlines $a /megx ; ## multiline, eval, global, comments my $n = () = /\n/sg; # calc the number of occurences of "\n" print $n # ANSWER: 4, My expectation was: 3
      Perl v5.16 STDERR says "Complex regular subexpression recursion limit (32766)".

      My questions are: 1) can't we still change this limit by our hands? (Seems that Python allows so). UPD: 2) shouldn't it be an error if the answer which I gain is incorrect, and a knowledge about these limits are specific in my opinion, and me using "strict"?

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://810857]
Approved by Corion
Front-paged by moritz
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (6)
As of 2015-09-01 22:49 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My preferred temperature scale is:










    Results (50 votes), past polls