PerlMonks  

Complex regular subexpression recursion limit

by joerg.ludwig (Initiate)
on Dec 03, 2009 at 15:47 UTC
joerg.ludwig has asked for the wisdom of the Perl Monks concerning the following question:

The regex for matching a double-quoted string given in the Perl documentation (http://perldoc.perl.org/perlre.html#Quantifiers) does not work for long strings:

# perl -we '(q(").(q(\a)x50000).q(")) =~ /"(?:[^"\\]++|\\.)*+"/'
Complex regular subexpression recursion limit (32766) exceeded at -e line 1.

How can this regex be rewritten to support strings of arbitrary length?

Thx in advance. :)

Re: Complex regular subexpression recursion limit
by ikegami (Pope) on Dec 03, 2009 at 16:35 UTC
    I'm pretty sure this bug has been fixed in the upcoming 5.12. Until then, nest two bounded quantifiers so that neither one has to count past the limit:
    / " (?: (?: [^"\\]++ | \\. ){0,32766}+ ){0,32766}+ " /xs

    Update: Simplified code.
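A minimal sketch of the workaround in use. The test size of 100,000 escape sequences is an arbitrary assumption, chosen to be well above the 32766 limit reported in the question. Because the inner group stops at 32766 tokens and the outer group repeats that block up to 32766 times, no single quantifier reaches the limit; the total reach is roughly 32766 * 32766 tokens, large but still finite.

```perl
use strict;
use warnings;

# 100,000 escaped characters: more tokens than one complex quantifier
# may iterate over (32766 in the question; assumed not above 65534 here).
my $str = q(") . (q(\a) x 100_000) . q(");

# Inner group: at most 32766 tokens per pass.
# Outer group: at most 32766 passes of the inner group.
my $re = qr/ " (?: (?: [^"\\]++ | \\. ){0,32766}+ ){0,32766}+ " /xs;

if ($str =~ $re) {
    # @- and @+ hold the start and end offsets of the overall match.
    printf "matched %d of %d characters\n", $+[0] - $-[0], length $str;
}
else {
    print "no match\n";
}
```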

      What's the bug? The presence of a recursion limit doesn't seem by itself to be a bug: we've long (always?) had it for subroutines (Deep recursion on subroutine "%s"), and quantifiers {n,m} have also been limited (Quantifiers). It seems that both of these are intentional, trading some power for efficiency and safety. Is there something else that I'm missing?

      UPDATE: Obviously I didn't read my own link very well. I forgot that the Deep recursion message was a warning, not a fatal error.
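The {n,m} limit is enforced when a pattern is compiled, so an oversized bound is a hard error rather than a silent non-match. A small sketch, assuming 100_000 exceeds your perl's maximum bound (32766 in the perl used in this thread; later perls may allow somewhat more). The pattern is built at run time so the error can be caught with eval instead of aborting compilation of the whole script.

```perl
use strict;
use warnings;

# Interpolating $max delays regex compilation until run time,
# so a rejected bound becomes a catchable exception.
for my $max (32_766, 100_000) {
    my $re = eval { qr/a{0,$max}/ };
    if ($re) { print "{0,$max} compiles\n" }
    else     { print "{0,$max} rejected: $@" }
}
```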

        It's a bug because the regexp engine isn't recursive anymore. Limiting quantifier size for "efficiency and safety" reasons makes as much sense as stopping the following loop:
        for (1 .. 34000) { print $_ }
        after it printed 32766.

        Note that the "deep recursion" you are referring to is only a warning; perl won't stop the recursion. But when the complex regular subexpression recursion limit is hit, perl just says "oh well, I've had enough, I'll just pretend it doesn't match". That's wrong. It may even be exploitable.
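If the silent non-match is the main worry, one possible mitigation (a suggestion, not something proposed in the thread) is to promote the "regexp" warning category to a fatal error, so that hitting the limit throws an exception you can detect. A sketch, assuming 200,000 escape sequences is above your perl's iteration limit:

```perl
use strict;
use warnings;
use warnings FATAL => 'regexp';   # regex-engine warnings now die

# 200,000 escaped characters: past the iteration limit for a
# complex subexpression (assumption: limit is 65534 or lower).
my $str = q(") . (q(\a) x 200_000) . q(");

# Same pattern as in the question.
my $matched = eval { $str =~ /"(?:[^"\\]++|\\.)*+"/ ? 1 : 0 };

if (!defined $matched) {
    print "regex engine gave up: $@";   # the promoted warning
}
else {
    print $matched ? "matched\n" : "no match\n";
}
```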

        First of all, there is no limit on recursion depth. There's a suppressible warning when you attain a certain depth, that's all.
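To illustrate: perl warns once a subroutine is 100 calls deep, but keeps going, and the warning can be silenced. In this sketch the sub `depth` is a hypothetical example. Note that `no warnings 'recursion'` has to be in scope at the recursive call site, that is, inside the sub itself.

```perl
use strict;
use warnings;

# Hypothetical example: recurse $n levels deep and count the levels.
sub depth {
    no warnings 'recursion';   # silence "Deep recursion on subroutine"
    my ($n) = @_;
    return $n == 0 ? 0 : 1 + depth($n - 1);
}

print depth(50_000), "\n";   # far past the 100-call warning threshold
```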

        Secondly, there is no efficiency gained by limiting the number of iterations. We're talking about using a 32-bit counter instead of a 16-bit one on a 32-bit system.

        And it's not just a theoretical bug. There exists desire for the ability to match longer strings.
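For strings of truly arbitrary length, a different technique (not the one discussed above) is to move the loop out of the regex and into Perl, scanning token by token with `\G` and the `/gc` modifiers. Each individual match only uses a simple quantifier on a character class, which is not subject to the complex-subexpression limit. The helper name `is_quoted_string` is hypothetical, and the `/s` on the escape rule is an assumption that lets a backslash escape a newline, as in ikegami's `/xs` version.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $s is exactly one double-quoted string.
sub is_quoted_string {
    my ($s) = @_;
    pos($s) = 0;
    return 0 unless $s =~ /\G"/gc;          # opening quote
    while (1) {
        next if $s =~ /\G[^"\\]++/gc;        # run of ordinary characters
        next if $s =~ /\G\\./gcs;            # backslash plus any character
        last;                                # neither matched: stop scanning
    }
    # On failure, /gc leaves pos() where it was, so \G resumes correctly.
    return 0 unless $s =~ /\G"/gc;          # closing quote
    return pos($s) == length($s) ? 1 : 0;   # nothing may follow it
}

my $long = q(") . (q(\a) x 200_000) . q(");
print is_quoted_string($long) ? "match\n" : "no match\n";
```

The trade-off is speed: the loop runs once per token in Perl code, so it is slower than a single regex, but its length is bounded only by memory.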

Node Type: perlquestion [id://810857]
Approved by Corion
Front-paged by moritz