Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re: Strange behaviour of the regex engine (pathological protection)

by tye (Sage)
on Jan 06, 2015 at 03:16 UTC ( [id://1112291]=note: print w/replies, xml ) Need Help??


in reply to Strange behaviour of the regex engine

Perl optimizes certain nested loop regex constructs similar to (...*)* in order to avoid what is pathological behavior with most regex engines. For non-pathological cases, there is a small cost to this protection.

The reason the performance significantly decreases is because that crosses the point that triggers the warning "Complex regular subexpression recursion limit (32766) exceeded" (repeatedly).

- tye        

  • Comment on Re: Strange behaviour of the regex engine (pathological protection)

Replies are listed 'Best First'.
Re^2: Strange behaviour of the regex engine (pathological protection)
by thof (Initiate) on Jan 06, 2015 at 18:53 UTC

    Many thanks! I didn't know about warnings. But I noticed new issue. The problem is that when this warning appears the regex doesn't match... I changed line to print "not matched" if $TestString !~ m/^(a+|b)*$/; and for string 256000 and above regex is not able to match. Also when I use possessive quantifiers (m/^(a++|b)*+$/) performance improves a lot but still it does not match for strings larger than 256000.

    I don't understand regex engine in Perl... I also checked the same regexes for Python and PHP (PCRE) and they work similar to Java. I doubt that this is bug in Perl and I cannot match big input strings. There must be something I don't know.

      If I were you, I would file this (the failure case) as a bug against Perl. If nothing else, it might lead to p5p giving you better information than I have off the top of my head.

      Best case, this 32k limit might be removed (a limit that seemed more reasonable when it was implemented but seems increasingly small as time passes).

      - tye        

      Many thanks! I didn't know about warnings. But I noticed new issue. The problem is that when this warning appears the regex doesn't match...
      It's not a new issue it's the same issue :) The recursion depth is limited to prevent catastrophic backtracking (mkay I think it's not actually recursion but it's called that in the docs). Doesn't Friedl describe that in his book? Add use diagnostics; to your program.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1112291]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (5)
As of 2024-04-24 08:28 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found