Re: Stop runaway regex

by davido (Cardinal)
on May 30, 2014 at 04:46 UTC


in reply to Stop runaway regex

This question was also posted on StackOverflow, and I took some time answering it there, drawing on experience from an application I wrote that had to accept user-supplied regexes without succumbing to DoS attacks. Here's a recap:

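  • alarm on its own is inadequate; it cannot interrupt the running regexp engine, because with Perl's default deferred ("safe") signals the handler does not run until the match has finished.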

  • Sys::SigAction provides a function called timeout_call, which can interrupt the regexp engine while it's running. However, the RE engine was not designed to be interrupted. It can be left in an unstable state, which can (and often enough will) lead to segfaults (tested on various versions of Perl). This is usually undesirable. Presumably POSIX::SigAction shares the same weakness, since the real problem is that the RE engine isn't designed to be interrupted.

  • If your regular expression will work with the RE2 engine, you are in luck, because RE2 guarantees linear-time searches. There is a CPAN module that interfaces with the RE2 engine as a drop-in replacement for Perl's engine: re::engine::RE2. Here's the big catch though: the "linear-time" guarantee comes at the cost of many of the powerful regex semantics we've come to expect from Perl's elaborate RE engine. For example, RE2 has no backreferences and no lookahead or lookbehind assertions. If you need those, this won't work for you. But if you can live with its limitations, it is a fantastic option (assuming you've got a recent enough Perl to use it).

  • The best solution that provides the full semantic power of Perl's regular expressions, while also providing the ability to safely time out, is the fork/alarm/wait idiom. Fork a worker, set an alarm, wait, and if the alarm expires, shut down the worker. No need to worry about the process becoming unstable; you're done with it anyway.
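
The fork/alarm/wait idiom from the last bullet could be sketched roughly as follows, using only core Perl. This is a minimal illustration, not the code from the application mentioned above: the helper name match_with_timeout, its return convention (1 for a match, 0 for no match, undef for a timeout or a failed worker), and the use of the child's exit status to carry the result are all assumptions made for the example.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper (not from the original post).
# Returns 1 on match, 0 on no match, undef on timeout or worker failure.
sub match_with_timeout {
    my ($pattern, $text, $seconds) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Worker: run the untrusted match and report the outcome through
        # the exit status. The eval catches patterns that fail to compile.
        my $matched = eval { $text =~ /$pattern/ ? 1 : 0 };
        # _exit skips END blocks and destructors inherited from the parent.
        POSIX::_exit(defined($matched) ? ($matched ? 0 : 1) : 2);
    }

    # Parent: the handler only kills the worker, so the regexp engine in
    # this process is never interrupted.
    local $SIG{ALRM} = sub { kill 'KILL', $pid };
    alarm $seconds;
    waitpid $pid, 0;
    my $status = $?;
    alarm 0;

    return if $status & 127;    # worker died from a signal: treat as timeout
    my $code = $status >> 8;
    return $code == 0 ? 1 : $code == 1 ? 0 : undef;
}

# Example call with a made-up pattern and input.
my $result = match_with_timeout('^(\w+)@example\.com$', 'dave@example.com', 2);
print defined $result
    ? ($result ? "match\n" : "no match\n")
    : "timed out or failed\n";
```

An exit status can only carry a yes/no answer. If the caller needs captures or match positions, the worker would have to send them back some other way, for example over a pipe, which this sketch leaves out.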


Dave

Replies are listed 'Best First'.
Re^2: Stop runaway regex # CROSSPOST
by LanX (Saint) on May 30, 2014 at 15:15 UTC

      Done. (I added a mention to the bottom of my response to the SO question.)


      Dave
