Pluggable regex engine in Perl

by szabgab (Priest)
on Dec 27, 2010 at 08:02 UTC (#879264, perlmeditation)

Since Perl 5.10, it has been possible to replace Perl's regex engine by plugging in another one (see perlreapi for details).

I found several implementations on CPAN:
re::engine::PCRE seems to be the first one, and the one the others use as a base. Then there is re::engine::RE2, which is "Rather under development"; re::engine::LPEG, which is declared a failure; and re::engine::Oniguruma.
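
For readers who have not tried one of these modules: an engine is enabled like a lexical pragma, so it only affects regexes compiled in the enclosing scope. The following is a minimal sketch, assuming re::engine::PCRE may or may not be installed from CPAN. The string eval, the test pattern, and the variable names are illustrative choices, not something the module requires.

```perl
use strict;
use warnings;

# Load the alternative engine inside a string eval so that a missing
# CPAN module becomes a catchable runtime failure, not a compile error.
my $with_plugin = eval q{
    use re::engine::PCRE;    # lexically scoped: ends with this eval
    "Hello, world" =~ /(\w+), (\w+)/ ? "$1/$2" : "no match";
};
my $plugin_loaded = defined $with_plugin;

# Outside that scope the pattern is compiled by Perl's built-in engine.
my $native = "Hello, world" =~ /(\w+), (\w+)/ ? "$1/$2" : "no match";

print "plugin: ",
    ($plugin_loaded ? $with_plugin : "re::engine::PCRE not installed"), "\n";
print "native: $native\n";
```

If the module is missing, only the first line of output changes; the native match is unaffected either way.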

Are there other implementations?

I wonder whether any of these are in use, and what the use cases are. What problems do these plugins solve that are impossible, harder, or slower to solve with Perl's standard regex engine?

If any of the authors read this: did you have any motivation other than having fun with it, and do you have any conclusions to draw?

Re: Pluggable regex engine in Perl
by ELISHEVA (Prior) on Dec 27, 2010 at 08:48 UTC

    I'm sure there are people who take this as an invitation to write their own "just because", but the practical reason for a custom regex engine would be the need to emulate the behavior of some other regex syntax. Bash, PHP, Java, and the various flavors of the grep command all have their own slight regular expression nuances.

    For example, a person might need to port a large amount of code from PHP to Perl. Rather than study each and every regex in that code, it might make more sense to port the code but leave the original regular expressions in place. Converting syntax is usually fairly straightforward. Deciphering and converting regular expressions is not always so. You would have that option with a pluggable regex API.

Re: Pluggable regex engine in Perl
by Khen1950fx (Canon) on Dec 27, 2010 at 09:06 UTC
    Why not write your own engine in Perl? You could get started by reading the re::engine::Plugin documentation. The module is still in its early stages, but it is workable.
Re: Pluggable regex engine in Perl
by moritz (Cardinal) on Dec 27, 2010 at 09:27 UTC

    I've heard that the regex API changed quite a bit between 5.10 and 5.12 due to the promotion of regexes to first-class objects. If that's true, one should either target only 5.12 or newer when writing a new plugin, or be aware of the differences and use some #ifdefs.
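
One visible side of that change can be checked from Perl itself, without touching the C API. The sketch below uses only core modules; the variable names are illustrative, and the note about 5.10 behavior is from memory and has not been re-run on a 5.10 install.

```perl
use strict;
use warnings;
use re ();
use Scalar::Util qw(reftype blessed);

my $re = qr/ab+c/;

# On 5.12 and later a compiled regex is its own SV type, so reftype()
# reports REGEXP. On 5.10 it was (as far as I recall) a blessed scalar
# reference carrying magic, so reftype() reported SCALAR there.
my $kind        = reftype($re);
my $first_class = $] >= 5.012 ? 1 : 0;

printf "perl %s: blessed=%s reftype=%s first-class=%d\n",
    $], blessed($re), $kind, $first_class;

# re::is_regexp() answers the question on either version.
print "is_regexp: ", (re::is_regexp($re) ? "yes" : "no"), "\n";
```

A plugin written in C would make the equivalent decision at compile time with version-guarded #ifdefs, as suggested above.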

    I wonder whether any of these are in use, and what the use cases are. What problems do these plugins solve that are impossible, harder, or slower to solve with Perl's standard regex engine?

    The now-defunct re::engine::TRE had two features that made it attractive for some uses. First, it matched the longest of several alternatives (instead of the first, as Perl does). Second, it used a non-backtracking state machine internally whenever possible, so pathological exponential-time behavior did not occur as easily as with the Perl engine.
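
The difference in alternation semantics is easy to see with Perl's built-in engine. This is a small sketch using core Perl only. The "longest first" sort is a workaround that applies to plain literal alternatives; it is not how TRE or a POSIX engine works internally.

```perl
use strict;
use warnings;

my $text = "foobar";

# Perl tries alternatives left to right and keeps the first that succeeds.
my ($first) = $text =~ /(foo|foobar)/;

# Emulate longest-alternative matching for literal alternatives by
# sorting them longest first before building the pattern.
my @alts   = ("foo", "foobar");
my $sorted = join "|",
    map { quotemeta } sort { length($b) <=> length($a) } @alts;
my ($longest) = $text =~ /($sorted)/;

print "first-match:   $first\n";      # foo
print "longest-match: $longest\n";    # foobar
```

An engine with true longest-match semantics returns the second result no matter how the alternatives are ordered in the pattern.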

Re: Pluggable regex engine in Perl
by dgl (Novice) on Dec 27, 2010 at 12:16 UTC

    I'm the author of re::engine::RE2.

    As for motivation, it was mostly to learn a bit about this area of Perl. However, I do see uses for RE2, because its matching is much faster than Perl's.

    For example, combined with an mmapped scalar, I can match a regexp against 1 GiB of text in about 10 seconds (on a Core 2 Duo). Perl's RE doesn't even come close to that. You can see how Google Code Search can be so fast.

    There are some issues with Perl's UTF-8 handling (frankly it's insane), but once I've worked around that, re::engine::RE2 should be nearly a drop-in replacement for Perl's RE, but faster.

      There are some issues with Perl's UTF-8 handling (frankly it's insane),

      It could be informative to read some expansion on that position.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Pluggable regex engine in Perl
by JavaFan (Canon) on Dec 27, 2010 at 15:56 UTC
    I haven't written a plugin, but I don't find it hard to come up with reasons to write plugins:
    • You want a different syntax (perhaps the same syntax as language X).
    • You want different rules (for instance, POSIX's "longest match" preference over Perl's "first match").
    • You want an engine that's optimized for certain cases. Perhaps you want a pure DFA, sacrificing functionality for speed.
    • You may want to do matching in a particular encoding.
    I've no idea whether any of the plugins have been written with those reasons in mind, but I wouldn't be surprised if someone at some point writes one for them.
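
To illustrate the "pure DFA" trade-off from the list above, here is a hand-built sketch in core Perl. The transition table encodes the textbook pattern (a|b)*abb; the table, the state numbering, and the dfa_match name are all made up for this example. It reads each character exactly once and never backtracks, but it can only answer yes or no, with no captures.

```perl
use strict;
use warnings;

# Transition table for a DFA equivalent to /\A(?:a|b)*abb\z/.
my %dfa = (
    0 => { a => 1, b => 0 },
    1 => { a => 1, b => 2 },
    2 => { a => 1, b => 3 },
    3 => { a => 1, b => 0 },
);
my %accepting = (3 => 1);

sub dfa_match {
    my ($input) = @_;
    my $state = 0;
    for my $char (split //, $input) {
        $state = $dfa{$state}{$char};
        return 0 unless defined $state;    # character outside the alphabet
    }
    return $accepting{$state} ? 1 : 0;
}

# Compare against the built-in engine on a few inputs.
for my $s (qw(abb aabb babb ab abba)) {
    printf "%-5s dfa=%d perl=%d\n",
        $s, dfa_match($s), ($s =~ /\A(?:a|b)*abb\z/ ? 1 : 0);
}
```

A real DFA-based engine builds such a table automatically from the pattern, which is where the loss of backreferences and similar features comes from.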
Re: Pluggable regex engine in Perl
by AnomalousMonk (Abbot) on Dec 27, 2010 at 16:57 UTC

    I have read that the TCL regex engine is a mixture of DFA and NFA: DFA is used when possible, NFA otherwise. Any comments on the truth of this notion, how appropriate this approach is, its performance vis-a-vis Perl's standard regex engine, and whether it might be available as a plug-in?

      From what I've read (which is dated by many years) TCL uses a DFA engine to test whether a match exists, then if it does and capturing substrings are needed, it uses an NFA engine to capture them.

      But this is a pretty good explanation of an alternate way that a regular expression engine could work and blend the advantages of a DFA and NFA engine.

Re: Pluggable regex engine in Perl
by Anonymous Monk on Dec 28, 2010 at 04:05 UTC

    Using this together with Devel-Declare, could one create something equivalent to Perl 6 grammars?

    Just asking, as such a thing would be a great help.

Re: Pluggable regex engine in Perl
by Anonymous Monk on Dec 28, 2010 at 15:57 UTC
    Btw, it would be interesting to be able to use this from Perl: https://github.com/dprokoptsev/pire
