
Pluggable regex engine in Perl

by szabgab (Priest)
on Dec 27, 2010 at 08:02 UTC

Since 5.10, people can replace Perl's regex engine by plugging in a different one (see perlreapi for details).
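Roughly, perlreapi lets an extension give perl a table of callbacks (among them one to compile a pattern and one to execute it), which are then used in place of the built-in engine. Modules such as re::engine::PCRE switch engines for a lexical scope via `use`. The Python below is only a language-neutral sketch of that shape, not the C API. Every class and function name in it is hypothetical, and a `with` block stands in for the lexical scope.

```python
import re
from contextlib import contextmanager

class BuiltinEngine:
    """Hypothetical adapter around Python's own backtracking engine."""
    def compile(self, pattern):
        return re.compile(pattern)

    def exec(self, compiled, subject):
        m = compiled.search(subject)
        return None if m is None else (m.start(), m.end())

class LiteralEngine:
    """Hypothetical toy engine: treats the pattern as a plain substring."""
    def compile(self, pattern):
        return pattern

    def exec(self, compiled, subject):
        i = subject.find(compiled)
        return None if i < 0 else (i, i + len(compiled))

_active = [BuiltinEngine()]  # stack of engines; the top one is used

@contextmanager
def use_engine(engine):
    """Make `engine` the active engine for the duration of the with-block."""
    _active.append(engine)
    try:
        yield
    finally:
        _active.pop()

def match(pattern, subject):
    """Compile and execute through whichever engine is currently active."""
    engine = _active[-1]
    return engine.exec(engine.compile(pattern), subject)

print(match("a.c", "xxabcxx"))      # (2, 5): "." is a metacharacter
with use_engine(LiteralEngine()):
    print(match("a.c", "xxabcxx"))  # None: no literal "a.c" in the subject
print(match("a.c", "xxabcxx"))      # (2, 5) again after the block
```

One design difference is worth noting. The `with` block here is dynamically scoped, so it affects any code called inside it. `use re::engine::...` is lexical and only affects regexes written in that scope.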

I found several implementations on CPAN:
re::engine::PCRE seems to be the first one, and the one the others use as their base. Then there is re::engine::RE2, which is "Rather under development"; re::engine::LPEG, which is declared a failure; and re::engine::Oniguruma.

Are there other implementations?

I wonder whether any of these are in use, and what the use cases are. What problems do these plug-ins solve that are impossible, harder, or slower to solve with Perl's standard regex engine, or that it handles badly in some other way?

If any of the authors read this: did you have any motivation other than having fun, and have you drawn any conclusions?

Re: Pluggable regex engine in Perl
by ELISHEVA (Prior) on Dec 27, 2010 at 08:48 UTC

    I'm sure there are people who take this as an invitation to write their own just 'cause, but the practical reason for a custom regex engine would be the need to emulate the behavior of some other regex syntax. Bash, PHP, Java, and the various flavors of the grep command all have their own slight regular expression nuances.

    For example, a person might need to port a large amount of code from PHP to Perl. Rather than study each and every regex in that code, it might make more sense to port the code but leave the original regular expressions in place. Converting the rest of the syntax is usually fairly straightforward; deciphering and converting regular expressions, not always. A pluggable regex API gives you that option.
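Here is one concrete "slight nuance". In grep's basic regular expressions (BREs), `\(`, `\)`, `\{` and `\}` are the grouping and interval operators, and bare `(` and `{` are literals. That is the reverse of Perl. A plug-in engine would accept such patterns directly. This sketch instead shows the translation such an engine would have to embody, targeting Python's syntax, which follows Perl here.

The function name is hypothetical, and it is a simplification. It handles the GNU-style escaped operators only. Bracket expressions are copied verbatim, and POSIX classes like `[[:alpha:]]` are not supported.

```python
import re

# Characters that are operators when *escaped* in a GNU-style BRE,
# but operators when *unescaped* in Perl/Python syntax.
_SWAPPED = set("(){}|+?")

def bre_to_python(bre: str) -> str:
    """Translate a simplified GNU-style BRE into Python (Perl-like) syntax.
    Bracket expressions are copied verbatim; POSIX classes and context-
    dependent rules (e.g. a leading '*' being literal) are NOT handled."""
    out = []
    i = 0
    while i < len(bre):
        c = bre[i]
        if c == "\\" and i + 1 < len(bre):
            nxt = bre[i + 1]
            # \( -> (, \{ -> {, ...; any other escape passes through unchanged
            out.append(nxt if nxt in _SWAPPED else c + nxt)
            i += 2
        elif c == "[":
            j = i + 1
            if j < len(bre) and bre[j] == "^":
                j += 1
            if j < len(bre) and bre[j] == "]":
                j += 1  # a ']' first in a bracket expression is literal
            end = bre.find("]", j)
            if end < 0:
                raise ValueError("unterminated bracket expression")
            out.append(bre[i:end + 1])
            i = end + 1
        elif c in _SWAPPED:
            out.append("\\" + c)  # literal in a BRE, so escape it for Python
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)

print(bre_to_python(r"\(ab\)\{2\}"))  # (ab){2}
print(bre_to_python("a+b"))           # a\+b
```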

Re: Pluggable regex engine in Perl
by moritz (Cardinal) on Dec 27, 2010 at 09:27 UTC

    I've heard that the regex API changed quite a bit between 5.10 and 5.12 due to the promotion of regexes to first-class objects. If that's true, when writing a new plugin one should either target 5.12 or newer, or be aware of the differences and use some #ifdefs.

    I wonder whether any of these are in use, and what the use cases are. What problems do these plug-ins solve that are impossible, harder, or slower to solve with Perl's standard regex engine, or that it handles badly in some other way?

    The now deceased re::engine::TRE had two features that made it attractive for some uses. First, it would match the longest of several alternatives (instead of the first, as Perl does). Second, it used a non-backtracking state machine internally whenever possible, which means that pathological exponential-time behavior doesn't occur as easily as with the Perl engine.
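The first-versus-longest difference is easy to see with a two-way alternation. Python's `re`, like Perl's engine, takes the first alternative that lets the match succeed. The helper below is a hypothetical, simplified emulation of POSIX-style leftmost-longest for a top-level alternation only. Each alternative is still matched by the backtracking engine, so this is not how TRE works internally.

```python
import re

def leftmost_longest(alternatives, subject):
    """At the leftmost position where any alternative matches, return the
    longest of those matches (top-level alternation only)."""
    compiled = [re.compile(a) for a in alternatives]
    for start in range(len(subject) + 1):
        matches = (p.match(subject, start) for p in compiled)
        ends = [m.end() for m in matches if m]
        if ends:
            return subject[start:max(ends)]
    return None

subject = "abcd"
print(re.search(r"ab|abcd", subject).group())       # ab   (first alternative wins)
print(leftmost_longest([r"ab", r"abcd"], subject))  # abcd (longest alternative wins)
```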

Re: Pluggable regex engine in Perl
by dgl (Novice) on Dec 27, 2010 at 12:16 UTC

    I'm the author of re::engine::RE2.

    As for motivation, it was mostly to learn a bit about this area of Perl, but I do see uses for RE2, because its matching is much faster than Perl's.

    For example, combined with an mmapped scalar, I can match a regexp against 1 GiB of text in about 10 seconds (on a Core 2 Duo); Perl's RE doesn't even come close to that. You can see how Google's Code Search can be so fast.
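The mmap half of that technique can be sketched in Python, whose `re` module accepts an `mmap` object wherever a bytes-like subject is expected. This lets a file be scanned without first reading it into a string. This uses Python's own backtracking engine, not RE2, so it says nothing about RE2's speed. The function name and the demo file contents are made up.

```python
import mmap
import os
import re
import tempfile

def count_matches_in_file(path, pattern: bytes) -> int:
    """Count regex matches in a file via a read-only memory map."""
    regex = re.compile(pattern)
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return sum(1 for _ in regex.finditer(mm))

# A tiny demo file standing in for a large log (mmap cannot map an empty file).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ok\nERROR disk full\nok\nERROR timeout\n")
    path = tmp.name
try:
    print(count_matches_in_file(path, rb"(?m)^ERROR\b"))  # 2
finally:
    os.remove(path)
```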

    There are some issues with Perl's UTF-8 handling (frankly, it's insane), but once I've worked around them, re::engine::RE2 should be nearly a drop-in replacement for Perl's RE, only faster.
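dgl doesn't spell out which issues he means, so treat this as an assumption. One general mismatch when handing text to a byte-oriented engine is that RE2 matches UTF-8 bytes and reports byte offsets. Perl's `pos`, `@-` and `@+` are character offsets on character strings, so a wrapper has to convert between them.

A minimal Python sketch of that conversion follows; the helper name is hypothetical. It decodes the byte prefixes, which is O(n) per call and only valid when the offsets fall on character boundaries.

```python
import re

def byte_span_to_char_span(data: bytes, span):
    """Convert a (start, end) span of byte offsets into UTF-8 `data`
    into character offsets by decoding the prefixes."""
    start, end = span
    return (len(data[:start].decode("utf-8")), len(data[:end].decode("utf-8")))

text = "naïve café"
data = text.encode("utf-8")                   # what a byte-oriented engine sees
m = re.search("café".encode("utf-8"), data)
print(m.span())                               # (7, 12): byte offsets
print(byte_span_to_char_span(data, m.span())) # (6, 10): character offsets
```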

      There are some issues with Perl's UTF-8 handling (frankly, it's insane),

      It would be informative to read you expand on that position.

Re: Pluggable regex engine in Perl
by Khen1950fx (Canon) on Dec 27, 2010 at 09:06 UTC
    Why not write your own engine in Perl? You could get started with re::engine::Plugin, which lets you write the engine itself in Perl. It's still in its early stages, but workable.
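To show how little a toy engine written in the host language needs, here is a Python port of the well-known matcher by Rob Pike from Kernighan and Pike's "The Practice of Programming". It supports literal characters, `.`, `^`, `$` and `*`. It does not show re::engine::Plugin's callback API. Hooking something like this into Perl would still need that glue, which is left out here.

```python
def match(regexp: str, text: str) -> bool:
    """Search for regexp anywhere in text."""
    if regexp.startswith("^"):
        return match_here(regexp[1:], text)
    for i in range(len(text) + 1):  # include the empty tail, so "x*" matches ""
        if match_here(regexp, text[i:]):
            return True
    return False

def match_here(regexp: str, text: str) -> bool:
    """Match regexp at the beginning of text."""
    if not regexp:
        return True
    if len(regexp) >= 2 and regexp[1] == "*":
        return match_star(regexp[0], regexp[2:], text)
    if regexp == "$":  # '$' is an anchor only at the very end of the pattern
        return not text
    if text and (regexp[0] == "." or regexp[0] == text[0]):
        return match_here(regexp[1:], text[1:])
    return False

def match_star(c: str, regexp: str, text: str) -> bool:
    """Match c* followed by regexp, trying the shortest run of c first."""
    i = 0
    while True:
        if match_here(regexp, text[i:]):
            return True
        if i < len(text) and (text[i] == c or c == "."):
            i += 1
        else:
            return False

print(match("ab*c", "ac"))     # True
print(match("^abc$", "abcd"))  # False
```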
Re: Pluggable regex engine in Perl
by JavaFan (Canon) on Dec 27, 2010 at 15:56 UTC
    I haven't written a plugin, but I don't find it hard to come up with reasons to write plugins:
    • You want a different syntax (perhaps the same syntax as language X).
    • You want different rules (for instance, POSIX's "longest match" preference over Perl's "first match")
    • You want an engine that's optimized for certain cases. Perhaps you want a pure DFA, sacrificing functionality for speed.
    • You may want to do matching in a particular encoding.
    I've no idea whether any of the plugins have been written with those reasons in mind, but I wouldn't be surprised if someone does so at some point.
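To make the "pure DFA" trade-off concrete: a DFA reads each input character exactly once, with no backtracking, but offers no captures, backreferences or lookaround. The sketch below uses a hand-derived transition table for `(a|b)*abb`, the textbook example from the Dragon Book, and cross-checks it against Python's backtracking `re`. A real engine would build such a table automatically (e.g. by subset construction). The names here are hypothetical.

```python
import itertools
import re

# Minimal DFA for (a|b)*abb. State n = length of the longest suffix read so
# far that is a prefix of "abb"; state 3 means the input ends in "abb".
DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
ACCEPT = {3}

def dfa_fullmatch(text: str) -> bool:
    """One pass over text, constant memory, no backtracking.
    Any character outside {a, b} rejects immediately."""
    state = 0
    for ch in text:
        nxt = DFA[state].get(ch)
        if nxt is None:
            return False
        state = nxt
    return state in ACCEPT

# Cross-check against a backtracking engine on every {a,b} string up to length 8.
for n in range(9):
    for chars in itertools.product("ab", repeat=n):
        s = "".join(chars)
        assert dfa_fullmatch(s) == bool(re.fullmatch(r"(a|b)*abb", s))
print("DFA agrees with re.fullmatch")
```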
Re: Pluggable regex engine in Perl
by AnomalousMonk (Chancellor) on Dec 27, 2010 at 16:57 UTC

    I have read that the TCL regex engine is a mixture of DFA and NFA: DFA is used when possible, NFA otherwise. Any comments on the truth of this notion, how appropriate this approach is, its performance vis-a-vis Perl's standard regex engine, and whether it might be available as a plug-in?

      From what I've read (which is dated by many years) TCL uses a DFA engine to test whether a match exists, then if it does and capturing substrings are needed, it uses an NFA engine to capture them.

      But this is a pretty good explanation of an alternate way that a regular expression engine could work and blend the advantages of a DFA and NFA engine.
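The two-phase scheme described above can be sketched as follows. This follows the description in the reply and has not been checked against Tcl's source. A DFA answers "is there a match?" in one linear pass, and a capture-capable engine runs only when the answer is yes.

In this sketch, Python's backtracking `re` stands in for the NFA phase, and the DFA is a hand-built table for `(a|b)*abb`. Both are illustrative assumptions. It also uses whole-string matching, so that the DFA does not need to locate where the match lies.

```python
import re

# Phase 1: hand-built minimal DFA for (a|b)*abb (state 3 = input ends in "abb").
DFA = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 2}, 2: {"a": 1, "b": 3}, 3: {"a": 1, "b": 0}}
ACCEPT = {3}

# Phase 2: a backtracking regex with capture groups for the same language.
CAPTURING = re.compile(r"((?:a|b)*)(abb)")

def dfa_accepts(text):
    """Cheap yes/no test: one pass, no backtracking, no captures."""
    state = 0
    for ch in text:
        state = DFA[state].get(ch)
        if state is None:
            return False
    return state in ACCEPT

def match_with_captures(text):
    """Run the capturing engine only if the DFA says a match exists."""
    if not dfa_accepts(text):
        return None
    return CAPTURING.fullmatch(text).groups()

print(match_with_captures("babb"))  # ('b', 'abb')
print(match_with_captures("abab"))  # None
```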

Re: Pluggable regex engine in Perl
by Anonymous Monk on Dec 28, 2010 at 04:05 UTC

    Using this and Devel::Declare, could one create something equivalent to Perl 6 grammars?

    Just asking, as such a thing would be of great help.

Re: Pluggable regex engine in Perl
by Anonymous Monk on Dec 28, 2010 at 15:57 UTC
    Btw, it would be interesting to be able to use this from Perl:

Node Type: perlmeditation [id://879264]
Approved by Old_Gray_Bear