Re^2: Regexp::Common not so common?

by doom (Deacon)
on Aug 14, 2008 at 18:28 UTC


in reply to Re: Regexp::Common not so common?
in thread Regexp::Common not so common?

"This way, since I'm not doing that, I don't allow anything else (that may be improperly untainted) to use the dangerous eval option while still getting the power of Abigail's Regexp::Common module (with evals)."

I think you're being much too polite, and probably unfairly blaming this insanity on Abigail, rather than the original author, Damian Conway.

In principle, the Regexp::Common module could be the simplest thing out on CPAN: a library of regexps that you request by name. Instead it has this crazy interface that looks like hashes of hashes but isn't (the order of the keys doesn't matter), and there's something strange about what it returns that I couldn't be bothered to figure out myself. When last I looked if you tried to peek at it with the "x" command in the debugger, the debugger would crash.

One of my rules of thumb is that a module that's too complicated to work with the debugger is too complicated to use in production. So to answer the question posed in the title: no, I don't think Regexp::Common is all that common. Programmers have quietly voted with their feet and walked away from using it.

Replies are listed 'Best First'.
Re^3: Regexp::Common not so common?
by chromatic (Archbishop) on Aug 14, 2008 at 19:37 UTC
    Instead it has this crazy interface that looks like hashes of hashes but isn't (the order of the keys doesn't matter)...

    The order of the keys doesn't matter in most hashes. Regexp::Common uses a tied hash to avoid compiling all of the possible regexps at compile time.
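    For anyone who hasn't met tied hashes, here is a minimal sketch of the lazy-compilation idea. It is not Regexp::Common's actual implementation: the LazyPatterns package, its two patterns, and the $compiled counter are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical package (NOT Regexp::Common's real code): a tied hash that
# compiles each pattern only the first time it is fetched.
package LazyPatterns;

# Pattern sources kept as plain strings until someone asks for them.
my %SOURCE = (
    integer => '[-+]?\d+',
    word    => '\w+',
);

our $compiled = 0;    # how many patterns have actually been compiled

sub TIEHASH {
    my ($class) = @_;
    return bless { cache => {} }, $class;
}

sub FETCH {
    my ($self, $name) = @_;
    return undef unless exists $SOURCE{$name};

    # Compile on first access, then reuse the cached qr// object.
    return $self->{cache}{$name} //= do {
        $compiled++;
        qr/$SOURCE{$name}/;
    };
}

package main;

tie my %PAT, 'LazyPatterns';

print "compiled so far: $LazyPatterns::compiled\n";    # 0: nothing fetched yet

my $is_int = '-42' =~ /\A$PAT{integer}\z/ ? 1 : 0;

print "matched integer: $is_int\n";
print "compiled so far: $LazyPatterns::compiled\n";    # 1: only 'integer' was built
```

    The point is only that nothing gets compiled until a key is actually looked up, which matters when the library holds a lot of patterns.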

    ... and there's something strange about what it returns that I couldn't be bothered to figure out myself.

    A compiled regular expression? They've been around for most of a decade, if not longer.

    When last I looked if you tried to peek at it with the "x" command in the debugger, the debugger would crash.

    Having read some of the debugger's code, I'm not surprised. Did you file a bug?

      Instead it has this crazy interface that looks like hashes of hashes but isn't (the order of the keys doesn't matter)...
      The order of the keys doesn't matter in most hashes. Regexp::Common uses a tied hash to avoid compiling all of the possible regexps at compile time.

      The order of keys matters in hashes of hashes. (e.g. $$hash{first_key}{second_key} ne $$hash{second_key}{first_key}). Bizarre as it may be, though, the odd way of accessing things in Regexp::Common is pretty clever.

Re^3: Regexp::Common not so common?
by Tanktalus (Canon) on Aug 14, 2008 at 21:00 UTC
    I think you're being much too polite, and probably unfairly blaming this insanity on Abigail, rather than the original author, Damian Conway.

    I think you're misreading me. I prefer to have all my ugly hacks hidden behind nice, neat APIs. Regexp::Common provides a nice, neat API (how "nice" or "neat" it is could be debated, but it's still an API). In this case, where we're using re 'eval', it also nicely partitions my tainted code away from evals. That is, I can use those "common" regular expressions (with all of their re-eval trickery) without exposing any of the rest of my code to possible injection attacks. This doesn't absolve me from properly untainting my input, of course; it merely lowers the risk without reducing the power.

Re^3: Regexp::Common not so common?
by JavaFan (Canon) on Aug 16, 2008 at 10:52 UTC
    In principle, the Regexp::Common module could be the simplest thing out on CPAN: a library of regexps that you request by name.

    Yeah, but then it loses a lot of its power, doesn't it? Currently, you can use the "balanced" regex with almost any delimiters you want. If all you could do was request patterns by name, you'd need a thousand names to get balanced patterns with a thousand different delimiters, and if you needed a thousand-and-first delimiter, you'd be out of luck. What you want isn't much different from wanting subroutines that do not take arguments.
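    To make the subroutine analogy concrete, here is a rough sketch of a parameterized pattern. The delimited() helper is hypothetical and far simpler than Regexp::Common's "balanced" pattern: it assumes single-character delimiters and does not handle nesting.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Regexp::Common): returns a pattern for
# text enclosed in one pair of single-character delimiters. No nesting.
sub delimited {
    my ($open, $close) = map { quotemeta($_) } @_;

    # Anything except the closing delimiter, zero or more times.
    my $inside = "[^$close]*";

    return qr/$open$inside$close/;
}

my $text = 'f(a, b) + g[1]';

# One subroutine covers every delimiter pair; no per-delimiter names needed.
my $paren_re   = delimited('(', ')');
my $bracket_re = delimited('[', ']');

my ($in_parens)   = $text =~ /($paren_re)/;
my ($in_brackets) = $text =~ /($bracket_re)/;

print "$in_parens\n";      # (a, b)
print "$in_brackets\n";    # [1]
```

    A fixed list of names would need one entry per delimiter pair; taking the delimiters as arguments covers them all.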

    Instead it has this crazy interface that looks like hashes of hashes but isn't (the order of the keys doesn't matter),

    The order partially matters. It matters for the part that defines the name, but it doesn't matter for the configuration. That's not uncommon for other APIs, where the order of the options doesn't matter, but it does matter for mandatory arguments.

    and there's something strange about what it returns that I couldn't be bothered to figure out myself.

    It returns an overloaded object. Which stringifies to a pattern.
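    If that sounds mysterious, this is roughly what such an object looks like. It is a minimal sketch with a made-up PatternObject class, not the class Regexp::Common actually returns.

```perl
use strict;
use warnings;

# Hypothetical class (not Regexp::Common's own): an object whose
# stringification is the source text of a pattern.
package PatternObject;

use overload
    '""'     => sub { $_[0]{source} },    # used whenever the object is treated as a string
    fallback => 1;

sub new {
    my ($class, $source) = @_;
    return bless { source => $source }, $class;
}

package main;

my $int = PatternObject->new('[-+]?\d+');

my $as_string = "$int";                          # stringifies to the pattern source
my $matched   = '+17' =~ /\A$int\z/ ? 1 : 0;     # interpolates like a plain string
my $kind      = ref $int;                        # but it is still a blessed reference

print "string form: $as_string\n";
print "matched:     $matched\n";
print "ref type:    $kind\n";
```

    So it behaves like a pattern wherever a string is expected, while anything that inspects it directly, such as a data dumper, sees a blessed reference rather than a string.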

    As for the use re 'eval', there's no way around it if you want to stick to pre-5.10. To do recursion in 5.8.x (or earlier), you need the (??{ }) construct, which you can use without problems if it appears literally but which triggers an exception if you interpolate it. The reason is that until (?{ }) and (??{ }) were introduced, interpolating variables in a regex was "safe": it couldn't run Perl code. With the new constructs that was no longer true, so to protect older code, you have to say use re 'eval' if your interpolated variables contain such constructs.

    Now, if you don't trust the patterns from Regexp::Common, you shouldn't run them at all, because they will contain (??{ }) and (?{ }) constructs and will execute Perl code when a pattern is evaluated, regardless of whether you set use re 'eval' or not. You need use re 'eval' if you're going to interpolate a pattern into a larger regexp, because Perl will first stringify the pattern (except in some trivial cases), and then, if it contains (??{ }) or (?{ }), you need use re 'eval' or you trip the safety mechanism.
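    A small sketch of that safety mechanism, using a plain string in place of a stringified Regexp::Common pattern. The variable names and the toy pattern are made up for illustration.

```perl
use strict;
use warnings;

our $ran = 0;

# A code block written literally in the pattern needs no pragma.
'abc' =~ /b(?{ $ran = 1 })/;
my $after_literal = $ran;

# The same construct arriving through an interpolated string is refused
# at run time unless "use re 'eval'" is in scope.
my $source = 'b(?{ $main::ran = 2 })';

my $blocked = !eval { 'abc' =~ /$source/; 1 };
my $error   = $@;

# With the pragma in (lexical) scope, the interpolated code block runs.
my $allowed = eval {
    use re 'eval';
    'abc' =~ /$source/;
    1;
};

print "literal block ran:       $after_literal\n";
print "interpolation blocked:   ", ($blocked ? 'yes' : 'no'), "\n";
print "error:                   $error";
print "allowed with re 'eval':  ", ($allowed ? 'yes' : 'no'), "\n";
```

    Since use re 'eval' is lexically scoped, you can confine it to the one block that does the interpolation instead of turning it on for the whole file.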
