Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number

Compare two regex patterns

by atemon (Chaplain)
on Mar 06, 2008 at 04:27 UTC ( #672354=perlquestion: print w/replies, xml ) Need Help??

atemon has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have two regular expressions.

$exp1 = "123*" $exp2 = "1233*"
$exp1 is stored in a MySQL table and user is entering $exp2 from a UI. We know that $exp1 will match all strings matched by $exp2. So I want to compare the two expressions. If I use any operator like eq or cmp I will get this as unequal. If I compare like  $exp1 =~ /^$exp2/ or $exp2 =~ /^$exp1/, it won't match as the meta characters like * has a different meaning on LHS and RHS of =~. Is there a way other than parsing the string to match them?

$exp1 is from DB and $exp2 is entered. Since $exp1 will match all expressions that would be matched by $exp2, I dont want to insert $exp2 into DB.

Cheers !

Gods Own Country...

Replies are listed 'Best First'.
Re: Compare two regex patterns
by moritz (Cardinal) on Mar 06, 2008 at 07:48 UTC
    You could try to compile both regexes with use re 'debug'; and compare those outputs:
    perl -c -Mre=debug -e 'm/123*/' Freeing REx: `","' Compiling REx `123*' size 6 Got 52 bytes for offset annotations. first at 1 rarest char 2 at 1 1: EXACT <12>(3) 3: STAR(6) 4: EXACT <3>(0) 6: END(0) ...

    The indented part is the interesting one. If two regexes show the same output in the indented part, they will match the same pattern (the reversion isn't always true, because (a|b) and (b|a) behave the same in some regexes, and behave differently in others (not quite sure, but you can construct cases with alternation where the order somtimes matter, and somtimes not)).

    I don't know if you can access this information from inside perl, perhaps with one of the B:: modules, if not you can still use backticks or qx to execute a command line like the one I showed.

      My Regexp::Compare does that - not for all regexps (that would solve the halting problem :-) ), but "1233*" is recognized as more specific than "123*". On the other hand, the implementation is quite dependent on Perl internals and currently broken for perl 5.10 - the public demand didn't justify an upgrade yet...
        not for all regexps (that would solve the halting problem :-) ),

        To digress a bit on theory: for regular expressions that isn't hard at all.

        Once you have a DFA for your regex, you can build the minimal DFA (that can be done in O(n) or O(n), not sure...). You can do that for both regexes you compare, and if the DFAs are isomorphic, both regexes are equivalent.

        Note that checking if two DFAs are isomorphic isn't hard either, because you know the start state.

        But of course Perl's regexps are not regular (in the CS sense), so you can forget everything I said if you're only after a practical solution.

Re: Compare two regex patterns
by quester (Vicar) on Mar 06, 2008 at 07:19 UTC
    Before writing a UI that accepts regular expressions, you should review the (?{code}) and (??{code}) patterns in perlre. You will want to remember this curious statement...
    use re 'eval'; # Open Pandora's box
    and make sure it is NOT being used*... otherwise your users may find some remarkably interesting patterns to type in.


    * (at least, not in the same section of code, and not without turning on taint.)

Re: Compare two regex patterns
by zer (Deacon) on Mar 06, 2008 at 06:27 UTC
    from the looks of it, a good solution may be to use Regexp::Optimizer on both values. Then see if they are equal after

      That doesn't work for two reasons.

      1. Regexp::Optimizer is a no-op if no alternation is used.

      2. Even if Regexp::Optimizer normalized regexps, the normalized regexps still wouldn't be equal since the two regexps are not equivalent.

      use Regexp::Optimizer qw( ); my $o = Regexp::Optimizer->new(); print $o->optimize(qr/123*/), "\n"; # (?-xism:123*) print $o->optimize(qr/1233*/), "\n"; # (?-xism:1233*) print $o->optimize(qr/123+/), "\n"; # (?-xism:123+)

      The OP didn't ask for a method to check if two regexp are equivalent, but for a method to check if a regexp will match at least everything a second regexp matches.

      Update: Added code. Changed formatting.

Re: Compare two regex patterns
by Bloodrage (Monk) on Mar 06, 2008 at 08:46 UTC

    Actually your regular expressions are:

    $exp1 = "123*"; $exp2 = "123+";

    * is "any number of" the preceding character, including zero.
    + is "one or more of" the preceding character.

    Hence: /x+/ == /xx*/

    Apart from the pedantry, I'm afraid don't have an answer for your question.

    no flames if this changed for 5.10, because that's the Perl I live in.

Re: Compare two regex patterns
by linxdev (Sexton) on Aug 27, 2014 at 01:31 UTC
    I'm resurrecting this thread to see if there are any new ideas. I'm in the same boat as the OP. In my case users create 100s of regular expressions, but sometimes they create dupes. Here are two such dupes.
    ^S.*$[\r\n] ^SYS.*$[\r\n]
    I'm trying to figure out a way to compare each regex created to see if there is a possible dupe. Whey the \r\n? I need it there for the Perl Expect module. The data we are matching against comes in slowly and without Expect will not wait till the \n for $. Any idea on how to test this would be appreciated. I've tried Regexp::Compare, Regexp::ERE, and the Regexp debugger.
    [root@host]# perl -Mre=debug -e '/12/'; Compiling REx "12" Final program: 1: EXACT <12> (3) 3: END (0) anchored "12" at 0 (checking anchored isall) minlen 2 Freeing REx: "12" [root@host]# perl -Mre=debug -e '/1+/'; Compiling REx "1+" Final program: 1: PLUS (4) 2: EXACT <1> (0) 4: END (0) anchored "1" at 0 (checking anchored) plus minlen 1 Freeing REx: "1+"

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://672354]
Approved by bobf
Front-paged by Tanktalus
Domain Nodelet?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (2)
As of 2022-09-30 13:50 GMT
Find Nodes?
    Voting Booth?
    I prefer my indexes to start at:

    Results (126 votes). Check out past polls.