http://www.perlmonks.org?node_id=1014816

didess has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks !

Do you know some "measure" of how strict a Perl regular expression is ?.

By this I mean the following :

For example, the string ' Daddy' matches /Dad+y/. It matches /D.+/ too.

I hope everyone agrees if I pretend "the first regex is stricter than the second one".

In the need I try to satisfy, I could introduce some limitation in the string's length.

I'm looking for some algorithm or hidden attribute or any good idea translating this "obviousness" into a number, or anything allowing classifying regexes with respect to their "stricness".

Thanks for the help !

Replies are listed 'Best First'.
Re: how to measure strictness of a regex ?
by tobyink (Canon) on Jan 23, 2013 at 00:30 UTC

    re::regmust may be of some help...

    use 5.014; use strict; use warnings; use re (); use List::Util 'sum'; sub re_strictness ($) { sum map length, re::regmust shift } say re_strictness qr/Dad+y/; # says 5 say re_strictness qr/D.+/; # says 1
    package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name

      I, also, have some (admittedly, rather idle) curiosity: what is the purpose of the prototype in the  re_strictness() function definition?

        I think my original code when I was testing was something along the lines of:

        print re_strictness qr/.../, "\n";

        In which case the ($) prototype helps Perl see the "\n" as a parameter for print, not for re_strictness.

        By the way, it's worth pointing out that qr/^Dad+y$/ is also considered slightly stricter than qr/Dad+y/.

        use 5.014; use strict; use warnings; use re (); use List::Util 'sum'; sub re_strictness ($) { sum map length, re::regmust shift } print re_strictness qr/^Dad+y$/, "\n", re_strictness qr/Dad+y/, "\n", re_strictness qr/D.+/, "\n", ;
        package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
Re: how to measure strictness of a regex ?
by davido (Cardinal) on Jan 23, 2013 at 00:29 UTC

    I don't have a solution. But I do have a curiosity: What problem are you solving with this?


    Dave

Re: how to measure strictness of a regex ?
by LanX (Saint) on Jan 23, 2013 at 07:39 UTC
    > I hope everyone agrees if I pretend "the first regex is stricter than the second one".

    No, your definition of strict is very fuzzy.

    > For example, the string ' Daddy' matches

    Are the strings given or could they come from a defined set? Like [a-zA-Z]{1,20} ? Is the set infinite?

    > anything allowing classifying regexes

    Please be aware that regexes are almost a complete sub-language with recursions and more. You don't expect us to solve the halting problem for the general case , don't you?

    So which restricted regex syntax is allowed?

    I have an idea for a clear definition of "strictness" and an approach to solve this by estimating the cardinality of possible solutions...

    But I'm reluctant to spend efforts w/o a clear problem description. Sorry!

    Like davido already asked, could you please clarify what for you're wanting this?

    Seems pretty much to be a XY Problem for me...

    Cheers Rolf