http://www.perlmonks.org?node_id=971814

sunmaz has asked for the wisdom of the Perl Monks concerning the following question:

I am attempting to change my previously greedy quantification into a possessive modality.

My regex is: /(\d++\.\d++)($meanSearchTerm)/oi

Perl complains thusly: Nested quantifiers in regex; marked by <-- HERE in m/([\d]++ <-- HERE \.[\d]++)( bits = mean)/ at ScanGenome.pl line 326, <RIBL> line 1.

I would like to resolve this without further capturing (so no additional parens).

Any help would be much appreciated. Thank you.

Re: Perl Complains of Nested Quantifiers
by ww (Archbishop) on May 22, 2012 at 15:52 UTC

    Since you're seeking help, you'd better explain what you mean by "change my previously greedy quantification into a possessive modality" -- perhaps with a description of your intent that actually provides comprehensible information. And it wouldn't hurt to show a (small) sample of your data and your desired output (and possibly explain why the last part of the code sample in para 1, ($meanSearchTerm), doesn't match the error message you posted).

    BTW, Perl 5.14 on win7 executes your sample (minus the ($meanSearchTerm)) without squawking:

    C:>perl -e "my $foo='123.123';$foo =~/(\d++\.\d++)/oi; print $1;"
    123.123
    C:>

    Update/afterthought: I was actually well started writing an explanation of why that wouldn't compile when I decided to check. Once again, testing saves my butt.

      Please refer to my above reply. I hope that is sufficiently detailed.

      It is curious that it works on 5.14. It does not on the version I am running (5.8.8).

        I think "Possessive Quantifiers" in perl5100delta Regular-expressions explains the version dependency. I am on 5.12.2, and here is what re shows:
        $ perl -Mre=debug -e '/\d++/'
        Compiling REx "\d++"
        synthetic stclass "ANYOF[0-9][{unicode_all}]".
        Final program:
        1: SUSPEND (7)
        3: PLUS (5)
        4: DIGIT (0)
        5: SUCCEED (0)
        6: TAIL (7)
        7: END (0)
        stclass ANYOF[0-9][{unicode_all}] minlen 1
        Freeing REx: "\d++"
        $ perl -Mre=debug -e '/\d+++/'
        Compiling REx "\d+++"
        Nested quantifiers in regex; marked by <-- HERE in m/\d+++ <-- HERE / at -e line 1.
        $
Re: Perl Complains of Nested Quantifiers
by AnomalousMonk (Archbishop) on May 22, 2012 at 19:25 UTC

    The 'atomic' extended pattern  "(?>pattern)" (available in 5.8) will give you possessiveness around the entire pattern or any sub-pattern of your choosing. See Extended Patterns in perlre. (The following example doesn't really illustrate atomic/possessive matching; for a good discussion and pertinent examples, see the docs.)

    >perl -wMstrict -le "print qq{perl ver. $]}; my $rx = qr{ ((?> \d+)) }xms; 'abc1234def' =~ $rx; print qq{'$1'}; "
    perl ver. 5.008009
    '1234'

      The Extended Patterns documentation mentions that "(?>pattern) does not disable backtracking altogether once it has matched." A concern was expressed about "catastrophic backtracking." I'm curious about the nature of the 'catastrophe,' and whether your solution would satisfactorily avert it.

        Well, what this means is that a pattern like /foo(?:baz)++baz/ will never match. If you had a string like "foobazbazbaz" it would fail because (?:baz)++ will "gobble up" all of the "baz" in the string, and then refuse to give anything back. What it means by "does not disable backtracking" is that "foobazbazbaz foobazbazbaz" will *attempt* the /(?:baz)++/ twice, once after each "foo". Neither will match of course, and if the RE was _really_ smart it would know it could never match and wouldn't try at all, but it isn't. :-)
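To make the "refuses to give anything back" point concrete, here is a small sketch contrasting a plain greedy group with its atomic equivalent on the same string. It uses (?>...) rather than ++ so that it should also compile on 5.8; the variable names are only for illustration.

```perl
use strict;
use warnings;

my $s = 'foobazbazbaz';

# Greedy: (?:baz)+ first takes all three "baz", then backtracks and
# gives one back so that the trailing literal "baz" can match.
my $greedy = $s =~ /foo(?:baz)+baz/ ? 1 : 0;

# Atomic: (?>(?:baz)+) also takes all three "baz", but never gives
# any back, so the trailing literal "baz" has nothing left to match.
my $atomic = $s =~ /foo(?>(?:baz)+)baz/ ? 1 : 0;

print "greedy: $greedy\n";   # greedy: 1
print "atomic: $atomic\n";   # atomic: 0
```

On 5.10 and later, /foo(?:baz)++baz/ should behave the same way as the atomic version.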

        ---
        $world=~s/war/peace/g

        A concern was expressed about "catastrophic backtracking." I'm curious about ... whether your solution would satisfactorily avert it.

        It would not. AFAIU, possessive quantifiers like  ++ *+ ?+ {n,m}+ are just special, limited cases of the general  (?>...) atomic grouping. See example below.

        >perl -wMstrict -le "my $s = 'foobazbazbaz foobazbazbaz'; print 'match 1' if $s =~ m{ foo (?> (?:baz)+) baz }xms; print 'match 2' if $s =~ m{ foo (?> .* baz) baz }xms; "
        (no output)
Re: Perl Complains of Nested Quantifiers
by ikegami (Patriarch) on May 22, 2012 at 16:39 UTC

    You must be using a version of Perl before this feature was added (5.10 IIRC).

    If so, you're using an ancient version of Perl. 5.8, 5.10 and now 5.12 have all been end-of-lifed. 5.14 is still supported, and 5.16.0 is the latest stable version.

      Thanks. That clarifies things. Unfortunately, I am stuck using this ancient version due to various reasons.

      Thanks everyone for all of your assistance in this matter.

        FWIW the "possessive modifier" is simply syntactic sugar for (?>...) so (?>x+) should produce exactly the same optree as x++.

        $ perl -Mre=debug -e'/x++/'
        Compiling REx "x++"
        Final program:
        1: SUSPEND (8)
        3: PLUS (6)
        4: EXACT <x> (0)
        6: SUCCEED (0)
        7: TAIL (8)
        8: END (0)
        anchored "x" at 0 (checking anchored) minlen 1
        Freeing REx: "x++"
        $ perl -Mre=debug -e'/(?>x+)/'
        Compiling REx "(?>x+)"
        Final program:
        1: SUSPEND (8)
        3: PLUS (6)
        4: EXACT <x> (0)
        6: SUCCEED (0)
        7: TAIL (8)
        8: END (0)
        anchored "x" at 0 (checking anchored) minlen 1
        Freeing REx: "(?>x+)"

        So there is nothing stopping you from using (?>...) in 5.8.8.
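Applied to the pattern from the original question, that might look like the sketch below. The value of $meanSearchTerm is an assumption taken from the posted error message, and the sample line is made up for illustration. (?>...) does not capture, so the number is still $1 and the search term is still $2.

```perl
use strict;
use warnings;

# Assumption: the error message in the question suggests the search
# term is ' bits = mean'; substitute the real value.
my $meanSearchTerm = ' bits = mean';

# Each \d++ becomes (?>\d+). Atomic groups add no capture groups.
# qr// compiles the pattern once, so /o is not needed here.
my $rx = qr/((?>\d+)\.(?>\d+))($meanSearchTerm)/i;

# Hypothetical input line, not taken from the asker's data.
my $line = 'score 12.5 bits = mean';

if ($line =~ $rx) {
    print "number='$1' term='$2'\n";   # number='12.5' term=' bits = mean'
}
```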

        ---
        $world=~s/war/peace/g

Re: Perl Complains of Nested Quantifiers
by roboticus (Chancellor) on May 22, 2012 at 15:50 UTC

    sunmaz:

    Update: Obviously, I should've tested my theory before shooting off my mouth. ww gave it a try (nice catch!) and didn't see the problem, and when I tried, I couldn't reproduce it either.

    The problem is that you're asking for "one or more one or more digits". You either mean one or more digits, or one or more digits followed by a '+':

    /(\d+\.\d+)/oi;       # One or more digits
    /(\d+\+\.\d+\+)/oi;   # One or more digits with '+'

    Since you mention greediness, you might mean to not take too many items from the string. However, since you're taking only digits and have a literal '.' between them, I don't know what you'd really mean. (\d+? doesn't make any sense, unless it were something like \d+?7, for example.)

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      I wish to match the same as the following regex would (you can ignore the scalar at the end of my pattern; it is just a substitution for a simple search string): /(\d+\.\d+)($meanSearchTerm)/oi. That is, it matches strings of the form: a floating-point number followed immediately by the search string. I simply wish to change the greedy quantification to possessive to prevent any chance of catastrophic backtracking. Thus I do not intend to quantify the quantifier (as perl naively assumes).

        Is the backtracking "catastrophic," in this case, due to a possible increased matching time by a greedy regex?
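For a rough feel of what "catastrophic" can mean, the sketch below counts how often the engine gets past \d+ when a match is doomed to fail. The pattern is deliberately pathological (a quantified group around a quantified \d+); it is not the pattern from the question, and count_steps is a made-up helper. The counter relies on the (?{ ... }) code block, which perlre documents as experimental in older releases, so treat the numbers as indicative only.

```perl
use strict;
use warnings;

# Package variable so the (?{ ... }) blocks can see it on older perls.
our $steps;

# Hypothetical helper: returns how many times \d+ succeeded while
# trying (and failing) to match $input.
sub count_steps {
    my ($use_atomic, $input) = @_;
    $steps = 0;
    if ($use_atomic) {
        # The atomic group commits to its first way of matching.
        $input =~ /^(?>(?:\d+(?{ $steps++ }))+)$/;
    }
    else {
        # The plain greedy version may retry many ways of splitting the digits.
        $input =~ /^(?:\d+(?{ $steps++ }))+$/;
    }
    return $steps;
}

# Twelve digits followed by a non-digit, so neither pattern can match.
my $input = ('1' x 12) . 'x';

print "greedy steps: ", count_steps(0, $input), "\n";
print "atomic steps: ", count_steps(1, $input), "\n";
```

The greedy count should grow quickly as digits are added, while the atomic count should stay small. A pattern like \d+\.\d+ has a literal '.' between the two runs of digits, so it leaves the engine far fewer alternatives to retry.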