PerlMonks  

Perl Complains of Nested Quantifiers

by sunmaz (Novice)
on May 22, 2012 at 15:31 UTC
sunmaz has asked for the wisdom of the Perl Monks concerning the following question:

I am attempting to change my previously greedy quantification into a possessive modality.

My regex is: /(\d++\.\d++)($meanSearchTerm)/oi

Perl complains thusly: Nested quantifiers in regex; marked by <-- HERE in m/([\d]++ <-- HERE \.[\d]++)( bits = mean)/ at ScanGenome.pl line 326, <RIBL> line 1.

I would like to resolve this without further capturing (so no additional parens).

Any help would be much appreciated. Thank you.

Re: Perl Complains of Nested Quantifiers
by roboticus (Canon) on May 22, 2012 at 15:50 UTC

    sunmaz:

    Update: Obviously, I should've tested my theory before shooting off my mouth. ww gave it a try (nice catch!) and didn't see the problem. When I tried, I couldn't reproduce it either.

    The problem is that you're asking for "one or more one or more digits". You either mean one or more digits, or one or more digits followed by a '+':

    /(\d+\.\d+)/oi;      # One or more digits
    /(\d+\+\.\d+\+)/oi;  # One or more digits with '+'

    Since you mention greediness, you might mean to not take too many items from the string. However, since you're taking only digits and have a literal '.' between them, I don't know what you'd really mean. (\d+? doesn't make any sense, unless it were something like \d+?7, for example.)

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.
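The remark above about \d+? only mattering when something follows it can be illustrated with a minimal sketch. The sample string is made up for illustration and is not from the thread:

```perl
use strict;
use warnings;

my $s = '1727';   # made-up sample string (assumption, not from the thread)

# Greedy \d+ takes as many digits as it can while still leaving a 7 to match.
my ($greedy) = $s =~ /(\d+)7/;

# Non-greedy \d+? stops at the first position where a 7 can follow.
my ($lazy) = $s =~ /(\d+?)7/;

print "greedy='$greedy' lazy='$lazy'\n";   # greedy='172' lazy='1'
```

With nothing after the quantifier, as in a bare \d+?, the non-greedy form would simply match a single digit, which is why it only makes sense with a following term.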

      I wish to match the same strings as the following regex would (you can ignore the scalar at the end of my pattern; it is just a substitution for a simple search string): /(\d+\.\d+)($meanSearchTerm)/oi. That is, it matches strings of the form: a floating-point number immediately followed by the search string. I simply wish to change the greedy quantifiers to possessive ones to prevent any chance of catastrophic backtracking. Thus I do not intend to quantify the quantifier (as perl naively assumes).

        Is the backtracking "catastrophic," in this case, due to a possible increased matching time by a greedy regex?

Re: Perl Complains of Nested Quantifiers
by ww (Bishop) on May 22, 2012 at 15:52 UTC

    Since you're seeking help, you'd better explain what you mean by "change my previously greedy quantification into a possessive modality" -- perhaps with a description of your intent that actually provides comprehensible information. And it wouldn't hurt to show a (small) sample of your data and your desired output (and possibly explain why the last part of the code sample in para 1, ($meanSearchTerm), doesn't match the error message you posted).

    BTW, Perl 5.14 on win7 executes your sample (minus the ($meanSearchTerm)) without squawking:

    C:>perl -e "my $foo='123.123';$foo =~/(\d++\.\d++)/oi; print $1;"
    123.123
    C:>

    Update/afterthought: I was actually well started writing an explanation of why that wouldn't compile when I decided to check. Once again, testing saves my butt.

      Please refer to my above reply. I hope that is sufficiently detailed.

      It is curious that it works on 5.14. It does not on the version I am running (5.8.8).

        I think "Possessive Quantifiers" in perl5100delta Regular-expressions explains the version dependency. I am on 5.12.2, and here is what re shows:
        $ perl -Mre=debug -e '/\d++/'
        Compiling REx "\d++"
        synthetic stclass "ANYOF[0-9][{unicode_all}]".
        Final program:
           1: SUSPEND (7)
           3:   PLUS (5)
           4:     DIGIT (0)
           5:   SUCCEED (0)
           6: TAIL (7)
           7: END (0)
        stclass ANYOF[0-9][{unicode_all}] minlen 1
        Freeing REx: "\d++"
        $ perl -Mre=debug -e '/\d+++/'
        Compiling REx "\d+++"
        Nested quantifiers in regex; marked by <-- HERE in m/\d+++ <-- HERE / at -e line 1.
        $
Re: Perl Complains of Nested Quantifiers
by ikegami (Pope) on May 22, 2012 at 16:39 UTC

    You must be using a version of Perl before this feature was added (5.10 IIRC).

    If so, you're using an ancient version of Perl. 5.8, 5.10 and now 5.12 have all been end-of-lifed. 5.14 is still supported, and 5.16.0 is the latest stable version.

      Thanks. That clarifies things. Unfortunately, I am stuck using this ancient version due to various reasons.

      Thanks everyone for all of your assistance in this matter.

        FWIW the "possessive modifier" is simply syntactic sugar for (?>...) so (?>x+) should produce exactly the same optree as x++.

        $ perl -Mre=debug -e'/x++/'
        Compiling REx "x++"
        Final program:
           1: SUSPEND (8)
           3:   PLUS (6)
           4:     EXACT <x> (0)
           6:   SUCCEED (0)
           7: TAIL (8)
           8: END (0)
        anchored "x" at 0 (checking anchored) minlen 1
        Freeing REx: "x++"
        $ perl -Mre=debug -e'/(?>x+)/'
        Compiling REx "(?>x+)"
        Final program:
           1: SUSPEND (8)
           3:   PLUS (6)
           4:     EXACT <x> (0)
           6:   SUCCEED (0)
           7: TAIL (8)
           8: END (0)
        anchored "x" at 0 (checking anchored) minlen 1
        Freeing REx: "(?>x+)"

        so there is nothing stopping you using it in 5.8.8

        ---
        $world=~s/war/peace/g
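Applied to the regex from the original question, the atomic-group spelling might look like the sketch below. The value of $meanSearchTerm is taken from the error message in the question, the sample input line is made up, and wrapping the term in \Q...\E assumes it is a plain string rather than a pattern:

```perl
use strict;
use warnings;

# Search term as it appears in the posted error message (assumed to be a literal string).
my $meanSearchTerm = ' bits = mean';

# (?>\d+) is the atomic-group spelling of \d++ and compiles on Perl 5.8.
# (?>...) does not capture, so $1 and $2 are the same groups as in the original regex.
my $rx = qr/((?>\d+)\.(?>\d+))(\Q$meanSearchTerm\E)/i;

my $line = 'score 12.75 bits = mean';   # made-up sample input
my ($number, $term);
if ($line =~ $rx) {
    ($number, $term) = ($1, $2);
    print "number='$number' term='$term'\n";
}
```

Because the pattern is compiled once with qr//, the /o flag from the original is not needed here.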

Re: Perl Complains of Nested Quantifiers
by AnomalousMonk (Abbot) on May 22, 2012 at 19:25 UTC

    The 'atomic' extended pattern  "(?>pattern)" (available in 5.8) will give you possessiveness around the entire pattern or any sub-pattern of your choosing. See Extended Patterns in perlre. (The following example doesn't really illustrate atomic/possessive matching; for a good discussion and pertinent examples, see the docs.)

    >perl -wMstrict -le
    "print qq{perl ver. $]};
     my $rx = qr{ ((?> \d+)) }xms;
     'abc1234def' =~ $rx;
     print qq{'$1'};
    "
    perl ver. 5.008009
    '1234'

      The Extended Patterns documentation mentions that "(?>pattern) does not disable backtracking altogether once it has matched." A concern was expressed about "catastrophic backtracking." I'm curious about the nature of the 'catastrophe,' and whether your solution would satisfactorily avert it.

        Well, what this means is that a pattern like /foo(?:baz)++baz/ will never match. If you had a string like "foobazbazbaz" it would fail because (?:baz)++ will "gobble up" all of the "baz" in the string, and then refuse to give anything back. What it means by "does not disable backtracking" is that "foobazbazbaz foobazbazbaz" will *attempt* the /(?:baz)++/ twice, once after each "foo". Neither will match of course, and if the RE was _really_ smart it would know it could never match and wouldn't try at all, but it isn't. :-)

        ---
        $world=~s/war/peace/g
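The "gobble up and refuse to give back" behaviour described above can be seen by putting the greedy and atomic forms side by side. This is a minimal sketch that uses the (?>...) spelling so it also runs on Perl 5.8:

```perl
use strict;
use warnings;

my $s = 'foobazbazbaz';

# Greedy: (?:baz)+ first takes all three "baz", then gives one back
# so that the trailing literal baz can match.
my $greedy = $s =~ /foo(?:baz)+baz/ ? 1 : 0;

# Atomic: the group keeps all three "baz", so the trailing baz has nothing left to match.
my $atomic = $s =~ /foo(?>(?:baz)+)baz/ ? 1 : 0;

print "greedy=$greedy atomic=$atomic\n";   # greedy=1 atomic=0
```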

        A concern was expressed about "catastrophic backtracking." I'm curious about ... whether your solution would satisfactorily avert it.

        It would not. AFAIU, possessive quantifiers like  ++ *+ ?+ {n,m}+ are just special, limited cases of the general  (?>...) atomic grouping. See example below.

        >perl -wMstrict -le
        "my $s = 'foobazbazbaz foobazbazbaz';
         print 'match 1' if $s =~ m{ foo (?> (?:baz)+) baz }xms;
         print 'match 2' if $s =~ m{ foo (?> .* baz) baz }xms;
        "
        (no output)
