Re: Perl Complains of Nested Quantifiers
by ww (Archbishop) on May 22, 2012 at 15:52 UTC
Since you're seeking help, you'd better explain what you mean by "change my previously greedy quantification into a possessive modality" -- perhaps with a description of your intent that actually provides comprehensible information. And it wouldn't hurt to show a (small) sample of your data and your desired output (and possibly explain why the last part of the code sample in para 1, ($meanSearchTerm), doesn't match the error message you posted).
BTW, Perl 5.14 on win7 executes your sample (minus the ($meanSearchTerm)) without squawking:
C:>perl -e "my $foo='123.123';$foo =~/(\d++\.\d++)/oi; print $1;"
123.123
C:>
Update/afterthought: I was actually well started writing an explanation of why that wouldn't compile when I decided to check. Once again, testing saves my butt.
$ perl -Mre=debug -e '/\d++/'
Compiling REx "\d++"
synthetic stclass "ANYOF[0-9][{unicode_all}]".
Final program:
1: SUSPEND (7)
3: PLUS (5)
4: DIGIT (0)
5: SUCCEED (0)
6: TAIL (7)
7: END (0)
stclass ANYOF[0-9][{unicode_all}] minlen 1
Freeing REx: "\d++"
$
$
$ perl -Mre=debug -e '/\d+++/'
Compiling REx "\d+++"
Nested quantifiers in regex; marked by <-- HERE in m/\d+++ <-- HERE / at -e line 1.
$
Re: Perl Complains of Nested Quantifiers
by AnomalousMonk (Archbishop) on May 22, 2012 at 19:25 UTC
The 'atomic' extended pattern "(?>pattern)" (available in 5.8) will give you possessiveness around the entire pattern or any sub-pattern of your choosing. See Extended Patterns in perlre. (The following example doesn't really illustrate atomic/possessive matching; for a good discussion and pertinent examples, see the docs.)
>perl -wMstrict -le
"print qq{perl ver. $]};
my $rx = qr{ ((?> \d+)) }xms;
'abc1234def' =~ $rx;
print qq{'$1'};
"
perl ver. 5.008009
'1234'
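For a case where the atomic group does change the outcome, here is a minimal sketch. The string 'aaab' and the variable names are my own illustration, not from the original question. It only uses (?>...), so it should also run on 5.8.

```perl
use strict;
use warnings;

my $s = 'aaab';

# Greedy: a+ first takes 'aaa', then gives one 'a' back so that 'ab' can match.
my $greedy = $s =~ m{ \A a+ ab \z }xms ? 1 : 0;

# Atomic: (?> a+) takes 'aaa' and never gives anything back, so 'ab' cannot match.
my $atomic = $s =~ m{ \A (?> a+ ) ab \z }xms ? 1 : 0;

print "greedy: $greedy\n";   # greedy: 1
print "atomic: $atomic\n";   # atomic: 0
```

The \A anchor keeps the engine from retrying at a later start position, so the only difference between the two patterns is whether a+ may backtrack.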
Well, what this means is that a pattern like /foo(?:baz)++baz/ will never match. If you had a string like "foobazbazbaz" it would fail because (?:baz)++ will "gobble up" all of the "baz" in the string, and then refuse to give anything back. What it means by "does not disable backtracking" is that "foobazbazbaz foobazbazbaz" will *attempt* the /(?:baz)++/ twice, once after each "foo". Neither will match of course, and if the RE was _really_ smart it would know it could never match and wouldn't try at all, but it isn't. :-)
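To see the difference side by side, here is a small sketch comparing the plain greedy quantifier with the possessive one on that string. It assumes Perl 5.10 or later, since ++ will not compile on older versions; the variable names are only for illustration.

```perl
use 5.010;    # possessive quantifiers need Perl 5.10 or later
use strict;
use warnings;

my $s = 'foobazbazbaz';

# Greedy: (?:baz)+ grabs all three 'baz', then backs off one so the final 'baz' matches.
my $greedy = $s =~ /foo(?:baz)+baz/ ? 'match' : 'no match';

# Possessive: (?:baz)++ grabs all three 'baz' and refuses to back off.
my $possessive = $s =~ /foo(?:baz)++baz/ ? 'match' : 'no match';

print "greedy:     $greedy\n";       # greedy:     match
print "possessive: $possessive\n";   # possessive: no match
```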
---
$world=~s/war/peace/g
A concern was expressed about "catastrophic backtracking." I'm curious about ... whether your solution would satisfactorily avert it.
It would not. AFAIU, possessive quantifiers like ++ *+ ?+ {n,m}+ are just special, limited cases of the general (?>...) atomic grouping. See example below.
>perl -wMstrict -le
"my $s = 'foobazbazbaz foobazbazbaz';
print 'match 1'
if $s =~ m{ foo (?> (?:baz)+) baz }xms;
print 'match 2'
if $s =~ m{ foo (?> .* baz) baz }xms;
"
(no output)
Re: Perl Complains of Nested Quantifiers
by ikegami (Patriarch) on May 22, 2012 at 16:39 UTC
You must be using a version of Perl before this feature was added (5.10 IIRC).
If so, you're using an ancient version of Perl. 5.8, 5.10 and now 5.12 have all been end-of-lifed. 5.14 is still supported, and 5.16.0 is the latest stable version.
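If you are not sure what a given perl supports, one way to check is to compile a possessive pattern at run time and see whether it dies. This is only a sketch: the variable names are made up, and the pattern is built from a string so that an old perl reports the failure through eval instead of refusing to compile the script.

```perl
use strict;
use warnings;

# Keep the pattern in a string so it is compiled at run time, inside the eval.
my $pat = '\d++';

# qr// dies with "Nested quantifiers" on perls that lack possessive quantifiers;
# eval then returns undef instead of a compiled regex.
my $re = eval { qr/$pat/ };
my $supported = defined $re ? 1 : 0;

printf "perl %s: possessive quantifiers %s\n",
    $], $supported ? 'available' : 'not available';
```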
Thanks. That clarifies things. Unfortunately, I am stuck using this ancient version for various reasons.
Thanks everyone for all of your assistance in this matter.
$ perl -Mre=debug -e'/x++/'
Compiling REx "x++"
Final program:
1: SUSPEND (8)
3: PLUS (6)
4: EXACT <x> (0)
6: SUCCEED (0)
7: TAIL (8)
8: END (0)
anchored "x" at 0 (checking anchored) minlen 1
Freeing REx: "x++"
$ perl -Mre=debug -e'/(?>x+)/'
Compiling REx "(?>x+)"
Final program:
1: SUSPEND (8)
3: PLUS (6)
4: EXACT <x> (0)
6: SUCCEED (0)
7: TAIL (8)
8: END (0)
anchored "x" at 0 (checking anchored) minlen 1
Freeing REx: "(?>x+)"
The two compile to the same program, so there is nothing stopping you from using the (?>...) form in 5.8.8.
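Applied to the pattern from the original question, the rewrite might look like the sketch below. The sample string and the variable names are assumptions for illustration; only the /(\d++\.\d++)/ pattern comes from the thread.

```perl
use strict;
use warnings;

# Possessive form from the thread (needs 5.10+):  /(\d++\.\d++)/
# Same thing spelled with atomic groups, which 5.8 also accepts:
my $rx = qr{ ( (?> \d+ ) \. (?> \d+ ) ) }xms;

my ($number) = 'price: 123.123 USD' =~ $rx;

print "$number\n";   # 123.123
```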
---
$world=~s/war/peace/g
Re: Perl Complains of Nested Quantifiers
by roboticus (Chancellor) on May 22, 2012 at 15:50 UTC
sunmaz:
Update: Obviously, I should've tested my theory before shooting off my mouth. ww gave it a try (nice catch!) and didn't see the problem, and when I tried, I couldn't reproduce it either.
The problem is that you're asking for "one or more one or more digits". You either mean one or more digits, or one or more digits followed by a '+':
/(\d+\.\d+)/oi; # One or more digits
/(\d+\+\.\d+\+)/oi; # One or more digits with '+'
Since you mention greediness, you might mean to not take too many items from the string. However, since you're taking only digits and have a literal '.' between them, I don't know what you'd really mean. (\d+? doesn't make any sense, unless it were something like \d+?7, for example.)
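A quick sketch of when the non-greedy \d+? matters and when it doesn't. The test strings are invented for illustration.

```perl
use strict;
use warnings;

my $s = '12747';

# Followed by a digit, greedy and non-greedy stop at different places.
my ($greedy) = $s =~ /(\d+7)/;    # runs to the last 7
my ($lazy)   = $s =~ /(\d+?7)/;   # stops at the first 7

# Followed by a literal '.', both forms have to consume the same digits.
my ($dot_greedy) = '123.45' =~ /(\d+\.\d+)/;
my ($dot_lazy)   = '123.45' =~ /(\d+?\.\d+)/;

print "greedy: $greedy\n";                       # greedy: 12747
print "lazy:   $lazy\n";                         # lazy:   127
print "with dot: $dot_greedy / $dot_lazy\n";     # with dot: 123.45 / 123.45
```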
...roboticus
When your only tool is a hammer, all problems look like your thumb.