my $str = "bea" x 100;
my $re = qr/(?:be|ea|a)/;
sub atomic {
use re 'debug';
say 'Matched $re+\d' if $str =~ m/$re+\d/;
}
sub possessive {
use re 'debug';
say 'Matched $re++\d' if $str =~ m/$re++\d/;
}
atomic();
possessive();
The regexes compile to
# atomic
Compiling REx "(?-xism:(?:be|ea|a))+\d"
Final program:
1: CURLYX[0] {1,32767} (14)
3: TRIE-EXACT[abe] (13)
<be>
<ea>
<a>
13: WHILEM[1/1] (0)
14: NOTHING (15)
15: DIGIT (16)
16: END (0)
minlen 2
#possessive
Compiling REx "(?-xism:(?:be|ea|a))++\d"
Final program:
1: SUSPEND (19)
3: CURLYX[0] {1,32767} (16)
5: TRIE-EXACT[abe] (15)
<be>
<ea>
<a>
15: WHILEM[1/1] (0)
16: NOTHING (17)
17: SUCCEED (0)
18: TAIL (19)
19: DIGIT (20)
20: END (0)
minlen 2
# regex.ato and regex.pos contain the output including
# the Compiling REx message
michael@lnx-main:working> wc regex.ato regex.pos
10650 70471 1837158 regex.ato
229368 1989603 47988305 regex.pos
As you can see it is doing a lot more. I don't know why but it will be interesting to find out;)
Update 1: It may be something to do with caching. Looking in the re debug output I find
michael@lnx-main:working> grep Detected regex.ato
whilem: Detected a super-linear match, switching on caching...
michael@lnx-main:working> grep Detected regex.pos
whilem: Detected a super-linear match, switching on caching...
# Now look for this line
# whilem: (cache) already tried at this position...
michael@lnx-main:working> grep '(cache)' regex.ato | wc
298 2086 26426
michael@lnx-main:working> grep '(cache)' regex.pos | wc
0 0 0
At this point I am prepared to confess that I will struggling a bit to dig in deeper;-)
Update 2: The SUSPEND/TAIL pair look like where any interested monk should start. In perldebguts I found this
# Do nothing
NOTHING no Match empty string.
# A variant of above which delimits a group, thus stops optimizations
TAIL no Match empty string. Can jump here from outside.
|