Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Re: Performance of possessive quantifiers

by hipowls (Curate)
on Jan 27, 2008 at 19:26 UTC ( [id://664550]=note: print w/replies, xml ) Need Help??


in reply to Performance of possessive quantifiers

The possessive form, PAT++ is short hand for (?>PAT+), from perlre

Possessive quantifiers are equivalent to putting the item they are applied to inside of one of these constructs. The following equivalences apply: Quantifier Form Bracketing Form --------------- --------------- PAT*+ (?>PAT*) PAT++ (?>PAT+) PAT?+ (?>PAT?) PAT{min,max}+ (?>PAT{min,max})
In your case m/$re++\d/ == m/(?>$re+)\d/, just prettier;)

Replies are listed 'Best First'.
Re^2: Performance of possessive quantifiers
by moritz (Cardinal) on Jan 27, 2008 at 22:02 UTC
    Yes, that's right, and it explains why the posessive and atomic1 have the same performance impact.

    (The parse tree that perl5.10.0 -cMre=debug -we 'm/(?:be|ea|a)++\d/' and perl5.10.0 -cMre=debug -we 'm/(?>(?:be|ea|a)+)\d/' are indeed identical).

    But it doesn't explain the performance compared to the non-possessive non-atomic form.

      Good question, I suspect that it is an optimization. Trying

      my $str = "bea" x 100; my $re = qr/(?:be|ea|a)/; sub atomic { use re 'debug'; say 'Matched $re+\d' if $str =~ m/$re+\d/; } sub possessive { use re 'debug'; say 'Matched $re++\d' if $str =~ m/$re++\d/; } atomic(); possessive();
      The regexes compile to
      # atomic Compiling REx "(?-xism:(?:be|ea|a))+\d" Final program: 1: CURLYX[0] {1,32767} (14) 3: TRIE-EXACT[abe] (13) <be> <ea> <a> 13: WHILEM[1/1] (0) 14: NOTHING (15) 15: DIGIT (16) 16: END (0) minlen 2 #possessive Compiling REx "(?-xism:(?:be|ea|a))++\d" Final program: 1: SUSPEND (19) 3: CURLYX[0] {1,32767} (16) 5: TRIE-EXACT[abe] (15) <be> <ea> <a> 15: WHILEM[1/1] (0) 16: NOTHING (17) 17: SUCCEED (0) 18: TAIL (19) 19: DIGIT (20) 20: END (0) minlen 2 # regex.ato and regex.pos contain the output including # the Compiling REx message michael@lnx-main:working> wc regex.ato regex.pos 10650 70471 1837158 regex.ato 229368 1989603 47988305 regex.pos
      As you can see it is doing a lot more. I don't know why but it will be interesting to find out;)

      Update 1: It may be something to do with caching. Looking in the re debug output I find

      michael@lnx-main:working> grep Detected regex.ato whilem: Detected a super-linear match, switching on caching... michael@lnx-main:working> grep Detected regex.pos whilem: Detected a super-linear match, switching on caching... # Now look for this line # whilem: (cache) already tried at this position... michael@lnx-main:working> grep '(cache)' regex.ato | wc 298 2086 26426 michael@lnx-main:working> grep '(cache)' regex.pos | wc 0 0 0
      At this point I am prepared to confess that I will struggling a bit to dig in deeper;-)

      Update 2: The SUSPEND/TAIL pair look like where any interested monk should start. In perldebguts I found this

      # Do nothing NOTHING no Match empty string. # A variant of above which delimits a group, thus stops optimizations TAIL no Match empty string. Can jump here from outside.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://664550]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (4)
As of 2024-04-20 00:51 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found