PerlMonks  

Re: Interesting behavior of regular expression engine

by tobyink (Abbot)
on Mar 12, 2013 at 23:35 UTC (node #1023077)


in reply to Interesting behavior of regular expression engine

I'm no expert, but running the code under use re 'debug' seems to offer some insight.
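For anyone who hasn't used it, the re pragma's debug mode prints the compiled regex program and an execution trace to STDERR. Below is a minimal sketch. The string and pattern are made up for illustration and are not the OP's. Per the re documentation, the directive has been lexically scoped since 5.9.5, and the trace format is explicitly not a supported API, so it may look different on your Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $matched;
{
    # Lexically scoped: only patterns compiled inside this block are
    # traced. The compile and execution trace goes to STDERR.
    use re 'debug';
    $matched = "xxab" =~ /x.+b/ ? 1 : 0;
}
print "matched: $matched\n";
```

Run it with STDERR visible and look for the "floating" / "anchored" lines in the compile section and the per-offset steps in the execution section.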

In both cases the last part of the regexp is the longest floating string, so it is the part that Perl attempts to match first.
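If you only want to know which fixed strings the optimiser pulled out of a pattern, without reading the whole trace, re.pm also exports regmust(), which returns what the optimiser considers the longest anchored and longest floating fixed string of a qr// object. Its documentation says Perl uses both for its optimisations and that the answer reflects what your particular Perl's optimiser thinks. This sketch reuses the example pattern from that documentation:

```perl
use strict;
use warnings;
use re qw(regmust);

# "here" must sit at a known offset from the start of the match (anchored);
# "there" can appear anywhere after the .* (floating).
my $qr = qr/here .* there/x;
my ($anchored, $floating) = regmust($qr);
print "anchored: '$anchored'\nfloating: '$floating'\n";
```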

In the first case, all that remains is to match "a" against /.+/. This succeeds, and your code returns true (say returns true if it successfully prints something), so the whole match succeeds. The code only needs to be executed once.

In the second case, it tries to slot letters into two patterns: /.+/ before the code and /./ after the code, and it has to shuffle them around six times before it gets a success.
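A small way to watch backtracking re-run an embedded code block is to count its executions. The pattern below is hypothetical, only loosely modelled on the "/.+/ before the code, /./ after it" shape described above, and is not the OP's code. The exact count is deliberately not asserted, since it depends on the engine and its optimisations.

```perl
use strict;
use warnings;

our $runs = 0;    # package variable, so the code block sees it reliably
my $ok = "abc" =~ /.+(?{ $runs++ })./;

# In the textbook backtracking model, .+ first grabs all of "abc", the
# code block runs, /./ then has nothing left to match, so the engine
# gives back a character and runs the code block again.
print "matched: ", ($ok ? "yes" : "no"), ", code block ran $runs time(s)\n";
```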

Personally, I think it's a bad idea to rely on code embedded in regexps having side-effects. How many times it will be executed, and in what order, is pretty unpredictable; a new Perl release could change it.
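One way to follow that advice, as a minimal sketch with a made-up string and pattern: let the match finish, then act on what it captured. The capture variables reflect only the match that finally succeeded, and the print sits outside the regex, so no amount of backtracking can run it twice.

```perl
use strict;
use warnings;

my ($head, $tail);
if ("foobar" =~ /(.+)(.)/) {
    # $1 and $2 come from the successful match only.
    ($head, $tail) = ($1, $2);
    print "head=$head tail=$tail\n";
}
```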

package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name


Re^2: Interesting behavior of regular expression engine
by rjt (Chaplain) on Mar 12, 2013 at 23:51 UTC

    I have to agree with tobyink on this one. Regexes are supposed to produce stable output given a set of well-defined rules (which, thanks to p5p, they pretty much always have, without fail). What's not well-defined is what steps the underlying implementation takes to arrive at the output, and that's a Good Thing, as some of the optimizations this enables are quite profound indeed. I don't get the sense that you (the OP) are claiming that this weird side effect not behaving as you expected is a bug that needs to be fixed, but I nonetheless feel it's important to underscore my own strong preference: end-result correctness first, performance second, and weird effects likely to break tomorrow a distant third.

      What's not well-defined is what steps the underlying implementation takes to arrive at the output,

      Yet Programming Perl (3rd edition) lays out six very complicated rules describing in detail how the regex engine proceeds (pp. 197-201). So, Larry at least thinks the steps are/were well defined. I wonder whether those steps are in the new edition?
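To make rjt's distinction concrete, the rules that decide which match you get are observable from the result alone, whatever the optimiser does along the way. A small sketch with made-up strings: the leftmost possible match wins, and alternatives are tried left to right, so the first alternative that lets the whole pattern match wins even when a later one would be longer.

```perl
use strict;
use warnings;

# Leftmost match wins: 'b' at offset 1 beats 'a' at offset 3.
my ($leftmost)  = "xbcabc" =~ /(a|b)/;

# First workable alternative wins: 'b', not the longer 'bc'.
my ($first_alt) = "abcd" =~ /(b|bc)/;

print "leftmost=$leftmost first_alt=$first_alt\n";
```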

        So, Larry at least thinks the steps are well defined.

        I don't know what you're talking about, but that book is at least 12 years old, and Larry himself hasn't touched the regex engine in about as long. Optimizations have been added; things have changed.

Re^2: Interesting behavior of regular expression engine (last not matched first)
by tye (Cardinal) on Mar 13, 2013 at 05:52 UTC
    In both cases the last part of the regexp is the longest floating string, so is the part that Perl attempts to match first.

    I've never seen that be the case, and I seriously doubt that it was the case in your test run.

    The only thing I've seen the regex engine do with the "longest floating string" is to use only its length to estimate the offset where it will begin searching for a match (going left-to-right in the string and left-to-right in the regex).

    - tye        
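A quick sketch that is consistent with the left-to-right description above (it doesn't prove anything about how the floating string is used): two zero-width code blocks log the order in which they fire. The string and pattern are made up. How many start offsets get tried, and so how many "A" entries appear, is up to the optimiser, but within each attempt "A" necessarily fires before "B".

```perl
use strict;
use warnings;

our @trace;    # plain push, so entries survive backtracking
my $ok = "xxab" =~ /(?{ push @trace, "A" })a(?{ push @trace, "B" })b/;

# Each attempt walks the pattern from its first node, so the log starts
# with "A", and the successful attempt ends it with "A B".
print "matched: ", ($ok ? "yes" : "no"), "; trace: @trace\n";
```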
