http://www.perlmonks.org?node_id=300297

Len has asked for the wisdom of the Perl Monks concerning the following question:

I can't explain this behaviour:
use strict;

our $regex = qr /(      # Start capture
    \(                  # Start with '(',
    (?:                 # Followed by
      (?>[^()]+)        # Non-parenthesis
      |(??{ $regex })   # Or a balanced () block
    )*                  # zero or more times
    \)                  # Close capture
)/x;                    # Ending with ')'

my $text = '(outer(inner(most inner)))';
$text =~ /$regex/gs;
print "$1\n";

Why do I have to use our $regex instead of my $regex to get the expected result?

our $regex prints (outer(inner(most inner)))
my $regex prints (most inner)

Len

Replies are listed 'Best First'.
Re: my versus our in nested regex
by BrowserUk (Patriarch) on Oct 18, 2003 at 17:08 UTC

    my variables don't come into existence until the end of the statement in which they are declared, which means that when the regex is being compiled, the lexical scalar $regex doesn't yet exist.

    Using our bypasses this problem.

    If you want to avoid using a global, then pre-declare the lexical.

    use strict;

    my $regex;
    $regex = qr /(          # Start capture
        \(                  # Start with '(',
        (?:                 # Followed by
          (?>[^()]+)        # Non-parenthesis
          |(??{ $regex })   # Or a balanced () block
        )*                  # zero or more times
        \)                  # Close capture
    )/x;                    # Ending with ')'

    my $text = '(outer(inner(most inner)))';
    $text =~ /$regex/gs;
    print "$1\n";

    __END__
    P:\test>junk2
    (outer(inner(most inner)))

    Updated description in the light of liz's observation below.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Hooray!

      ...which means that when the regex is being compiled, $regex doesn't yet exist.

      Actually, this is incorrect. At that stage $regex does exist, but it refers to an (undefined) global variable of the same name, as the following reveals under strict:

      $ perl -Mstrict -e 'my $foo = $foo'
      Global symbol "$foo" requires explicit package name at -e line 1.
      Execution of -e aborted due to compilation errors.

      Liz
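The scoping rule Liz describes can be seen without any regex at all. Here is a minimal sketch (the variable names are made up for illustration): inside the block, the `$name` on the right-hand side of the declaration still refers to the outer variable, because the new `my` only becomes visible once its statement has finished.

```perl
use strict;
use warnings;

my $name = 'outer';
my @seen;

{
    # The new lexical is not visible until this statement ends, so the
    # $name on the right-hand side is still the outer variable.
    my $name = $name . '-inner';
    push @seen, $name;
}

# Back outside the block, the outer variable is untouched.
push @seen, $name;

print join(', ', @seen), "\n";    # prints "outer-inner, outer"
```

In the original question there is no outer lexical to fall back on, so the name inside the pattern resolves to the package variable instead.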

      It is interesting that strict doesn't complain in the following:

      use strict;
      my $regex = qr/(?{{$regex}})/;

      and it isn't just because the parser has seen that variable:

      use strict;
      my $regex = qr/(?{{$undeclared}})/;

      doesn't complain either.

        Agreed. Quite why strictness doesn't propagate to regex code blocks is a good question.

        You can always enable it yourself :)

        P:\test>perl -le"my $re = qr[(??{ use strict; $re })];"
        Global symbol "$re" requires explicit package name at (re_eval 1) line 2.
        Compilation failed in regexp at -e line 1.


Re: my versus our in nested regex
by shenme (Priest) on Oct 18, 2003 at 17:02 UTC
    If you had turned on warnings you would have seen the message "Use of uninitialized value in pattern match (m//) at len01.pl line 14.".

    I do know that my variables are not _really_ in the symbol table hash, and that might be tripping up things while constructing the compiled regex and then executing it later.

Re: my versus our in nested regex
by pernod (Chaplain) on Oct 21, 2003 at 10:33 UTC

    Jeffrey Friedl talks a bit about this in Mastering Regular Expressions in the section called "A Warning About Embedded Code and my Variables" (pages 338-339). His conclusion on the matter is that an embedded code construct is in fact a closure.

    This means that using a lexical variable inside an embedded code construct in a regular expression binds the instance of the lexical variable in existence at the moment the regex is compiled to the regex. As far as I understand, this means that:

    my $regex = qr /(       # Start capture
        \(                  # Start with '(',
        (?:                 # Followed by
          (?>[^()]+)        # Non-parenthesis
          |(??{ $regex })   # Or a balanced () block
        )*                  # zero or more times
        \)                  # Close capture
    )/x;                    # Ending with ')'

    with regard to liz' remark about lexicals at compile time, is "interpreted" (pardon the hand-waving) as:

    my $regex = qr /(       # Start capture
        \(                  # Start with '(',
        (?:                 # Followed by
          (?>[^()]+)        # Non-parenthesis
          |(??{ undef })    # <-- Note! 'or undef'
        )*                  # zero or more times
        \)                  # Close capture
    )/x;                    # Ending with ')'

    which will match the innermost parentheses. We don't have any undefs in the target string, so how can this part of the construct match?

    Trying to write this down in a sensible manner proved to be quite a challenge, so I apologize if the preceding section is hard to understand. But to quote Friedl from the aforementioned section: "Warning: this section is not light reading." :o)

    Hope this helps.

    pernod
    --
    Mischief. Mayhem. Soap.
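
For anyone who wants to sidestep the my-versus-our question entirely: on Perl 5.10 or later, the same balanced-parenthesis match can be written without referring to any variable from inside the pattern, by recursing into capture group 1 with `(?1)`. This is a different technique from the `(??{ ... })` construct discussed in this thread, not a fix for it, and the helper name `first_balanced` is made up for this sketch.

```perl
use strict;
use warnings;

# Return the first balanced (...) block found in $text, or undef if none.
sub first_balanced {
    my ($text) = @_;
    return $text =~ /
        (                   # group 1: one balanced block
            \(              # opening parenthesis
            (?:
                (?>[^()]+)  # a run of non-parenthesis characters
              | (?1)        # or recurse into group 1
            )*
            \)              # closing parenthesis
        )
    /x ? $1 : undef;
}

print first_balanced('(outer(inner(most inner)))'), "\n";
print first_balanced('x (a(b)) y'), "\n";
```

One caveat: `(?1)` counts groups from the start of the whole pattern, so the group number would need adjusting if this fragment were embedded in a larger regex that has earlier capture groups.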