PerlMonks  

This actually is not a bug. It is just a slightly counter-intuitive result of how @+/@-, (?(DEFINE) ..) and named-captures/named-subroutines all work, and probably could have been implemented slightly differently without any harm, but as of now, the behaviour probably cannot be changed.

First, I modified a version of your code from Re^4: Strange behavior of @- and @+ in perl5.10 regexps:

use v5.10;
my $input;
local $" = ", ";
my $parser = qr{
    (?{ say "before:\n\@- = (@-)\t\t ".scalar(@-)." items\n\@+ = (@+)\t ".scalar(@+)." items\n"; })
    ^ ((?&expr)) ((?&expr)) \z
    (?{ say "after:\n\@- = (@-)\t\t ".scalar(@-)." items\n\@+ = (@+)\t ".scalar(@+)." items\n"; })
    (?(DEFINE)
        (?<expr> (.) (.)
            (?{ say "expr:\n\@- = (@-)\t ".scalar(@-)." items\n\@+ = (@+)\t ".scalar(@+)." items\n"; })
        )
    )
}x;
$input = "abcd";
chomp($input);
if ($input =~ $parser) {
    say "matches: ($&)";
    say "At the very end:\n\@- = (@-)\t ".scalar(@-)." items\n\@+ = (@+)\t ".scalar(@+)." items\n";
}
__END__

The pattern compiles down to the following (as dumped by use re 'debug'):

Compiling REx "%n (?{%n say %"before:\n\@- = (@-)\t\t %".scalar("...
synthetic stclass "ANYOF[\0-\11\13-\377][{unicode_all}]".
Final program:
   1: EVAL (3)
   3: BOL (4)
   4: OPEN1 (6)
   6: GOSUB3[+19] (9)
   9: CLOSE1 (11)
  11: OPEN2 (13)
  13: GOSUB3[+12] (16)
  16: CLOSE2 (18)
  18: EOS (19)
  19: EVAL (21)
  21: DEFINEP (23)
  23: IFTHEN (44)
  25: OPEN3 'expr' (27)
  27: OPEN4 (29)
  29: REG_ANY (30)
  30: CLOSE4 (32)
  32: OPEN5 (34)
  34: REG_ANY (35)
  35: CLOSE5 (37)
  37: EVAL (39)
  39: CLOSE3 'expr' (44)
  41: LONGJMP (43)
  43: TAIL (44)
  44: END (0)
floating ""$ at 2..2147483647 (checking floating) stclass ANYOF[\0-\11\13-\377][{unicode_all}] minlen 2 with eval

Which outputs:

before:
@- = (0)               1 items
@+ = (0, , , , , )     6 items

expr:
@- = (0, , , , 0, 1)   6 items
@+ = (2, , , , 1, 2)   6 items

expr:
@- = (0, 0, , , 2, 3)  6 items
@+ = (4, 2, , , 3, 4)  6 items

after:
@- = (0, 0, 2)         3 items
@+ = (4, 2, 4, , , )   6 items

matches: (abcd)
At the very end:
@- = (0, 0, 2)         3 items
@+ = (4, 2, 4, , , )   6 items

So, first, if you look at Perl 5.10.x perlvar under @- and @+, you will see the following documentation. I have bolded the relevant bits.

@LAST_MATCH_END
@+

This array holds the offsets of the ends of the last successful submatches in the currently active dynamic scope. $+[0] is the offset into the string of the end of the entire match. This is the same value as what the pos function returns when called on the variable that was matched against. The nth element of this array holds the offset of the nth submatch, so $+[1] is the offset past where $1 ends, $+[2] the offset past where $2 ends, and so on.

You can use $#+ to determine how many subgroups were in the last successful match.

See the examples given for the "@-" variable.

@LAST_MATCH_START
@-

$-[0] is the offset of the start of the last successful match. $-[$n] is the offset of the start of the substring matched by n-th subpattern, or undef if the subpattern did not match.

Thus after a match against $_, $& coincides with substr $_, $-[0], $+[0] - $-[0]. Similarly, $n coincides with substr $_, $-[n], $+[n] - $-[n] if $-[n] is defined, and $+ coincides with substr $_, $-[$#-], $+[$#-] - $-[$#-]. One can use $#- to find the last matched subgroup in the last successful match. Contrast with $#+, the number of subgroups in the regular expression. Compare with @+.

This array holds the offsets of the beginnings of the last successful submatches in the currently active dynamic scope. $-[0] is the offset into the string of the beginning of the entire match. The nth element of this array holds the offset of the nth submatch, so $-[1] is the offset where $1 begins, $-[2] the offset where $2 begins, and so on.

After a match against some variable $var:

$` is the same as substr($var, 0, $-[0])
$& is the same as substr($var, $-[0], $+[0] - $-[0])
$' is the same as substr($var, $+[0])
$1 is the same as substr($var, $-[1], $+[1] - $-[1])
$2 is the same as substr($var, $-[2], $+[2] - $-[2])
$3 is the same as substr($var, $-[3], $+[3] - $-[3])
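To see the substr() equivalences and the $#- versus $#+ distinction from the documentation in isolation, here is a small sketch. The string "xabcd" and the pattern are arbitrary choices for illustration: the optional third group cannot match, so $#- stops at the last group that actually matched (2), while $#+ still reports the number of groups in the pattern (3).

```perl
use strict;
use warnings;

my $var = "xabcd";

# Three capture groups; (z)? cannot match the "c" that follows "ab",
# so group 3 does not participate in the match.
$var =~ /(a)(b)(z)?/ or die "no match";

# The substr() equivalences quoted from perlvar above.
my $prematch  = substr($var, 0, $-[0]);               # same as $`
my $match     = substr($var, $-[0], $+[0] - $-[0]);   # same as $&
my $postmatch = substr($var, $+[0]);                  # same as $'
my $first     = substr($var, $-[1], $+[1] - $-[1]);   # same as $1

my $last_matched = $#-;   # index of the last group that actually matched
my $group_count  = $#+;   # number of groups in the pattern

print "pre=$prematch match=$match post=$postmatch first=$first\n";
print "last matched group: $last_matched, groups in pattern: $group_count\n";
```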

Now, you may wonder: why six elements? It is not obvious at first, because only four capture buffers appear to be used in the pattern, which would mean five slots (the zeroth element tracks $&). However, there are actually five capture buffers in this pattern, as one is reserved for the (?<expr> ... ). That buffer never gets set, because it lives inside (?(DEFINE) (?<expr>...)) and is only ever executed as (?&expr), which never executes the /capture/ part of the (?<expr> ... ), so the 4th slot of @+ ($+[3]) is never populated.
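A stripped-down copy of the pattern above, with the (?{ ... }) tracing removed, makes the slot count easy to check. This is a sketch assuming a perl that behaves like the 5.10 run shown above; the comments map each slot to the group that owns it, following the OPEN/CLOSE numbering in the compiled program.

```perl
use strict;
use warnings;
use v5.10;

# Slot 0 is the whole match. Groups 1 and 2 wrap the two (?&expr) calls.
# Group 3 is the named group 'expr' inside (?(DEFINE) ...), and groups
# 4 and 5 are the two (.) captures inside it.
my $parser = qr{
    ^ ((?&expr)) ((?&expr)) \z
    (?(DEFINE) (?<expr> (.) (.) ) )
}x;

"abcd" =~ $parser or die "no match";

my @starts = @-;   # copy the arrays before any later match changes them
my @ends   = @+;

# @+ has one slot per group in the pattern (5 groups + slot 0),
# while @- only extends to the last group that actually matched.
say "\@- has ", scalar(@starts), " items; \@+ has ", scalar(@ends), " items";

# Count how many of slots 3..5 of @+ ended up defined after the match.
my $any_set = grep { defined } @ends[3 .. 5];
say "defined slots among 3..5 of \@+: $any_set";
```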

This was actually a deliberate design decision: consider that it would be awkward if /(?<foo>foo)((?&foo))/ resulted in $1 and $2 pointing at the same string. However, maybe what happens to a capture buffer defined in a DEFINE block should have been reviewed once (?(DEFINE) ...) was introduced. The development of these features was somewhat organic, with a lot of it actually just being "tricks". For instance, (?(DEFINE) ... ) isn't really special; at heart it is just an optimized alias of (?(0) ... ) (with some error checking to disallow an ELSE block), and subroutines just piggyback on named capture, so... Well, as is sometimes said of Perl core-dev, it's all a bit of a game of Jenga. :-)
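The trade-off mentioned here can be seen with that same pattern. A minimal sketch, where the input string "foofoo" is an arbitrary choice: after the match, $1 and $2 cover different offsets, because the (?&foo) call inside group 2 does not leave its capture behind in group 1.

```perl
use strict;
use warnings;

my $str = "foofoo";
$str =~ /(?<foo>foo)((?&foo))/ or die "no match";

# Group 1 is the named group 'foo'; group 2 wraps a subroutine call to it.
# The call re-runs foo's sub-pattern, but group 1 keeps its own offsets
# afterwards instead of being moved onto the second "foo".
my @g1 = ($-[1], $+[1]);
my @g2 = ($-[2], $+[2]);

print "group 1 spans [@g1], group 2 spans [@g2]\n";
print "\$+{foo} = ", $+{foo}, "\n";
```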

While it might be arguable that there should not be a slot reserved for a named capture buffer defined in a (?(DEFINE) ... ) block, the fact that @- and @+ are not the same size is a deliberate choice, and the behaviour you are seeing is expected, although admittedly in this context the results are a bit odd looking.

HTH

Note: I rejected the bug report you filed on this, but thanks anyway. It did raise an interesting question that I will think on.

---
$world=~s/war/peace/g


In reply to Re: Strange behavior of @- and @+ in perl5.10 regexps by demerphq
in thread Strange behavior of @- and @+ in perl5.10 regexps by casiano
