http://www.perlmonks.org?node_id=1005151

mhgoeschl has asked for the wisdom of the Perl Monks concerning the following question:

I'm using named capture buffers within a non-capturing repeated group (?: )+, where the buffer status (is <mybuf> defined?) is tested by an if-then-else construct, e.g.: /(? ... (?(<mybuf>)(?:match something)|(?<mybuf>match something else)) ... )+/. Now I need to reset <mybuf> (to undef or to some defined value) within the regexp WITHOUT consuming any character of my match target. Since testing the buffer is possible using ?(<mybuf>), I'm speculating that resetting it at runtime should also be possible?

Re: reset named capture buffer within regex
by tobyink (Canon) on Nov 22, 2012 at 16:28 UTC

    I can't seem to find any way of resetting named captures mid-regexp. (And have managed several segfaults trying.)

    Why exactly do you need to reset it? Is it just because you want the right-most match instead of the left-most match? If so, take a look at %-.
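To illustrate the difference between %+ and %-, here is a minimal sketch. It assumes a pattern in which the same name is given to two separate capture groups; the name `x` and the test string are made up for illustration. %+ reports the leftmost defined group of that name, while %- holds an array with one entry per group of that name, so its last element is the right-most one.

```perl
use strict;
use warnings;

my ($leftmost, $rightmost, @all);

# Two distinct capture groups share the (hypothetical) name "x".
if ('1234' =~ /(?<x>1)2(?<x>3)4/) {
    $leftmost  = $+{x};        # leftmost defined group named x
    @all       = @{ $-{x} };   # every group named x, in pattern order
    $rightmost = $-{x}[-1];    # right-most group named x
}

print "leftmost=$leftmost rightmost=$rightmost all=@all\n";
```

Note that %- lists groups, not loop iterations: a single group inside a repeated (?: )+ still keeps only the value from its last iteration.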

    ++ for the question though; I'll be interested to read any other replies.

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
      The purpose is to match variable-length sequences of tokens out of a larger, variable-length list of tokens, where each token carries a number of additional attributes (e.g. a number) that qualify it as part of a token group, e.g. A1 B1 b1 B1 b2 C2 c2 d3 D3. For example, I am looking for sequences of B|b that are part of the same (number) group; the results here are 'B1 b1 B1' and 'b2'. Unfortunately, it is not an option to slice the sequence by the group (number) attribute and do the token matching on every subgroup, because other attributes on the tokens bridge the groups and are considered in the overall expression (too complex to show here). The current pattern matches a token sequence, loads the group attribute of the first token into <mybuf>, and continues matching using <mybuf> as the target for the attributes of upcoming tokens. This works perfectly; however, I need the option to break this sequence (reset <mybuf> without an actual match) in certain cases.
Re: reset named capture buffer within regex
by LanX (Saint) on Nov 22, 2012 at 18:45 UTC
    > WITHOUT consuming any character of my match target.

    I don't fully understand what you are doing, just a hint:

    The position will not be altered if you try an "impossible" match within a look-ahead or look-behind.

    So you maybe wanna try something like (?=(?<name> ... ))?
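A small sketch of that hint, assuming a toy string and a group named `name` (both hypothetical): the named group inside the look-ahead is filled in, but the look-ahead itself consumes nothing, so the overall match only covers what the rest of the pattern eats. Whether this can stand in for a real reset of an existing buffer is not something this example shows.

```perl
use strict;
use warnings;

my ($captured, $consumed);

# The look-ahead captures "ab" into the named group without moving the
# match position; only the trailing "a" is actually consumed.
if ('abc' =~ /(?=(?<name>ab))a/) {
    $captured = $+{name};
    $consumed = $&;
}

print "captured=$captured consumed=$consumed\n";
```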

    HTH!

    Cheers Rolf

Re: reset named capture buffer within regex
by ColonelPanic (Friar) on Nov 23, 2012 at 11:15 UTC
    Adding to what others have said already, I think this simple test probably proves it can't be done:
    /(?<foo_match>foo)(??{undef $+{foo_match}})/
    Result: Modification of a read-only value attempted.

    This is in line with other regex-related variables (such as the current match position), which can't be altered mid-pattern.
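The same read-only behaviour can be seen outside the pattern as well. This is only a sketch: it assigns to %+ after a successful match and traps the resulting exception with eval, rather than reproducing the exact (??{ }) test above.

```perl
use strict;
use warnings;

my $error = '';
my $value_after;

if ('foo' =~ /(?<foo_match>foo)/) {
    # Storing into %+ is refused; eval traps the exception.
    eval { $+{foo_match} = 'changed'; 1 } or $error = $@;
    # The captured value is untouched by the failed assignment.
    $value_after = $+{foo_match};
}

print "error: $error";
print "value after: $value_after\n";
```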

    You could do something equivalent using embedded code and setting a variable rather than using named capture. However, that might not be the best idea.
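A rough sketch of what that could look like, using the simplified token example from elsewhere in this thread rather than the poster's real pattern. The variable names `$group` and `@runs` are made up. A (?{ }) block stores the group number of the first token in an ordinary package variable, and a (?(?{ }) ...) conditional forces failure when a later token belongs to a different group.

```perl
use strict;
use warnings;

my $tokens = 'A1 B1 b1 B1 b2 C2 c2 d3 D3';

our $group;   # package variable, set from inside the pattern
my @runs;

while ($tokens =~ m{
        \b (
            [Bb] (\d+) \b
            (?{ $group = $^N })                   # remember group of first token
            (?:
                \s+ [Bb] (\d+) \b
                (?(?{ $^N != $group })(*FAIL))    # reject a token from another group
            )*
        )
    }xg) {
    push @runs, $1;
}

print "$_\n" for @runs;
```

Because a (?{ }) block is zero-width, an assignment such as (?{ $group = undef }) could in principle be placed anywhere in the pattern without consuming input. Keep in mind that such assignments are not undone on backtracking unless the variable is localized inside the block.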

    In my mind, when a pattern starts having this much internal logic, the correct way to solve the problem is usually by breaking it down into multiple steps. As cool as Perl's advanced regex features are to play around with, just because you can do it all in a regex doesn't mean you should.
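As a sketch of the multi-step approach, again using the simplified token example from this thread (the real data reportedly has more attributes, which are ignored here): split the list into tokens first, then walk them with a plain loop that tracks the current group in an ordinary variable. All names are made up for illustration.

```perl
use strict;
use warnings;

my $tokens = 'A1 B1 b1 B1 b2 C2 c2 d3 D3';

my (@runs, @current, $current_group);

for my $token (split ' ', $tokens) {
    if ($token =~ /\A[Bb](\d+)\z/) {
        my $group = $1;
        # A change of group number ends the current run.
        if (@current && $group != $current_group) {
            push @runs, "@current";
            @current = ();
        }
        push @current, $token;
        $current_group = $group;
    }
    else {
        # Any non-B token also ends the current run.
        push @runs, "@current" if @current;
        @current = ();
    }
}
push @runs, "@current" if @current;

print "$_\n" for @runs;
```

Resetting the state "without consuming a token" is then just another assignment to `$current_group` or `@current` at the appropriate point in the loop.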



    When's the last time you used duct tape on a duct? --Larry Wall

      %+ and %- are "just" tied hashes which give access to the underlying named capture data buried away in the Perl core. Even if you were able to alter the hashes, there's no guarantee that it would affect the underlying data, so using ?(<mybuf>) later on in the regular expression might give surprising results.
