
better way to get last named capture group

by ysth (Canon)
on Jul 01, 2025 at 15:07 UTC (perlquestion, [id://11165514])

ysth has asked for the wisdom of the Perl Monks concerning the following question:

In python, I can abuse capture group names in a substitution to simplify the substitution code, like:
$ python -c 'import regex; print(regex.sub("^(?<a>.)|(?<c>.)$|(?<b>.)", lambda m:m.lastgroup, "xxx"))'
abc
In perl, if there is a single variable equivalent to lastgroup, I can't find it; the closest I see is %+:
$ perl -E'say "xxx" =~ s/^(?<a>.)|(?<c>.)$|(?<b>.)/@{[keys %+]}/gr'
abc
which isn't awful, but is there a better way I'm missing?

A math joke: r = | |csc(θ)|+|sec(θ)| |-| |csc(θ)|-|sec(θ)| |

Replies are listed 'Best First'.
Re: better way to get last named capture group
by NERDVANA (Priest) on Jul 01, 2025 at 16:37 UTC
    I don't believe that specific feature exists, but perl has a lot of other quirky tools in the regex engine, and you haven't really explained what real problem you're trying to solve. At a glance, I can't think how knowing "lastgroup" would be useful in real-world problems, except in the case where you were trying to discover which of N capture groups were found. If the goal was to perform a more efficient "switch" to execute code after specific captures were detected, the perl tool for that is to embed perl code directly into the regex so that it runs right after the capture.
    perl -E 'say "xxx" =~ s/^.(?{"a"})|.$(?{"c"})|.(?{"b"})/$^R/gr'

    or with /x

    perl -E 'say "xxx" =~ s/ ^. (?{ "a" }) | .$ (?{ "c" }) | . (?{ "b" }) /$^R/grx'
    The advantage here is that you don't have to re-dispatch based on which alternative matched; you are immediately running code based on the pattern. The disadvantage is that you have to pay close attention to how this feature interacts with backtracking, because the perl regex engine may run your code block before deciding that the pattern doesn't match, and then run different code for the same characters. In this case, I put each code block at the end of its alternative, so it runs only *after* that alternative has matched. In general, try to avoid side effects and return a value (to $^R) that is used only once the match is determined to have succeeded. If you need side effects, you can also make use of the various regex directives that prevent backtracking. (See perldoc perlre and search for "backtrack".)
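As a rough sketch of the approach described above (not the poster's own code): each alternative ends in a (?{ ... }) block, and the replacement reads $^R only after the match is final. The %handler table and its keys are hypothetical names invented for illustration.

```perl
use strict;
use warnings;
use feature 'say';

# Each alternative ends in a (?{ ... }) block; the value of the block on the
# successful path is what $^R holds when the replacement is evaluated.
my $labelled = "xxx" =~ s/ ^. (?{ "a" }) | .$ (?{ "c" }) | . (?{ "b" }) /$^R/grx;
say $labelled;

# Same idea used as a dispatch: the block only returns a key, and the
# (hypothetical) handler runs in the replacement, once the match is final.
my %handler = (
    first  => sub { uc $_[0] },
    last   => sub { "<$_[0]>" },
    middle => sub { $_[0] },
);
my $out = "abc" =~ s/ ^. (?{ "first" }) | .$ (?{ "last" }) | . (?{ "middle" }) /$handler{$^R}->($&)/grex;
say $out;
```

Keeping the code blocks free of side effects means backtracking cannot leave anything half-done; only the value that reaches $^R matters.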

    I can dig up examples of how you can expand this to a full language parser, if you're interested.

Re: better way to get last named capture group
by ikegami (Patriarch) on Jul 01, 2025 at 15:28 UTC

    %- and %+ are the only variables that contain capture names.

Re: better way to get last named capture group
by ysth (Canon) on Jul 01, 2025 at 22:57 UTC
    An alternative that feels cleaner, though (*MARK:NAME) is supported in far fewer places than named captures:
    $ perl -E'say "xxx" =~ s/(*MARK:a)^.|(*MARK:c).$|(*MARK:b)./$REGMARK/gr'
    abc
    $ python -c 'import pcre2; print(pcre2.sub("(*MARK:a)^.|(*MARK:c).$|(*MARK:b).", "$*MARK", "xxx"))'
    abc
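For use outside a one-liner, a minimal sketch of the same (*MARK:NAME) idea in a strict script might look like the following. The mark names first, last and middle and the %count hash are made up for illustration; the only real requirement assumed here is that $REGMARK, being a plain package variable, must be declared under "use strict".

```perl
use strict;
use warnings;
use feature 'say';

# $REGMARK is an ordinary package variable that the regex engine sets after
# a successful match that passed a (*MARK:NAME), so declare it with "our".
our $REGMARK;

# Tally which branch matched at each position (names are illustrative).
my %count;
my $text = "xxxx";
while ( $text =~ /(*MARK:first)^.|(*MARK:last).$|(*MARK:middle)./g ) {
    $count{$REGMARK}++;
}
say "$_: $count{$_}" for sort keys %count;
```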
Re: better way to get last named capture group
by sleet (Monk) on Jul 01, 2025 at 18:31 UTC
    For your example, wouldn't a branch reset be more appropriate?
      Exactly the opposite! I want to distinguish which branch matched, when the matched strings could all be the same.

        With numbered captures, defined( $1 ) tells us if a capture is part of the matching path.

        The equivalent with named captures would be exists( $+{ name } ).

        To check which of a set, ( grep exists( $+{ $_ } ), qw( name1 name2 name3 ) )[ 0 ] could be used.

        And of course, that simplifies to ( keys( %+ ) )[ 0 ] if only one named capture could have captured.
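The two checks above can be sketched against the pattern from the question, as a rough illustration only. The variable names $by_grep, $by_keys and @names are invented here; the short form relies on the assumption that at most one named group can capture per match.

```perl
use strict;
use warnings;
use feature 'say';

my @names = qw( a c b );    # group names to test, in order of preference

# Explicit form: the first listed name whose group took part in the match.
my $by_grep = "xxx" =~ s{^(?<a>.)|(?<c>.)$|(?<b>.)}{
    ( grep exists( $+{ $_ } ), @names )[ 0 ]
}gre;

# Short form: valid only because at most one named group can capture here.
my $by_keys = "xxx" =~ s{^(?<a>.)|(?<c>.)$|(?<b>.)}{ ( keys %+ )[ 0 ] }gre;

say $by_grep;
say $by_keys;
```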

Re: better way to get last named capture group
by LanX (Saint) on Jul 02, 2025 at 23:21 UTC
    If your goal was to mimic the exact Python semantics, I'd try to patch or subclass Tie::Hash::NamedCapture to record the last name set.

    > Internally %+ and %- are implemented with a real tied interface via Tie::Hash::NamedCapture

    In a hurry I couldn't find the implementation; I'm not sure whether it is pure Perl or XS/C.

    But even if it is in C, monkey patching STORE with a wrapper should do the job.

    update

    Never mind.

    I ran some experiments and couldn't succeed in patching STORE to any effect.

    And according to the docs

    > %- and %+ are tied views into a common internal hash associated with the last successful regular expression.

    this wouldn't help either, because the internal hash is out of reach ... too bad.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

      The source is in perl's universal.c, but the update tie methods don't do anything; the read tie methods call named_buff or named_buff_iter to get the data from the regex engine. So no patching possible.
        Yeah, the tie is basically only a frontend to the regex engine.

        One side effect is that you can tie other %hashes to act like %+ and %-
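A small sketch of that side effect, using the tie interface documented for Tie::Hash::NamedCapture. The hash names %captured and %declared and the helper arrays are arbitrary choices for this example.

```perl
use strict;
use warnings;
use feature 'say';
use Tie::Hash::NamedCapture;

tie my %captured, 'Tie::Hash::NamedCapture';            # behaves like %+
tie my %declared, 'Tie::Hash::NamedCapture', all => 1;  # behaves like %-

my ( @captured_names, @declared_names );
if ( "xxx" =~ /^(?<a>.)|(?<c>.)$|(?<b>.)/ ) {
    # Read the tied hashes while the match is still the current one.
    @captured_names = sort keys %captured;   # only groups that captured
    @declared_names = sort keys %declared;   # every name in the pattern
}
say "captured: @captured_names";
say "declared: @declared_names";
```

Like %+ and %-, the tied hashes are just views onto the last successful match, so they give no access to the order in which groups were set.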


      It's in universal.c.

      It would be possible to modify Perl to provide something like

      tied( %+ )->latest_capture_name
