http://www.perlmonks.org?node_id=1015943

smls has asked for the wisdom of the Perl Monks concerning the following question:

After a regex has successfully matched, the @LAST_MATCH_START (shortcut: @-) special variable lists the character offset of the start of each matched subpattern ("capture group").
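As a minimal illustration of the index-based lookup described above (the sample string and pattern are made up for this sketch), the offsets can be copied out of `@-` right after the match:

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen only for illustration.
my $text = "id: abc=123";

my @starts;
if ($text =~ /(\w+)=(\d+)/) {
    # $-[0] is the start of the whole match, $-[N] the start of group N.
    # Copy the array before another match overwrites it.
    @starts = @-;
}

printf "group %d starts at offset %d\n", $_, $starts[$_] for 1 .. $#starts;
```

Here group 1 (`abc`) starts at offset 4 and group 2 (`123`) at offset 8, but both have to be addressed by number.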

To use it, though, you need to know the indices of all capture groups of interest within the regex, which, in the case of complex, possibly dynamically assembled regexes, is inconvenient or impossible.

Named capture groups provide a more robust solution for working with complex/dynamic regexes, and I've really started to like them recently.
However, there seems to be no equivalent to @LAST_MATCH_START that would provide the start offset for capture groups referenced by name rather than by index.

The documentation mentions %LAST_MATCH_START (shortcut: %-), which, based on the name, one might expect to provide what I'm looking for. However, it does something totally different: something for which the long version of the name does not make any sense. Indeed, that version throws an error for me (yes, I've tested with use English;).
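For reference, a small sketch of what `%-` does hold, modelled on the example in perlvar: each name maps to an array reference with the captured text of every group carrying that name, not to any offsets. The `%values` copy is just a name chosen for this sketch.

```perl
use strict;
use warnings;

my %values;
if ('1234' =~ /(?<A>1)(?<B>2)(?<A>3)(?<B>4)/) {
    # %- is tied to the last successful match, so copy its contents
    # into a plain hash of arrays before doing anything else.
    %values = map { $_ => [ @{ $-{$_} } ] } keys %-;
}

for my $name (sort keys %values) {
    print "$name: @{ $values{$name} }\n";
}
```

So `%-` answers "what did each named group capture?", while the question here is "where did each named group start?".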

Questions:

Replies are listed 'Best First'.
Re: Is there really no @LAST_MATCH_START equivalent for named capture groups?
by LanX (Saint) on Jan 30, 2013 at 00:53 UTC
    Just an idea:

    '%+' and '%-' are tied hashes, you may wanna dig into perlreapi and Tie::Hash::NamedCapture to find the capture group index corresponding to a name.

    Otherwise you could use one of the RegEx parsing modules on the regex to associate name to index.

    HTH.

    Cheers Rolf

      > Otherwise you could use one of the RegEx parsing modules on the regex to associate name to index.

      It's a hack, but it gives you a hash mapping each name to its index:

      (see Dynamically inspecting Regex OP-Codes at runtime? for an explanation)

      use strict;
      use warnings;
      use Data::Dump;

      my $a = qr/(?<C>1)(?<D>2)(3)(?<A>4)/;

      my $parsing = parse_regex($a);

      # parse lines like '   1: OPEN1 'C' (3)'
      my %named_captures = ($parsing =~ /^
          \s{1,3}\d{1,3}:\s+     # token number
          OPEN(\d+)[ ]'(\w+)'    # group(nr) 'name' (\d+ so groups 10 and up match too)
          [ ]\(\d+\)             # next token
          $
      /xgm);

      # the match returns (index, name) pairs; flip them to name => index
      my %index_named_capture = reverse %named_captures;

      dd \%index_named_capture;

      # OUTPUT
      # { A => 4, C => 1, D => 2 }

      sub parse_regex {
          my $regex = shift;
          my $re_compilation;

          # First, save away STDERR
          open my $SAVEERR, ">&STDERR";
          close STDERR;
          open STDERR, ">", \$re_compilation
              or die "What the hell?\n";

          # Now dynamically recompile a new regex, saving debug_info to $re_compilation
          # (the heredoc terminator must start in column 0 when you run this)
          eval <<'_code_';
      use re 'debug';
      my $b=qr/$regex(?:)/;
      _code_

          # Now close and restore STDERR to original condition.
          close STDERR;
          open STDERR, ">&", $SAVEERR;

          return $re_compilation;
      }

      Careful: at the moment this can't handle repeated names properly; for that you need to adjust the reverse part.
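One possible adjustment, as a sketch rather than a tested extension of the code above: collect every index per name into an array instead of reversing the hash, then look the indices up in `@-` to get start offsets by name. The hard-coded `@pairs` list is a hypothetical stand-in for the (index, name) pairs that the debug-output match returns, and the extra `(?<C>5)` group is added only to show a repeated name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the (index, name) pairs parsed from the
# 're debug' output; 'C' appears twice to show a repeated name.
my @pairs = (1, 'C', 2, 'D', 4, 'A', 5, 'C');

# name => [ all group indices carrying that name ]
my %indices_for;
while (my ($index, $name) = splice @pairs, 0, 2) {
    push @{ $indices_for{$name} }, $index;
}

# The regex those pairs are assumed to describe.
my $re = qr/(?<C>1)(?<D>2)(3)(?<A>4)(?<C>5)/;

# name => [ start offset of each group with that name ]
my %start_for;
if ('12345' =~ $re) {
    for my $name (keys %indices_for) {
        # Groups that did not take part in the match yield undef here.
        $start_for{$name} = [ map { $-[$_] } @{ $indices_for{$name} } ];
    }
}

for my $name (sort keys %start_for) {
    print "$name starts at: @{ $start_for{$name} }\n";
}
```

Keeping an array per name costs little and avoids silently dropping all but one index when a name is reused.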

      Cheers Rolf

      UPDATES

      corrected typo in code

Re: Is there really no @LAST_MATCH_START equivalent for named capture groups?
by Anonymous Monk on Jan 30, 2013 at 08:35 UTC