Re: RFC: named pattern match tokens

You might also like to look at extending the regexp syntax. The module below adds a new ( ... capture ... )\C{name} element to regular expression syntax. It copies the contents of the most recently closed capture group into the scalar variable named 'name'. So /( [\dA-F]+ ) \C{ hex }/x would copy a hex string to the $hex variable.

use Regexp::NamedCaptures;

$_ = "three - four - five";
/(\w+)\C{baz} - (\w+)\C{qux}/g;

print "baz=$baz, qux=$qux\n";

Regexp::NamedCaptures
Updated: Changed \N{ ... } to \C{ ... } so that it does not conflict with named characters.
Also changed convert() so that it returns the altered expression instead of the boolean result of the s///.

package Regexp::NamedCaptures;
use overload;

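sub import
{
    shift;
    die "No argument allowed to " . __PACKAGE__ . "::import" if @_;
    # Pass the constant parts of every regex in the importing scope
    # through convert().
    overload::constant qr => \&convert;
}

sub convert
{
    # The first argument is the source text of the regex literal.
    my $re = shift;
    # Find a backslash followed by either a second backslash or C{ name }.
    # $2 is defined only for C{ name }, which becomes a code block that
    # copies the last closed capture ($^N) into $name.
    $re =~ s( \\ ( \\ | C\{ (?>\s*) ((?>\w+)) (?>\s*) \} ) )
            {
                defined $2
                ? "(?{\$$2=\$^N})"
                : "\\\\"   # an escaped backslash must stay escaped
            }xeg;
    $re;
}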

1;
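
To see what the substitution actually hands back to the regex compiler, here is a minimal standalone sketch. It copies the s/// from convert() into a plain script, with one assumption of mine: the escaped-backslash branch returns both backslashes, so that \\ in a pattern keeps its meaning. It only prints the rewritten pattern text and does not compile or run it.

```perl
use strict;
use warnings;

# Standalone copy of the substitution, so the sketch runs without the module.
# Assumption: an escaped backslash is put back as two backslashes.
sub convert {
    my $re = shift;
    $re =~ s( \\ ( \\ | C\{ (?>\s*) ((?>\w+)) (?>\s*) \} ) )
            { defined $2 ? "(?{\$$2=\$^N})" : "\\\\" }xeg;
    return $re;
}

# Each \C{name} turns into a code block that assigns $^N to $name.
print convert('(\w+)\C{baz} - (\w+)\C{qux}'), "\n";
# prints: (\w+)(?{$baz=$^N}) - (\w+)(?{$qux=$^N})
```

Ordinary escapes such as \w are left alone because the pattern only fires on a backslash followed by another backslash or by C{.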


Re^2: RFC: named pattern match tokens
by revdiablo (Prior) on Oct 04, 2004 at 21:53 UTC

You might also like to look at extending the regexp syntax

I thought about this, but didn't have any experience doing so. I went with what I knew, but I may take a look at your code to see how it works.

It copies the contents of the last closed capture into the scalar variable named 'name'

I'm not sure I like this part. The idea of extending regular expression syntax is nice, but storing the matches in arbitrary scalars seems a bit sloppy. Maybe this can be reworked to store the results in a hash.

Something along the lines of:

use re 'eval';
use strict;

my $re = convert('(foo)\C{ foo }');

my %hash;
"foo bar" =~ $re;

print $hash{foo}, "\n";

sub convert
{
    my $re = shift;
    $re =~ s( \\ ( \\ | C\{ (?>\s*) ((?>\w+)) (?>\s*) \} ) )
            {
                defined $2
                ? "(?{\$hash{$2}=\$^N})"
                : "\\\\"   # an escaped backslash must stay escaped
            }xeg;
    $re;
}

This is only marginally better, though, because instead of clobbering an arbitrary number of scalar variables, it clobbers one hash. Maybe there's a cleaner way to handle this.
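
One possible direction, sketched under my own assumptions: have the code blocks write into a single package hash, localize that hash for the duration of the match, and hand the caller a fresh copy. The names %Sketch::capture and match_named() are hypothetical and exist only for this illustration. Assignments made inside (?{ ... }) are not undone when the regex backtracks, so a failed branch can still leave a stale key behind.

```perl
use strict;
use warnings;

# Same idea as convert() above, but every \C{name} stores into one
# package hash (hypothetical name: %Sketch::capture).
sub convert {
    my $re = shift;
    $re =~ s( \\ ( \\ | C\{ (?>\s*) ((?>\w+)) (?>\s*) \} ) )
            { defined $2 ? "(?{\$Sketch::capture{$2}=\$^N})" : "\\\\" }xeg;
    return $re;
}

# Hypothetical helper: returns a hash ref of named captures, or undef
# when the string does not match.
sub match_named {
    my ($string, $source) = @_;
    my $pattern = convert($source);
    local %Sketch::capture;      # starts empty, restored when we return
    use re 'eval';               # needed for code blocks in a run-time pattern
    return unless $string =~ /$pattern/;
    return { %Sketch::capture }; # copy before the localized hash goes away
}

my $found = match_named("foo bar", '(\w+)\C{first} (\w+)\C{second}');
print "first=$found->{first}, second=$found->{second}\n";
```

The caller never touches a global directly, and two matches in a row cannot see each other's results.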


Re^3: RFC: named pattern match tokens

by Jenda (Abbot) on Oct 04, 2004 at 22:46 UTC

I thought the same. I think a nice name for the hash would be %~. =~ is the match operator, so why couldn't $~{name} be a named match? Here is the code I ended up with:

...
sub convert
{
    my $re = shift;
    $re =~ s( \\ ( \\ | C\{ (?>\s*) ((?>\w+)) (?>\s*) \} ) )
            {
                defined $2
                ? "(?{\$~{$2}=\$^N})"
                : "\\\\"
            }xeg;
    "(?{undef(%~)})" # clear the %~
    .$re
    ."(?{\$~{\$_}=\${\$_} for(1..\$#+)})"; # add the numbered matches
}
...

my $re = qr/(\w+)\C{baz}(?: - (\w+)\C{qux})?(\+\d+)/;

"three - four - five+89" =~ $re;
print "baz=$~{baz}, qux=$~{qux}, $~{3}\n";
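
The numbered-match step can also be done after the match instead of inside a code block. The following is a rough sketch, assuming a hypothetical helper numbered_captures(). It reads the offsets in @- and @+ rather than dereferencing $1, $2, ... by name, so it also runs under use strict.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref mapping group number to the
# captured text, or undef when the match fails.
sub numbered_captures {
    my ($string, $regex) = @_;
    return unless $string =~ $regex;
    my %match;
    for my $n (1 .. $#+) {
        # A group that did not take part in the match has no offsets.
        next unless defined $-[$n];
        $match{$n} = substr($string, $-[$n], $+[$n] - $-[$n]);
    }
    return \%match;
}

my $m = numbered_captures("four - five+89", qr/(\w+)(?: - (\w+))?(\+\d+)/);
print join(", ", map { "$_=$m->{$_}" } sort keys %$m), "\n";
```

Groups that did not participate are simply absent from the result, which keeps them distinguishable from groups that matched an empty string.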

I also considered syntax like this:

my $re = qr/(?\$bar=\w+) - (?\$qux{not}=\w+)/;

...
sub convert
{
    my $re = shift;
    $re =~ s<\(\?\\\$([^=]+)=([^)]*)\)><($2)(?{\$$1=\$^N})>g;
    $re
}
...

But this simple convert() stops at the first closing parenthesis, so it would have to be more complex to handle nested groups like:

my $re = qr/...(?\$var=a(\d+|\w-\w+)b).../;

Jenda
We'd like to help you learn to help yourself
Look around you, all you see are sympathetic eyes
Stroll around the grounds until you feel at home
-- P. Simon in Mrs. Robinson


Re^4: RFC: named pattern match tokens

by diotalevi (Canon) on Oct 04, 2004 at 23:10 UTC

%~ is not available for your use: punctuation variables are reserved for perl. Names beginning with ^_ are the ones set aside for user code like this. The closest available analogue of %~ is %{'^_~'}, because %^_~ is a syntax error. I'd suggest %^_C to follow the \C{name} theme.
Hashes are cleared by assigning an empty list, not by undefining them. When you say %hash = () you allow perl to be smart about the allocation of the memory associated with %hash. undef %hash circumvents this and forces some unnecessary work.
I deliberately placed the new syntax to the right of the capture because otherwise I would have had to do some balanced delimiter matching. perlop covers the requirements for matching (...) in regexps. It is possible; I just couldn't do it in the two minutes it took to write the initial example.
The implication of allowing $~{EXPR} to inform the creation of the hash key is that you must allow arbitrary perl code inside EXPR. This is not a problem if you take into account the same balanced-tag handling already mentioned in perlop.
To do this really well requires Text::Balanced and an understanding of the "Gory details of parsing quoted constructs" section of perlop.
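
As a rough illustration of the balanced-delimiter point, here is a sketch that uses extract_bracketed() from Text::Balanced to pull out one leading (?\$name=...) group, nested parentheses included, and rewrite it to the capture-plus-code-block form. The helper name rewrite_named_group() is made up for this example. It handles only a group at the very start of the string and makes no attempt at the full quoting rules from perlop.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: rewrite a leading (?\$name=BODY) group into
# (BODY)(?{$name=$^N}); anything else is returned unchanged.
sub rewrite_named_group {
    my ($source) = @_;
    # List context: returns the balanced group and the remainder, and
    # leaves $source itself untouched.
    my ($group, $rest) = extract_bracketed($source, '()');
    return $source unless defined $group && length $group;
    if ($group =~ /\A \( \? \\ \$ ([^=]+) = (.*) \) \z/xs) {
        my ($name, $body) = ($1, $2);
        return "($body)(?{\$$name=\$^N})" . $rest;
    }
    return $source;
}

print rewrite_named_group('(?\$var=a(\d+|\w-\w+)b) - tail'), "\n";
```

A complete converter would have to walk the whole pattern and skip character classes and comments, which is where the perlop details come in.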


Re^3: RFC: named pattern match tokens

by diotalevi (Canon) on Oct 04, 2004 at 22:12 UTC

Well, that's fine. It could clobber %Regexp::NamedCaptures::Captures, because regexp results are already globals.
