
Re: RFC: named pattern match tokens

by diotalevi (Canon)
on Oct 04, 2004 at 20:04 UTC (#396371)

in reply to RFC: named pattern match tokens

You might also like to look at extending the regexp syntax. The module below adds a new ( ... capture ... )\C{name} element to regular expression syntax. It copies the contents of the last closed capture into the scalar variable named 'name'. So /( [\dA-F]+ ) \C{ hex }/x would copy a hex string to the $hex variable.

use Regexp::NamedCaptures;
$_ = "three - four - five";
/(\w+)\C{baz} - (\w+)\C{qux}/g;
print "baz=$baz, qux=$qux\n";

Updated: Changed the \N{ ... } to \C{ ... } to not conflict with named characters.
Also changed the return value of convert() so it returns the altered expression instead of the boolean result of the s///.

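package Regexp::NamedCaptures;
use overload;

sub import {
    shift;
    die "No argument allowed to " . __PACKAGE__ . "::import" if @_;
    # Pass every regexp constant in the importing scope through convert().
    overload::constant qr => \&convert;
}

sub convert {
    my $re = shift;
    # Rewrite \C{name} into a code block that copies $^N (the most recently
    # closed group) into $name. An escaped backslash is matched only so that
    # "\\C{...}" is not mistaken for the new syntax; it is emitted unchanged.
    $re =~ s( \\ ( \\ | C\{ (?>\s*) ((?>\w+)) (?>\s*) \} ) )
            { defined $2 ? "(?{\$$2=\$^N})" : "\\\\" }xeg;
    $re;
}

1;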
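To see what the rewrite does without installing the overload, the substitution can be run on a plain string. This is only a sketch: it copies the convert() substitution into a standalone script and assumes that an escaped backslash is meant to pass through unchanged.

```perl
use strict;
use warnings;

# Standalone copy of the convert() substitution, for inspection only.
sub convert {
    my $re = shift;
    $re =~ s( \\ ( \\ | C\{ (?>\s*) ((?>\w+)) (?>\s*) \} ) )
            { defined $2 ? "(?{\$$2=\$^N})" : "\\\\" }xeg;
    return $re;
}

# Print the rewritten pattern text; nothing is compiled or matched here.
print convert('( [\dA-F]+ ) \C{ hex }'), "\n";
# prints: ( [\dA-F]+ ) (?{$hex=$^N})
```

The \C{ hex } token becomes a (?{ ... }) code block that assigns $^N to $hex, while ordinary escapes such as \d are left alone.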

Replies are listed 'Best First'.
Re^2: RFC: named pattern match tokens
by revdiablo (Prior) on Oct 04, 2004 at 21:53 UTC
    You might also like to look at extending the regexp syntax

    I thought about this, but didn't have any experience doing so. I just went with what I know, but I may take a look at your code, and see how it works.

    It copies the contents of the last closed capture into the scalar variable named 'name'

    I'm not sure I like this part. The idea of extending regular expression syntax is nice, but storing the matches in arbitrary scalars seems a bit sloppy. Maybe this can be reworked to store the results in a hash.

    Something along the lines of:

    use re 'eval';
    use strict;

    my $re = convert('(foo)\C{ foo }');
    my %hash;
    "foo bar" =~ $re;
    print $hash{foo}, "\n";

    sub convert {
        my $re = shift;
        # Same rewrite as before, but the code block stores into %hash;
        # an escaped backslash is emitted unchanged.
        $re =~ s( \\ ( \\ | C\{ (?>\s*) ((?>\w+)) (?>\s*) \} ) )
                { defined $2 ? "(?{\$hash{$2}=\$^N})" : "\\\\" }xeg;
        $re;
    }

    This is only marginally better, though, because instead of clobbering any arbitrary number of scalar variables, it clobbers one hash. Maybe there's a cleaner way to handle this.
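For comparison, Perl 5.10 and later ship named captures in core: (?<name>...) stores its match in the special hash %+. The sketch below is not the \C{name} syntax discussed in this thread, only the built-in facility applied to the same test string; the %captures copy is an illustrative choice.

```perl
use strict;
use warnings;
use 5.010;   # named captures and %+ need Perl 5.10 or later

my %captures;
if ("three - four - five" =~ /(?<baz>\w+) - (?<qux>\w+)/) {
    # Copy the results: %+ reflects only the last successful match.
    %captures = %+;
}
print "baz=$captures{baz}, qux=$captures{qux}\n";
# prints: baz=three, qux=four
```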

      I thought the same. I think a nice name for the hash would be %~. =~ is the match operator, so why couldn't $~{name} be a named match? Here is the code I ended up with:

      ...
      sub convert {
          my $re = shift;
          $re =~ s( \\ ( \\ | C\{ (?>\s*) ((?>\w+)) (?>\s*) \} ) )
                  { defined $2 ? "(?{\$~{$2}=\$^N})" : "\\\\" }xeg;
          "(?{undef(%~)})"                            # clear the %~
          . $re
          . "(?{\$~{\$_}=\${\$_} for(1..\$#+)})";     # add the numbered matches
      }
      ...
      my $re = qr/(\w+)\C{baz}(?: - (\w+)\C{qux})?(\+\d+)/;
      "three - four - five+89" =~ $re;
      print "baz=$~{baz}, qux=$~{qux}, $~{3}\n";
      Please note that even the named matches keep their numbers! Maybe they should not; I think I could implement that if I needed it.

      I also considered syntax like this:

      my $re = qr/(?\$bar=\w+) - (?\$qux{not}=\w+)/;
      which could naively be implemented like this:
      ...
      sub convert {
          my $re = shift;
          $re =~ s<\(\?\\\$([^=]+)=([^)]*)\)><($2)(?{\$$1=\$^N})>g;
          $re
      }
      ...
      but the problem is that I don't know how to make sure you can even do things like:
      my $re = qr/...(?\$var=a(\d+|\w-\w+)b).../;
      I don't know how to find the right closing bracket.

      We'd like to help you learn to help yourself
      Look around you, all you see are sympathetic eyes
      Stroll around the grounds until you feel at home
         -- P. Simon in Mrs. Robinson

        • %~ is not available for your use. Punctuation variables are reserved for perl's use. The ^_ namespace is the one set aside for user code. The closest available analogue of %~ is %{'^_~'} because %^_~ is a syntax error. I'd suggest %^_C to follow the \C{name} theme.

        • Hashes are cleared by assigning an empty list, not by undefining them. When you say %hash = () you allow perl to be smart about the allocation of the memory associated with %hash. undef %hash circumvents this and forces some unnecessary work.

        • I deliberately placed the new syntax to the right of the capture because otherwise I would have had to do some balanced delimiter matching. perlop covers the requirements for matching (...) in regexps. It is possible, I just couldn't do it in the two minutes it took to write the initial example.

          The implication of allowing $~{EXPR} to inform the creation of the hash key is that you must allow arbitrary perl code inside EXPR. This is not a problem if you take into account the same balanced-tag handling already mentioned in perlop.

          To do this really well requires Text::Balanced and an understanding of "Gory details of parsing quoted constructs" from perlop.
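As a rough illustration of the balanced-delimiter point, extract_bracketed from Text::Balanced can locate the closing parenthesis of a (?\$var=...) group even when the subpattern contains nested groups. The helper name convert_assign is hypothetical, and the sketch deliberately ignores the harder cases perlop describes, such as a parenthesis inside a character class.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: rewrite each (?\$NAME=SUBPATTERN) group into
# (SUBPATTERN)(?{$NAME=$^N}), letting Text::Balanced find the matching ")".
sub convert_assign {
    my $re  = shift;
    my $out = '';
    while ((my $start = index($re, '(?\$')) >= 0) {
        $out .= substr($re, 0, $start);
        my $rest = substr($re, $start);
        # In list context: (extracted group, remainder of the text, prefix).
        my ($group, $remainder) = extract_bracketed($rest, '()');
        if (!defined $group || $group !~ /\A\(\?\\\$([^=]+)=(.*)\)\z/s) {
            # Unbalanced, or not the new syntax: keep the rest as it is.
            $out .= $rest;
            $re = '';
            last;
        }
        $out .= "($2)(?{\$$1=\$^N})";
        $re = $remainder;
    }
    return $out . $re;
}

print convert_assign('x(?\$var=a(\d+|\w-\w+)b)y'), "\n";
# prints: x(a(\d+|\w-\w+)b)(?{$var=$^N})y
```

Because the whole subpattern is wrapped in a new outer group, $^N refers to that outer group when the code block runs.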

      Well that's fine. It could clobber %Regexp::NamedCaptures::Captures because regexp results are already globals.
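A minimal sketch of that idea, assuming the hash is called %Regexp::NamedCaptures::Captures as suggested above. It reuses the convert() substitution with a different assignment target and runs the result as a plain string pattern, which is why use re 'eval' is needed.

```perl
use strict;
use warnings;
use re 'eval';   # the pattern string below contains (?{ ... }) blocks

# Variant of convert() that stores into one package hash instead of
# arbitrary scalars. The hash name is an assumption taken from the reply.
sub convert {
    my $re = shift;
    $re =~ s( \\ ( \\ | C\{ (?>\s*) ((?>\w+)) (?>\s*) \} ) )
            { defined $2
                ? "(?{\$Regexp::NamedCaptures::Captures{$2}=\$^N})"
                : "\\\\" }xeg;
    return $re;
}

my $re = convert('(\w+)\C{baz} - (\w+)\C{qux}');

%Regexp::NamedCaptures::Captures = ();   # clear by assigning an empty list
"three - four - five" =~ $re or die "no match";

print "baz=$Regexp::NamedCaptures::Captures{baz}, ",
      "qux=$Regexp::NamedCaptures::Captures{qux}\n";
# prints: baz=three, qux=four
```

A fully qualified package hash needs no declaration under strict, so the generated code blocks compile wherever the pattern is used.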
