PerlMonks  

Re: RFC: named pattern match tokens

by diotalevi (Canon)
on Oct 04, 2004 at 20:04 UTC (#396371)


in reply to RFC: named pattern match tokens

You might also like to look at extending the regexp syntax. The module below adds a new ( ... capture ... )\C{name} element to the regular expression syntax. It copies the contents of the most recently closed capture group into the scalar variable named 'name'. So /( [\dA-F]+ ) \C{ hex }/x would copy a hex string into the $hex variable.

    use Regexp::NamedCaptures;

    $_ = "three - four - five";
    /(\w+)\C{baz} - (\w+)\C{qux}/g;
    print "baz=$baz, qux=$qux\n";
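For readers who have not used $^N before, here is a minimal sketch of what the \C{name} element stands for, written out by hand. It assumes nothing beyond core Perl: each \C{name} is replaced with a literal (?{ ... }) code block that copies $^N (the text of the most recently closed group) into a package variable. The `our` declaration is only there to keep `use strict` happy and is not part of the original example.

```perl
use strict;
use warnings;

# Package variables that receive the named captures.
our ($baz, $qux);

$_ = "three - four - five";

# Hand-written equivalent of /(\w+)\C{baz} - (\w+)\C{qux}/:
# after each group closes, $^N holds its text and the code block saves it.
/(\w+)(?{ $baz = $^N }) - (\w+)(?{ $qux = $^N })/;

print "baz=$baz, qux=$qux\n";    # prints "baz=three, qux=four"
```

Because the code blocks appear literally in the pattern, no `use re 'eval'` is needed here.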

Regexp::NamedCaptures
Updated: Changed \N{ ... } to \C{ ... } so that it does not conflict with named characters.
Also changed convert() so that it returns the altered expression instead of the boolean result of the s///.

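    package Regexp::NamedCaptures;
    use overload;

    sub import {
        shift;
        die "No argument allowed to " . __PACKAGE__ . "::import" if @_;
        # Pass every regex literal in the caller's scope through convert().
        overload::constant qr => \&convert;
    }

    sub convert {
        my $re = shift;
        $re =~ s(
            \\
            (
                \\
              |
                C\{ (?>\s*) ((?>\w+)) (?>\s*) \}
            )
        ) {
            defined $2
                ? "(?{\$$2=\$^N})"   # named capture: copy the last closed group
                : "\\\\"             # escaped backslash: keep both characters
        }xeg;
        $re;
    }

    1;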
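To see what the module does to a pattern, it can help to run the rewriting step on a plain string and print the result. The sketch below is a stand-alone copy of that substitution under the hypothetical name convert_sketch. One assumption is built in: an escaped backslash is passed through as two characters, so that \\C{x} stays a literal backslash followed by C{x}.

```perl
use strict;
use warnings;

# Hypothetical stand-alone copy of the rewriting step, for inspection only.
sub convert_sketch {
    my $re = shift;
    $re =~ s{
        \\                          # a backslash, followed by either
        (
            \\                      # a second backslash
          |
            C\{ \s* (\w+) \s* \}    # or the named-capture element
        )
    }{
        defined $2
            ? "(?{\$$2=\$^N})"      # named capture becomes a code block
            : "\\\\"                # assumption: escaped backslash is kept as is
    }xeg;
    return $re;
}

print convert_sketch('(\w+)\C{baz} - (\w+)\C{ qux }'), "\n";
# prints: (\w+)(?{$baz=$^N}) - (\w+)(?{$qux=$^N})
```

Other escapes such as \w are left alone because the substitution only fires when the backslash is followed by another backslash or by C{.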


Re^2: RFC: named pattern match tokens
by revdiablo (Prior) on Oct 04, 2004 at 21:53 UTC
    You might also like to look at extending the regexp syntax

    I thought about this, but didn't have any experience doing so. I just went with what I know, but I may take a look at your code, and see how it works.

    It copies the contents of the last closed capture into the scalar variable named 'name'

    I'm not sure I like this part. The idea of extending regular expression syntax is nice, but storing the matches in arbitrary scalars seems a bit sloppy. Maybe this can be reworked to store the results in a hash.

    Something along the lines of:

        use re 'eval';
        use strict;

        my $re = convert('(foo)\C{ foo }');
        my %hash;
        "foo bar" =~ $re;
        print $hash{foo}, "\n";

        sub convert {
            my $re = shift;
            $re =~ s(
                \\
                (
                    \\
                  |
                    C\{ (?>\s*) ((?>\w+)) (?>\s*) \}
                )
            ) {
                defined $2
                    ? "(?{\$hash{$2}=\$^N})"   # store the capture under its name
                    : "\\\\"                   # escaped backslash: keep both characters
            }xeg;
            $re;
        }

    This is only marginally better, though, because instead of clobbering any arbitrary number of scalar variables, it clobbers one hash. Maybe there's a cleaner way to handle this.

      Well, that's fine. It could clobber %Regexp::NamedCaptures::Captures, because regexp results are already globals.

      I thought the same. I think a nice name for the hash would be %~. Since =~ is the match operator, why couldn't $~{name} be a named match? Here is the code I ended up with:

          ...
          sub convert {
              my $re = shift;
              $re =~ s(
                  \\
                  (
                      \\
                    |
                      C\{ (?>\s*) ((?>\w+)) (?>\s*) \}
                  )
              ) {
                  defined $2
                      ? "(?{\$~{$2}=\$^N})"
                      : "\\\\"             # escaped backslash: keep both characters
              }xeg;
              "(?{undef(%~)})"                              # clear the %~
                . $re
                . "(?{\$~{\$_}=\${\$_} for(1..\$#+)})";     # add the numbered matches
          }
          ...
          my $re = qr/(\w+)\C{baz}(?: - (\w+)\C{qux})?(\+\d+)/;
          "three - four - five+89" =~ $re;
          print "baz=$~{baz}, qux=$~{qux}, $~{3}\n";

      Note that even the named captures still get a number! Maybe they should not; I think I could implement that if I needed it.

      I also considered syntax like this:

      my $re = qr/(?\$bar=\w+) - (?\$qux{not}=\w+)/;
      which could naively be implemented like this:
          ...
          sub convert {
              my $re = shift;
              $re =~ s<\(\?\\\$([^=]+)=([^)]*)\)><($2)(?{\$$1=\$^N})>g;
              $re
          }
          ...
      but the problem is that I don't know how to make it handle even something like:
      my $re = qr/...(?\$var=a(\d+|\w-\w+)b).../;
      I don't know how to find the right closing bracket.

      Jenda
      We'd like to help you learn to help yourself
      Look around you, all you see are sympathetic eyes
      Stroll around the grounds until you feel at home
         -- P. Simon in Mrs. Robinson

        • %~ is not available for your use. Punctuation variables are reserved for perl's own use. Names beginning with ^_ are set aside for user code. The closest available analogue of %~ is %{'^_~'} because %^_~ is a syntax error. I'd suggest %{^_C} to follow the \C{name} theme.

        • Hashes are cleared by assigning an empty list, not by undefining them. When you say %hash = () you allow perl to be smart about the allocation of the memory associated with %hash. undef %hash circumvents this and forces some unnecessary work.

        • I deliberately placed the new syntax to the right of the capture because otherwise I would have had to do some balanced delimiter matching. perlop covers the requirements for matching (...) in regexps. It is possible, I just couldn't do it in the two minutes it took to write the initial example.

          The implication of allowing $~{EXPR} to inform the creation of the hash key is that you must allow arbitrary perl code inside EXPR. This is not a problem if you take into account the same balanced-tag handling already mentioned in perlop.

          To do this really well requires Text::Balanced and an understanding of the "Gory details of parsing quoted constructs" section of perlop.
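As a rough illustration of the balanced-delimiter matching discussed above, here is a small hand-rolled scanner. It is a sketch, not the Text::Balanced approach: the helper names find_group_end and convert_assign are hypothetical, and it only understands backslash escapes and parentheses. A parenthesis inside a character class such as [)], inside a regex comment, or a nested (?\$var=...) within the body would need the fuller treatment described in perlop.

```perl
use strict;
use warnings;

# Hypothetical helper: given a pattern string and the offset of an opening
# parenthesis, return the offset of the parenthesis that closes it, or -1.
sub find_group_end {
    my ($re, $start) = @_;
    my $depth = 0;
    for (my $i = $start; $i < length $re; $i++) {
        my $ch = substr $re, $i, 1;
        if    ($ch eq '\\') { $i++ }                          # skip the escaped character
        elsif ($ch eq '(')  { $depth++ }
        elsif ($ch eq ')')  { return $i if --$depth == 0 }
    }
    return -1;                                                # unbalanced
}

# Hypothetical rewrite of (?\$var=PATTERN) into (PATTERN)(?{$var=$^N}).
sub convert_assign {
    my $re  = shift;
    my $out = '';
    my $pos = 0;
    while ($re =~ /\(\?\\\$([^=]+)=/g) {
        my ($open, $body_start, $var) = ($-[0], $+[0], $1);
        my $close = find_group_end($re, $open);
        last if $close < 0;                                   # give up on unbalanced input
        my $body = substr $re, $body_start, $close - $body_start;
        $out .= substr($re, $pos, $open - $pos) . "($body)(?{\$$var=\$^N})";
        $pos = $close + 1;
        pos($re) = $pos;                                      # resume after the rewritten group
    }
    return $out . substr($re, $pos);
}

print convert_assign('...(?\$var=a(\d+|\w-\w+)b)...'), "\n";
# prints: ...(a(\d+|\w-\w+)b)(?{$var=$^N})...
```

Since the outer group is the last one to close, $^N still refers to the whole a(...)b match when the code block runs, even though the body contains its own capture group.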
