PerlMonks  

Regexp not capturing in named subrules

by diotalevi (Canon)
on Sep 16, 2009 at 00:31 UTC (#795494)
diotalevi has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I think my grammar's thing rule should have captured $+{thing}. It didn't. What did I miss?

    use Test::More tests => 1;
    use Data::Dumper;

    'cow' =~ /
        (?:
            # Grammar rules go here...
            (?<thing> .+ )
        ){0}

        # Invoke grammar here
        ^(?&thing)
    /x or die "Didn't match";

    my $got = Dumper({
        '%+' => {%+},
        '@+' => [@+],
        '@-' => [@-],
    });
    my $expected = Dumper({
        '%+' => { thing => 'cow' },
        '@-' => [ 0, 0 ],
        '@+' => [ 3, 3 ],
    });
    is( $got, $expected );

Test results:

    # Failed test at bin/ooga.pl line 24.
    #          got: '$VAR1 = {
    #   '%+' => {},
    #   '@-' => [
    #     '0'
    #   ],
    #   '@+' => [
    #     '1',
    #     undef
    #   ]
    # };
    # '
    #     expected: '$VAR1 = {
    #   '%+' => {
    #     'thing' => 'cow'
    #   },
    #   '@-' => [
    #     0,
    #     0
    #   ],
    #   '@+' => [
    #     3,
    #     3
    #   ]
    # };
    # '

Re: Regexp not capturing in named subrules (leftmost)
by tye (Cardinal) on Sep 16, 2009 at 02:50 UTC

    perlre says:

    If multiple distinct capture buffers have the same name then the $+{NAME} will refer to the leftmost defined buffer in the match.

    - tye        
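To make the "leftmost defined buffer" rule quoted above concrete, here is a minimal sketch. The patterns and the group names `x` and `y` are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

# Alternation: both groups are named "x", but only one takes part in any match.
my $alt = qr/ (?<x> a ) | (?<x> b ) /x;
for my $input (qw(a b)) {
    if ( $input =~ $alt ) {
        printf "input %s: x = %s\n", $input, $+{x};
    }
}

# Sequence: both groups are defined, so $+{y} refers to the leftmost one.
# %- keeps every group of that name, in left-to-right order.
if ( 'ab' =~ / (?<y> a ) (?<y> b ) /x ) {
    my @all = @{ $-{y} };
    printf "leftmost: %s; all: %s\n", $+{y}, join( ' ', @all );
}
```

In the alternation case the first group is undefined when the input is `b`, so `$+{x}` falls through to the second group of the same name.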

      However, what puzzles me, and what should be apparent from the test output, is that nothing at all was captured.

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

        Since the "leftmost defined buffer" is inside of a group quantified by {0}, it isn't puzzling to me that this buffer didn't match. And it isn't surprising that a buffer that didn't match doesn't capture anything.

        Perhaps you were now expecting thing => undef, contrary to your previously stated expectation? If so, my limited testing shows that %+ keys aren't pre-populated but the values are returned by magic (%+ always appears empty even when $+{moo} returns "o" after 'cow' =~ /(?<moo>o+)/).

        Update: I'm glad to report that the lack of keys in %+ appears to have only been an effect of the quick-hack method I used to test, related to evaluation order problems.

        - tye        
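A small sketch of inspecting `%+` without ordering surprises, assuming the safest approach is to copy the hash immediately after the match. The second pattern is invented here to show a named group that exists in the pattern but does not take part in the match.

```perl
use strict;
use warnings;

if ( 'cow' =~ /(?<moo>o+)/ ) {
    # Copy %+ right away: it always reflects the last successful match.
    my %captures = %+;
    printf "%s => %s\n", $_, $captures{$_} for sort keys %captures;
}

if ( 'cow' =~ /(?<moo>x)?cow/ ) {
    # The group is optional and did not match, so its name is not a key.
    my @names = keys %+;
    printf "defined names: %d\n", scalar @names;
}
```

Per perlvar, the keys of `%+` list only the names of groups that actually captured, which is consistent with an empty `%+` when the only named group sits inside `{0}`.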

Re: Regexp not capturing in named subrules
by merlyn (Sage) on Sep 16, 2009 at 15:07 UTC
    ( ... ){0}
    Huh?

    First, {0} should be outright illegal, because it tends to be used (without effect) by newbies who say "I want this section to contain no K's, so I'll put K{0}".

    Second, even if it is legal, as it appears to be, I'd expect any decent optimizer to just completely eliminate it.

    So I'm surprised you expected any sort of utility out of your regex.

    -- Randal L. Schwartz, Perl hacker

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
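To illustrate the point about `K{0}` having no effect as an exclusion, here is a short sketch. The input string and both patterns are made up for the example.

```perl
use strict;
use warnings;

my $input = 'baKe';

# K{0} matches the empty string, so it rules nothing out.
print "quantifier version matches\n" if $input =~ /^\w*K{0}\w*$/;

# A negated character class is one way to actually exclude K.
print "negated class matches\n" if $input =~ /^[^K]*$/;
```

Only the first pattern matches `baKe`; the negated class rejects it because the string contains a `K`.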

      I guess I was just surprised to find that this is possible in Ruby 1.9. I'll probably still end up translating to a proper parser, but it was easy to use the regexp engine to start with. The snippet below is equivalent to my Perl but does return the capture.

      require 'pp'
      re = %r{
          # Grammar rules go here
          (?:
              (?<thing>.+)
          ){0}

          # Invoke grammar here
          \g<thing>
      }x
      m = re.match( 'text' )
      puts m['thing']
      # prints "text\n"
        (?<thing>.+) never matched "text" (backtracking occurred), so it's clearly a major bug in ruby if your code showed it did.
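For the original Perl question, perlre notes that capture groups matched inside a recursion are not accessible after the recursion returns, so an extra layer of capturing groups is needed around the call. A minimal sketch of that approach follows; the rule name `thing_rule` is hypothetical, and `(?(DEFINE)...)` stands in for the `{0}` group used in the question.

```perl
use strict;
use warnings;

my $re = qr/
    # Invoke the grammar inside a wrapper group that does the capturing.
    ^ (?<thing> (?&thing_rule) )

    # Grammar rules go here, inside a DEFINE block that is never executed directly.
    (?(DEFINE)
        (?<thing_rule> .+ )
    )
/x;

if ( 'cow' =~ $re ) {
    printf "thing: %s\n", $+{thing};
    printf "thing_rule defined: %s\n", defined $+{thing_rule} ? 'yes' : 'no';
}
```

The wrapper group `thing` holds the matched text, while the rule's own group stays undefined after the call returns.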
