
Regexp not capturing in named subrules

by diotalevi (Canon)
on Sep 16, 2009 at 00:31 UTC ( #795494=perlquestion )
diotalevi has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I think my grammar's thing rule should have captured $+{thing}. It didn't. What did I miss?

use Test::More tests => 1;
use Data::Dumper;

'cow' =~ /
    (?:
        # Grammar rules go here...
        (?<thing> .+ )
    ){0}

    # Invoke grammar here
    ^(?&thing)
/x or die "Didn't match";

my $got = Dumper({
    '%+' => {%+},
    '@+' => [@+],
    '@-' => [@-],
});
my $expected = Dumper({
    '%+' => { thing => 'cow' },
    '@-' => [ 0, 0 ],
    '@+' => [ 3, 3 ],
});
is( $got, $expected );

Test results:

# Failed test at bin/ line 24.
#          got: '$VAR1 = {
#           '%+' => {},
#           '@-' => [
#                     '0'
#                   ],
#           '@+' => [
#                     '1',
#                     undef
#                   ]
#         };
# '
#     expected: '$VAR1 = {
#           '%+' => {
#                     'thing' => 'cow'
#                   },
#           '@-' => [
#                     0,
#                     0
#                   ],
#           '@+' => [
#                     3,
#                     3
#                   ]
#         };
# '

Re: Regexp not capturing in named subrules (leftmost)
by tye (Sage) on Sep 16, 2009 at 02:50 UTC

    perlre says:

    If multiple distinct capture buffers have the same name then the $+{NAME} will refer to the leftmost defined buffer in the match.
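A minimal sketch of the rule tye quotes, assuming Perl 5.10 or later. Two distinct buffers share the (illustrative) name x, and only the second one can match 'b'. $+{x} should then return the leftmost buffer that is actually defined, while %- (documented in perlvar) holds every buffer with that name, undefined ones included.

```perl
use strict;
use warnings;

# Two distinct capture buffers that share the name "x".
# Only the second alternative can match 'b', so buffer 1 stays undefined.
'b' =~ /(?<x>a)|(?<x>b)/ or die "no match";

# $+{x} is the leftmost *defined* buffer named x, i.e. buffer 2.
my $leftmost = $+{x};          # 'b'

# %- exposes every buffer with that name, defined or not, in order.
my @all = @{ $-{x} };          # (undef, 'b')

print "leftmost: $leftmost\n";
print "all: ", join(', ', map { defined $_ ? $_ : 'undef' } @all), "\n";
```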

    - tye        

      However, what puzzles me, and what the output should make apparent, is that nothing at all was captured.

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

        Since the "leftmost defined buffer" is inside of a group quantified by {0}, it isn't puzzling to me that this buffer didn't match. And it isn't surprising that a buffer that didn't match doesn't capture anything.

        Perhaps you were now expecting thing => undef, contrary to your previously stated expectation? If so, my limited testing shows that %+ keys aren't pre-populated but the values are returned by magic (%+ always appears empty even when $+{moo} returns "o" after 'cow' =~ /(?<moo>o+)/).

        Update: I'm glad to report that the lack of keys in %+ appears to have only been an effect of the quick-hack method I used to test, related to evaluation order problems.
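A small sketch of which keys %+ and %- report, assuming Perl 5.10 or later and using the hypothetical buffer names moo and baa. Per perlvar, %+ lists only buffers that captured a defined value, whereas the keys of %- are all the capture-group names in the pattern. Copying %+ into an ordinary hash right after the match sidesteps evaluation-order surprises like the one tye mentions.

```perl
use strict;
use warnings;

# 'cow' has an "o" but no "a", so only the moo buffer participates.
'cow' =~ /(?<moo>o+)|(?<baa>a+)/ or die "no match";

# Snapshot the magic hashes before a later successful match can change them.
my %plus  = %+;              # only buffers that captured something
my @minus = sort keys %-;    # every named buffer in the pattern

print "keys %+: ", join(', ', sort keys %plus), "\n";   # moo
print "keys %-: ", join(', ', @minus), "\n";            # baa, moo
print "moo = $plus{moo}\n";                             # o
```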

        - tye        

Re: Regexp not capturing in named subrules
by merlyn (Sage) on Sep 16, 2009 at 15:07 UTC
    ( ... ){0}

    First, {0} should be outright illegal, because it tends to be used (without effect) by newbies who say "I want this section to contain no K's, so I'll put K{0}".

    Second, even if it is legal, as it appears to be, I'd expect any decent optimizer to just completely eliminate it.

    So I'm surprised you expected any sort of utility out of your regex.
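To illustrate the newbie mistake merlyn describes, a sketch using a hypothetical string $s: K{0} asks for exactly zero K's at that position, which always succeeds by matching the empty string, so it cannot express "no K's here". An anchored negated character class is one way to say that instead.

```perl
use strict;
use warnings;

my $s = 'KKK';

# K{0} means "exactly zero repetitions of K here", which always matches
# the empty string, so this succeeds even though $s is nothing but K's.
my $zero_k = ($s =~ /^K{0}/) ? 1 : 0;      # 1

# To require that the whole string contain no K at all, anchor a
# negated character class at both ends instead.
my $no_k = ($s =~ /^[^K]*$/) ? 1 : 0;      # 0

print "K{0} matched: $zero_k\n";
print "no-K matched: $no_k\n";
```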

    -- Randal L. Schwartz, Perl hacker

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

      I guess I was just surprised to find that this is more feasible in Ruby 1.9. I'll probably still end up translating to a proper parser, but the regexp engine was an easy place to start. The snippet below is equivalent to my Perl but does return the capture.

      require 'pp'

      re = %r{
        # Grammar rules go here
        (?: (?<thing>.+) ){0}

        # Invoke grammar here
        \g<thing>
      }x

      m = re.match( 'text' )
      puts m['thing']  # puts "text\n"
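For comparison with the Ruby snippet, a Perl sketch of the same grammar, assuming Perl 5.10 or later. It replaces the {0} group with perlre's (?(DEFINE)...) construct, which exists to declare named groups for later (?&NAME) calls without matching them in place. perlre also documents that capture groups matched inside a recursion are not accessible after the recursion returns, so the sketch wraps the call in an outer capture under the hypothetical name got_thing. That documented behavior would also be consistent with the empty %+ in the test results above, although nobody in the thread states it.

```perl
use strict;
use warnings;

my $re = qr{
    (?(DEFINE)
        # Grammar rules go here; DEFINE never matches them in place.
        (?<thing> .+ )
    )

    # Invoke the grammar. Captures made inside the recursive call to
    # thing are discarded when it returns, so wrap the call in an
    # outer named capture to keep the text it matched.
    ^ (?<got_thing> (?&thing) )
}x;

'cow' =~ $re or die "Didn't match";

my $got_thing = $+{got_thing};   # 'cow'
my $thing     = $+{thing};       # undef: the rule's own buffer never matched

print "got_thing = $got_thing\n";
print "thing is ", (defined $thing ? "defined" : "undef"), "\n";
```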
        (?<thing>.+) never matched "text" (backtracking occurred), so it's clearly a major bug in ruby if your code showed it did.

Node Type: perlquestion [id://795494]
Approved by toolic