
Quantified named captures in 5.10

by mscha (Acolyte)
on Dec 20, 2007 at 23:34 UTC (node #658327)

mscha has asked for the wisdom of the Perl Monks concerning the following question:

I excitedly downloaded and compiled 5.10 yesterday, and tried out some of the new features. There's one, though, that did not work as I expected:
#!/opt/perl/bin/perl
use strict;
use warnings;
use 5.010;

my $str = 'Digits: 123689';
if ($str =~ m{Digits: \s* (?<digit>\d)+ }x) {
    say 'Digits: ', join(', ', @{$-{digit}});
}
This prints, I mean, says:
Digits: 9
So it appears that on a quantified named capture, only the last match is captured in %-. This seems very counterintuitive to me. Isn't the whole point of %- that it captures all matches?

Of course, I can do something like:
(?<digit>\d)? (?<digit>\d)? (?<digit>\d)? (?<digit>\d)? (?<digit>\d)?
(?<digit>\d)? (?<digit>\d)? (?<digit>\d)? (?<digit>\d)? (?<digit>\d)?
which works, but that's just plain silly... Is this a bug? Or simply not an implemented feature? Or is there a good reason why %- does not capture this?
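If you do go the repeated-groups route, a minimal sketch that at least avoids typing the groups by hand is to build them with the x operator. Two assumptions here: an upper bound of 10 digits (as in the pattern above), and the names $max_digits and $groups, which are purely illustrative. Because %- keeps one slot per named group in the pattern, groups that did not participate come back as undef and have to be filtered out.

```perl
use strict;
use warnings;
use 5.010;

my $str = 'Digits: 123689';

# Hypothetical upper bound on how many digits we expect.
my $max_digits = 10;

# Ten optional groups, all named "digit"; /x ignores the joining spaces.
my $groups = join ' ', ('(?<digit>\d)?') x $max_digits;
my $re     = qr{Digits: \s* $groups}x;

my @digits;
if ($str =~ $re) {
    # One entry per "digit" group in the pattern; unused groups are undef.
    @digits = grep { defined } @{ $-{digit} };
}
say 'Digits: ', join(', ', @digits);
```

It is still the same workaround, just generated; the bound has to be chosen up front.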


- Michael

Re: Quantified named captures in 5.10
by webfiend (Vicar) on Dec 21, 2007 at 00:45 UTC

    I don't think it's the most graceful solution, but I was able to throw this together in a hurry. It does look like the official rule is that you need to explicitly specify each occurrence of digit that you care about. That doesn't really make sense to me either, which probably means there is another approach we're missing.

    use feature qw(:5.10);
    use strict;
    use warnings;

    my $str = 'Digits: 123689';
    if ($str =~ m{Digits: \s* (?<digits>\d+)}x) {
        my @digits = split //, $+{digits};
        say "Digits: ", join(', ', @digits);
    }
    else {
        say "No match";
    }

    Update: This version looks like a slightly better solution for a more general-purpose approach, but I still prefer the way you had in mind.

    use feature qw(:5.10);
    use strict;
    use warnings;

    my $str = 'Digits: 123689';
    if ($str =~ m{Digits: \s* (\d+)}x) {
        my $digit_str = $1;
        my @digits;
        while ($digit_str =~ m{(?<digit>\d)}gx) {
            push @digits, $+{digit};
        }
        say "Digits: ", join(', ', @digits);
    }
    else {
        say "No match";
    }
Re: Quantified named captures in 5.10
by educated_foo (Vicar) on Dec 21, 2007 at 06:18 UTC
    See my response in Re: What's missing in Perl 5.10. Basically, you currently need to do something horrid like this:
    my $re = qr{
        Digits: \s*
        (?:
            (?<digits>
                (?: (?<digit>\d) (?{ local @d = (@d, $+{digit}) }) )+
            )
            (?{ $digits = \@d })
        )
    }x;
    'hello, world, Digits: 123' =~ /$re/ and print "@$digits\n";
Re: Quantified named captures in 5.10
by halley (Prior) on Dec 21, 2007 at 13:56 UTC
    I guess I wouldn't expect the addition of named captures to change how captures work in regexes. Just because the capture variable has a name does not mean that the semantics of () or of a quantifier like + have changed.

    The pre-5.10 version looks like this (I've added a second term to show that each capture holds a single value):

    my $str = 'Digits: 123689 abc';
    if ($str =~ m{Digits: \s* (\d)+ \s+ (\w+) }x) {
        print 'Digits: ', $1, ' ', $2, $/;
    }
    __OUTPUT__
    Digits: 9 abc
    With the numeric "names" for captures, it seems natural that there's only one slot. Adding a + to the capture does not add more capturing group instances to the system. There's one actual set of parentheses, so there's one actual capture.
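One way to see the "one set of parentheses, one slot" point directly is to ask Perl how many groups the last match had. The sketch below assumes only the documented behaviour of $#+ (the number of capture groups in the last successful match) and runs halley's pattern against two inputs of different length; the number of slots stays the same, and $1 simply ends up holding the last digit.

```perl
use strict;
use warnings;

my @results;
for my $str ('Digits: 1 abc', 'Digits: 123689 abc') {
    if ($str =~ m{Digits: \s* (\d)+ \s+ (\w+) }x) {
        my $groups = $#+;    # capture groups in the pattern, not repetitions
        push @results, [ $groups, $1, $2 ];
    }
}
printf "groups=%d last digit=%s word=%s\n", @$_ for @results;
```

Both lines report two groups, regardless of how many times (\d)+ repeated.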

    [ e d @ h a l l e y . c c ]

Re: Quantified named captures in 5.10
by mugwumpjism (Hermit) on Dec 21, 2007 at 06:22 UTC

    I don't think that %- is about changing the semantics of matching. It is more for when the same named group appears several times in one pattern, probably because you interpolated the same sub-expression more than once (i.e., qr{} sub-expressions).
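A small sketch of that situation, assuming a hypothetical reusable sub-expression $num with a group named n: interpolating it twice gives two distinct groups with the same name. %+ reports only the leftmost defined one, while %- lists both in pattern order.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical reusable sub-expression with a named group.
my $num = qr{(?<n>\d+)};

my ($first, @operands);
if ('3 + 4' =~ m{$num \s* \+ \s* $num}x) {
    $first    = $+{n};         # leftmost defined "n" group
    @operands = @{ $-{n} };    # every "n" group, in pattern order
}
say "first: $first; all: @operands";
```

Here there really are two sets of parentheses named n, which is the case %- is designed for.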

    You should probably look into the /g regex modifier.
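A minimal sketch of the /g approach, assuming you only want the run of digits right after the "Digits:" prefix (the trailing "and 42" in the sample string is made up to show that it is skipped). The first match leaves pos($str) just after the prefix, and \G anchors each later match where the previous one stopped.

```perl
use strict;
use warnings;
use 5.010;

my $str = 'Digits: 123689 and 42';
my @digits;

# Scalar-context /g sets pos($str) after the prefix; /c keeps pos on failure.
if ($str =~ m{Digits: \s*}gcx) {
    # \G: each digit must start exactly where the previous match ended.
    while ($str =~ m{\G (?<digit>\d)}gcx) {
        push @digits, $+{digit};
    }
}
say 'Digits: ', join(', ', @digits);
```

The loop stops at the first non-digit, so the 42 later in the string is not collected.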

    $h=$ENV{HOME};my@q=split/\n\n/,`cat $h/.quotes`;$s="$h/." ."signature";$t=`cat $s`;print$t,"\n",$q[rand($#q)],"\n";
Re: Quantified named captures in 5.10
by NetWallah (Canon) on Dec 22, 2007 at 04:08 UTC
    Based on what halley(++) said, there is no specific advantage that perl 5.10 provides to resolve this.

    I came up with this relatively short snippet that expresses what you want to do and is generic enough to be extensible, yet legible. It uses two stages of regex extraction and does not require you to predefine the number of elements you expect to extract:

    my $str = 'Digits: 123689';
    print join ',', m/(\d)/g for $str =~ m{Digits: \s* (\d+) }x;
    # prints: 1,2,3,6,8,9

         "As you get older three things happen. The first is your memory goes, and I can't remember the other two... " - Sir Norman Wisdom

Node Type: perlquestion [id://658327]
Approved by Limbic~Region
Front-paged by Limbic~Region