PerlMonks  

Is there a default array for reg exp memory variables?

by korpenkraxar (Sexton)
on Feb 05, 2012 at 22:16 UTC ([id://951996])

korpenkraxar has asked for the wisdom of the Perl Monks concerning the following question:

Dear wise bringers of enlightenment

Regex patterns stored in variables can contain an arbitrary number of capturing groups and so create a varying number of memory variables. Is there a default array that contains $1, $2, $3 ... that we can return from a regex match rather than listing the individual memory variables? I have searched the interwebs but found nothing indicating its existence :-(

Consider this case:

#!/usr/bin/perl -w
use strict;
use warnings;
use v5.010;

my $string  = "AA=1,BB=2,CC=3,DD=4,EE=5";
my $pattern = "AA\=(\\d).+\,EE\=(\\d)";

# Use subroutine
my @first = get_vars( $string , $pattern );
say "1: @first";

$pattern = "AA\=(\\d).+\,CC\=(\\d).+\,EE\=(\\d)";
my @second = get_vars( $string , $pattern );
say "2: @second";

# Use list context
$pattern = "AA\=(\\d).+\,EE\=(\\d)";
my @third = ( $string =~ m/$pattern/ );
say "3: @third";

$pattern = "AA\=(\\d).+\,CC\=(\\d).+\,EE\=(\\d)";
my @fourth = ( $string =~ m/$pattern/ );
say "4: @fourth";

sub get_vars {
    my ( $string , $pattern ) = @_;
    if ( $string =~ m/$pattern/ ) {
        return ( $1 , $2 , $3 , $4 , $5 );
    }
}

In the first two cases I resorted to hard-coding the number of memory variables, setting it to a theoretical maximum to cover all expected cases. Because the patterns have fewer capture groups than that maximum, this leaves uninitialized (undef) elements in the returned array.

In the last two cases I do the matching in list context, which works very neatly: Perl is obviously smart enough to build the list of captures for us. Can we get hold of that array explicitly somehow when we are not in list context?
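As an aside (not from the original post): the double-quoted pattern strings above need doubled backslashes (\\d) to survive string interpolation. A minimal sketch that builds the same two patterns with qr// instead, so \d needs no extra escaping and = and , need no backslashes, and then matches in list context so each pattern yields exactly as many captures as it has groups:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "AA=1,BB=2,CC=3,DD=4,EE=5";

# qr// compiles each pattern directly; no string-escaping layer involved.
my @patterns = (
    qr/AA=(\d).+,EE=(\d)/,
    qr/AA=(\d).+,CC=(\d).+,EE=(\d)/,
);

for my $pattern (@patterns) {
    # List context: the match returns one element per capture group,
    # so there is no undef padding as with a hard-coded ($1 .. $5).
    my @captures = $string =~ $pattern;
    print scalar(@captures), " captures: @captures\n";
}
```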

Re: Is there a default array for reg exp memory variables?
by tobyink (Canon) on Feb 05, 2012 at 22:35 UTC

    The return value of the "=~" operator (if called in list context) is the list you desire; just assign it to an array.

    use Data::Dumper;
    if (my @r = "foo bar baz" =~ /(foo) (bar) (baz)/) {
        print Dumper \@r;
    }
    __END__
    $VAR1 = [
              'foo',
              'bar',
              'baz'
            ];

    You say:

    Can we get hold of that array explicitly somehow when we are not in list context?

    ... so clearly you already know the above. Why not just use list context? There is rarely any reason to explicitly avoid it when regexp matching. Please explain what you're trying to actually do, and why performing matches in list context is insufficient.

    Depending on what you're trying to do, you could consider using named captures, which get stored in the hashes %+ and %- (the code below uses %-, whose values are array references). If you name the captures appropriately, you could assemble them into an array...

    sub get_caps () {
        my @caps;
        my $i = 1;
        while (exists $-{'cap'.$i}) {
            push @caps, $-{'cap'.$i++}->[0];
        }
        @caps;
    }

    use Data::Dumper;
    if (scalar("foo bar baz" =~ /(?<cap1>foo) (?<cap2>bar) (?<cap3>baz)/)) {
        my @r = get_caps;
        print Dumper \@r;
    }
    __END__
    $VAR1 = [
              'foo',
              'bar',
              'baz'
            ];

    Though obviously that involves modifying the regular expression itself to add the named captures. And it needs a non-archaic version of Perl (at least 5.10).
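For comparison, a small sketch (not from the thread) that reads named captures through %+ instead of %-. %+ maps each name directly to its captured value, so no numbered-name loop is needed; the capture names AA, CC and EE are purely illustrative. Like the code above, it assumes Perl 5.10 or later:

```perl
use strict;
use warnings;
use v5.10;   # named captures and say() need Perl 5.10 or later

my $string = "AA=1,BB=2,CC=3,DD=4,EE=5";
my %caps;

if ($string =~ /AA=(?<AA>\d).+,CC=(?<CC>\d).+,EE=(?<EE>\d)/) {
    # %+ maps each capture name to its value for this match.
    # Copy it at once: it only reflects the most recent successful match.
    %caps = %+;
    for my $name (sort keys %caps) {
        say "$name => $caps{$name}";
    }
}
```

The trade-off is the same as tobyink notes: the regex itself has to carry the names.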

      I came across the concept of named capture buffers five minutes ago, then I came over here and saw that you had already posted code using it, and Eliya has also provided great feedback. Thanks! Virtual beer for both of you!

      I just hadn't seen regexes used in list context in enough examples or code to realize their power until I stumbled across it in Effective Perl Programming. I agree it is the way to go, but I still think we could have use for a named default array. Is there a reason for not having one?

        but I still think we could have use for a named default array. Is there a reason for not having it in there?

        One reason: If it existed, the safest thing to do would be to copy it into your own array to ensure that the data it contained didn't get overwritten by a subsequent regex invocation before you were finished with it.

        What would be the point of the global default array if the only thing you could safely do with it is copy it to somewhere else, when you can assign the results directly to that other place?
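For what it's worth, later Perls did add such an array: since Perl 5.26, @{^CAPTURE} holds ($1, $2, ...) from the last successful match. A minimal sketch of the original get_vars rewritten with it, assuming perl 5.26 or newer; in line with the point above, it copies the array out immediately rather than holding on to it:

```perl
use strict;
use warnings;
use v5.26;   # @{^CAPTURE} was added in Perl 5.26

sub get_vars {
    my ($string, $pattern) = @_;
    # On success, return a copy of all captures; the next successful
    # match would replace the contents of @{^CAPTURE}.
    return $string =~ m/$pattern/ ? @{^CAPTURE} : ();
}

my $string = "AA=1,BB=2,CC=3,DD=4,EE=5";
my @first  = get_vars($string, qr/AA=(\d).+,EE=(\d)/);
my @second = get_vars($string, qr/AA=(\d).+,CC=(\d).+,EE=(\d)/);
say "1: @first";
say "2: @second";
```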


Re: Is there a default array for reg exp memory variables?
by Eliya (Vicar) on Feb 05, 2012 at 22:28 UTC

    You can create it yourself:

    ...
    if ( my @captures = $string =~ m/$pattern/ ) {
        return @captures;
    }
    else {
        return;
    }

    (The else { return } is required here to match the behavior of your original code. Without it, the "last evaluated expression" returned when the regex does not match would be zero (unlike in your code). This is because the list assignment in the scalar (boolean) context of the if evaluates to the number of elements on its right-hand side, which is zero when nothing was captured.)
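A small sketch of the difference described above, relying on the implicit-return behavior Eliya explains; the sub names captures_no_else and captures_with_else are hypothetical. On a failed match, the first returns a one-element list holding the false value of the if condition, while the second returns an empty list:

```perl
use strict;
use warnings;

# Without an else: on failure, the sub's value is the if condition,
# i.e. the list assignment in scalar context (a false count).
sub captures_no_else {
    my ($string, $pattern) = @_;
    if ( my @captures = $string =~ m/$pattern/ ) {
        return @captures;
    }
}

# With an explicit bare return: on failure, an empty list.
sub captures_with_else {
    my ($string, $pattern) = @_;
    if ( my @captures = $string =~ m/$pattern/ ) {
        return @captures;
    }
    else {
        return;
    }
}

my @no_else   = captures_no_else("AA=1", qr/ZZ=(\d)/);
my @with_else = captures_with_else("AA=1", qr/ZZ=(\d)/);
print scalar(@no_else), " element(s) vs ", scalar(@with_else), " element(s)\n";
```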

Re: Is there a default array for reg exp memory variables?
by AnomalousMonk (Archbishop) on Feb 05, 2012 at 23:53 UTC

    korpenkraxar: As others have pointed out, and as you have already acknowledged, assigning captures to your own array or using named captures is probably the way to go, especially since you already know which capture groups ($1, $2, $3 ... $n) are present in the regex: you wrote the regex yourself!

    In case you are dealing with a foreign regex, here's a trick based on the @- special variable (see perlvar) that tells you the highest capture group that took part in the last successful match. (Update: Note: The presence of a capture group in a regex does not mean it captured anything meaningful or that it matched at all.)

    >perl -wMstrict -le "my $s = 'Fu Feet Fie Foe Fum'; my $f = qr{ F \w* }xms; for my $rx ( qr{ (X) }xms, qr{ ($f) \s* ($f) }xms, qr{ ($f) \s* ($f) \s* ($f) }xms, qr{ ($f) \s* ($f) \s* ($f) \s* ($f) }xms, ) { $s =~ $rx; print qq{highest capture group is $#-, captures are}; printf qq{\$$_->[0] eq '$_->[1]' } for map [ $_, eval qq{\$$_} ], 1 .. $#-; print qq{\n}; } "
    highest capture group is -1, captures are


    highest capture group is 2, captures are
    $1 eq 'Fu' $2 eq 'Feet'

    highest capture group is 3, captures are
    $1 eq 'Fu' $2 eq 'Feet' $3 eq 'Fie'

    highest capture group is 4, captures are
    $1 eq 'Fu' $2 eq 'Feet' $3 eq 'Fie' $4 eq 'Foe'

    Update: Changed example code: added no-match case.
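A sketch of a related trick for foreign regexes (not from the thread): $#+ gives the number of capture groups in the last matched regex, whether or not they took part, while $#- gives the last group that actually matched. Since @- and @+ hold each group's start and end offsets, the captures can be rebuilt with substr instead of a string eval. The helper all_captures is hypothetical; it must be called before any other successful match, with the same unmodified string:

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild ($1, $2, ...) for the last successful
# match against $string using the offset arrays @- and @+.
sub all_captures {
    my ($string) = @_;
    my @caps;
    # $#+ is the number of capture groups in the last matched regex.
    for my $n (1 .. $#+) {
        # $-[$n] is undef if group $n did not take part in the match.
        push @caps, defined $-[$n]
            ? substr($string, $-[$n], $+[$n] - $-[$n])
            : undef;
    }
    return @caps;
}

my $s = 'Fu Feet Fie Foe Fum';
my ($groups, $matched, @caps);

# The optional third group cannot match here, so it captures nothing.
if ($s =~ /(F\w*)\s*(F\w*)(\s*X)?/) {
    $groups  = $#+;    # groups in the regex
    $matched = $#-;    # last group that actually matched
    @caps    = all_captures($s);
    print "groups=$groups matched=$matched captures=",
        join(', ', map { defined $_ ? "'$_'" : 'undef' } @caps), "\n";
}
```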
