PerlMonks  

regex for multiple capture within boundary

by xafwodahs (Scribe)
on Jul 14, 2006 at 19:36 UTC (#561309)

xafwodahs has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, Let's say I have
$x = "a1aa11 b2bb22bbb222 c3cc33";
and I want a single regex that grabs all the number characters between the spaces. So, in other words, I want an array like [2, 22, 222] from:
(@nums) = $x =~ /<something here>/<modifiers here>;
What would your wisdom suggest?

Replies are listed 'Best First'.
Re: regex for multiple capture within boundary
by ikegami (Patriarch) on Jul 14, 2006 at 19:57 UTC

    Update: Ah! Now I understand! How did I miss that?

    # @nums = ('2', '22', '222');
    my @nums = ($x =~ /\s(\S+)/)[0] =~ /(\d+)/g;

    or

    # @nums = ('2', '22', '222');
    my @nums = (split(' ', $x, 3))[1] =~ /(\d+)/g;

      I think I understand what is going on in the first of your updated solutions but if I change it to read

      my @nums = ($x =~ /\s(\S+)/)[1] =~ /(\d+)/g;

      expecting output of

      3 33

      it doesn't work unless I also make the first match global like this

      my @nums = ($x =~ /\s(\S+)/g)[1] =~ /(\d+)/g;

      I think this is because the round brackets around the match put the match into list context and the [0] subscript grabs the first element of the resulting list. Since the match is non-global, there will only ever be one element in the list (one per capture group), so asking for index 1 returns nothing. If we want a second or subsequent element we must make the match global so that the list holds the captures from every match.

      Have I understood this correctly or am I completely missing the point?

      Cheers,

      JohnGG

        That's exactly it (although there could be 0 elements if the match fails).

        my @nums = ($x =~ /\s(\S+)/)[0] =~ /(\d+)/g;
        could also be written as
        my @nums = ($x =~ /\s(\S+)/ ? $1 : undef) =~ /(\d+)/g;

        If you're going to use /g, drop the \s:

        # @nums = ('3', '33');
        my $word = 2;
        my @nums = ($x =~ /(\S+)/g)[$word] =~ /(\d+)/g;

        or use split:

        # @nums = ('3', '33');
        my $word = 2;
        my @nums = (split(' ', $x))[$word] =~ /(\d+)/g;
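
The list-context behaviour discussed in this subthread can be checked with a short script. This is only a sketch: `nums_in_word` is a hypothetical helper name introduced here for illustration, not something from the posts above, and it uses the `split` variant rather than the nested match.

```perl
use strict;
use warnings;

my $x = "a1aa11 b2bb22bbb222 c3cc33";

# Without /g, a list-context match returns only the captures of the
# first successful match: one element here, the second word.
my @first_only = $x =~ /\s(\S+)/;

# With /g, it returns the captures of every match: one element per word.
my @all_words = $x =~ /(\S+)/g;

# Hypothetical helper: the digit runs found in word number $word (0-based).
sub nums_in_word {
    my ($str, $word) = @_;
    my $w = (split ' ', $str)[$word];
    return () unless defined $w;    # no such word
    return $w =~ /(\d+)/g;
}

print "@first_only\n";                        # b2bb22bbb222
print scalar(@all_words), "\n";               # 3
print join(' ', nums_in_word($x, 1)), "\n";   # 2 22 222
print join(' ', nums_in_word($x, 2)), "\n";   # 3 33
```

Checking `defined $w` before the second match avoids matching against `undef` when the requested word does not exist.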
Re: regex for multiple capture within boundary
by Ieronim (Friar) on Jul 14, 2006 at 20:05 UTC
    @nums = $x =~ m/(\d+)/g;
    UPD: I just did not understand the OP correctly :)
      This is pulling out all the digits. The OP is just asking for the "digits between the spaces".
      my $x = "a1aa11 b2bb22bbb222 c3cc33";
      my @nums = $x =~ /\d+/g;
      print join(" ", @nums) . "\n";

      Output:
      1 11 2 22 222 3 33

      Ikegami's answer is pulling out the numbers between the spaces.
Re: regex for multiple capture within boundary
by jwkrahn (Monsignor) on Jul 14, 2006 at 20:16 UTC
    The simple answer is:
    my @nums = $x =~ /\d+/g;
    Update: Okay, I reread the problem. :-)
    $ perl -le'$x = "a1aa11 b2bb22bbb222 c3cc33"; print for ( $x =~ /\s(\S+)\s/ )[ 0 ] =~ /\d+/g'
    2
    22
    222
Re: regex for multiple capture within boundary
by Sidhekin (Priest) on Jul 14, 2006 at 21:58 UTC
    I want a single regex that grabs all the number characters between the spaces

    Extreme regexing? I cannot resist such a question. This ought to do it:

    @nums = $x =~ /(?:^\S*\s+|\G)[^\s\d]*(\d+)/g;

    However, if it does not absolutely have to be a single regex, I suspect the maintainer will prefer one of ikegami's solutions. :-)
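
For readers who find the `\G` alternation hard to follow, the same idea can be spelled out in two steps. This is a sketch under assumptions, not Sidhekin's code: `nums_in_second_word` is a hypothetical name, and the single regex is deliberately split into a "skip the first word" match followed by a `\G`-anchored loop.

```perl
use strict;
use warnings;

# Hypothetical helper: digit runs in the second whitespace-separated word.
sub nums_in_second_word {
    my ($str) = @_;
    my @nums;

    # Step 1: skip the first word and the whitespace after it.
    # A scalar-context /g match leaves pos($str) just past the match.
    return () unless $str =~ /^\S*\s+/g;

    # Step 2: \G anchors each attempt at pos($str), so the loop can only
    # continue from where the previous match ended. It stops at the next
    # whitespace, because [^\s\d]* cannot cross it and \d+ then fails.
    # /c keeps pos($str) intact when the final attempt fails.
    while ($str =~ /\G[^\s\d]*(\d+)/gc) {
        push @nums, $1;
    }
    return @nums;
}

print join(' ', nums_in_second_word("a1aa11 b2bb22bbb222 c3cc33")), "\n";
# prints: 2 22 222
```

Keeping `\G` at the very start of the pattern is the form the Perl documentation describes as fully supported, which is one reason to prefer the two-step version in maintained code.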

    Update: I've been out-extremed! Not perhaps by the first of ikegami's latter solutions, which after all is straight-forward, and probably as maintainable as mine, but certainly by the second. It took me three attempts just to read it: My eyes glazed over twice!

    ... what's with the local *nums; though? Yet another update: Ah. Considerate of you. :-)

    print "Just another Perl ${\(trickster and hacker)},"
    The Sidhekin proves Sidhe did it!

      I hope you're referring to one of these and not one of the following ;)

      local *nums; our @nums;
      $x =~ / \s (?: [^\s\d]* (\d+) (?{ push(@nums, $1) }) )* /x;

      or

      local *nums; our @nums;
      $x =~ / ^ (?> \S* \s ) \S*? (?<!\d) ( (?> \d+ ) ) (?{ push(@nums, $1) }) (?!) /x;

      ... what's with the local *nums; though?

      I used package variables because regexps capture. Putting the regexp in a sub would only work once if I had used lexical variables instead of package variables.

      our @nums; makes it so I can say @nums instead of @main::nums to refer to the package variable.

      local *nums works better than local @nums;. They both ensure that @main::nums has the same value when we're done as it did when we started. In other words, it makes sure we're not trampling over someone else's variables.
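
The effect of `local *nums` can be seen in isolation with a small script. This is a minimal sketch, assuming a hypothetical sub named `collect` and a plain `/g` loop in place of the `(?{ ... })` blocks above; it only illustrates that the caller's `@main::nums` is untouched after the call.

```perl
use strict;
use warnings;

# Pretend some other code already uses the package variable @main::nums.
@main::nums = ('caller', 'data');

# Hypothetical sub: gathers digit runs into the package variable @nums.
sub collect {
    my ($str) = @_;
    local *nums;    # fresh, empty glob for the duration of this call
    our @nums;      # lets us write @nums instead of @main::nums
    push @nums, $1 while $str =~ /(\d+)/g;
    my @result = @nums;    # copy out before the glob is restored
    return @result;
}

print join(' ', collect("a1 b22")), "\n";   # 1 22
print "@main::nums\n";                      # caller data
```

When `collect` returns, the localized glob is restored, so the values the caller had stored in `@main::nums` are visible again.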
