
Capture groups

by legend (Sexton)
on Mar 04, 2008 at 23:36 UTC ( #672035=perlquestion )
legend has asked for the wisdom of the Perl Monks concerning the following question:

I have some text like:
$text = "Grand Canyon 70%";
I want to capture the words and the number into two different strings like say:
$match[0] = "Grand Canyon"; $match[1] = "70%";
I was trying:
$text ~~ /([A-Za-z\s]*)([0-9]{1,2})[%])/;
But I am getting a bunch of compilation errors complaining about the double tilde... Can someone help me out please?

Replies are listed 'Best First'.
Re: Capture groups
by brian_d_foy (Abbot) on Mar 05, 2008 at 00:51 UTC

    Are you trying to use Perl 5.10 and the smart match operator? Here's the solution written with named captures, the nifty new 5.10 feature that means you never have to think about $1, $2, and so on. The names become keys in the new hash %+, and the matched text becomes the values:

    #!/usr/local/bin/perl5.10.0
    use 5.010;

    my $text = "Grand Canyon 70%";

    $text ~~ / (?<name> .*\S ) \s+ (?<percent> \d\d% ) /x;

    use Data::Dumper;
    print Dumper( \%+ )

    The output automatically labels your values:

    $VAR1 = {
              'percent' => '70%',
              'name' => 'Grand Canyon'
            };

    Here's the boring Perl 5.8 way:

    #!/usr/bin/perl

    my $text = "Grand Canyon 70%";

    $text =~ / ( .*\S ) \s+ ( \d\d% ) /x;

    print <<"HERE";
    1: $1
    2: $2
    HERE
    brian d foy
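Since the original question asked for the two pieces in $match[0] and $match[1], here is a minimal sketch of one way to get there. It reuses the Perl 5.8 pattern from the reply above and relies on a match in list context returning its captures; the printed labels are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Grand Canyon 70%";

# A match in list context returns the captured groups, so they can go
# straight into an array instead of being read back from $1 and $2.
my @match = $text =~ / ( .*\S ) \s+ ( \d\d% ) /x;

if (@match) {
    print "match[0] = '$match[0]'\n";
    print "match[1] = '$match[1]'\n";
}
else {
    print "no match\n";
}
```

If the pattern fails, @match is empty, so checking it before use avoids reading stale or undefined values.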
Re: Capture groups
by Joost (Canon) on Mar 04, 2008 at 23:41 UTC

      The regex above will capture trailing spaces in the $1 variable ([A-Za-z\s]*). It's probably easier to just get rid of them after the match, with $match[0] =~ s/\s+$//; otherwise you'll need a fancier regex :-)
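A short sketch of the trim-after-match approach described above. The regex Joost refers to is not shown in this thread, so the pattern here is an assumption: the one from the question, with =~ in place of ~~ and the parentheses balanced.

```perl
use strict;
use warnings;

my $text = "Grand Canyon 70%";

# Assumed pattern: the question's regex with =~ and balanced parentheses.
my @match = $text =~ /([A-Za-z\s]*)([0-9]{1,2}[%])/;

if (@match) {
    print "before: '$match[0]'\n";    # first capture still ends with a space
    $match[0] =~ s/\s+$//;            # strip trailing whitespace
    print "after:  '$match[0]'\n";
    print "number: '$match[1]'\n";
}
else {
    print "no match\n";
}
```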

        IMO, it's "easier" to get rid of the trailing spaces (not required by OP, so Joost's correct answer is perfectly serviceable for the specs given) in the original match:

        if ( $string =~ /([A-Za-z]*\s[A-Za-z]*)\s+([0-9]{1,2}[%])/ ) {
            print $1 . "\n" . $2;    # ^
        } else ...
      The double tilde ~~ operator is new in perl 5.10, so it won't work in any earlier version.

      Damn. For a minute there I thought we'd run into an old awk programmer. But then, awk just uses a single tilde.

Re: Capture groups
by legend (Sexton) on Mar 05, 2008 at 00:23 UTC
    For some reason I'm using the regex but it's not working:
    I just tested it in a regex tester and it was working fine but in perl it doesn't work...
      For some reason I'm using the regex but it's not working:
      That is a pretty specific regex... will none of your text chunks be only 1 word? Also, the 2nd \s may cause a problem if the number is separated from the text by multiple tabs/spaces.
        Again (cf. above), making \s match multiple whitespace characters (tabs or spaces, to judge from the OP) is easily handled with a "+" after the second \s.
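To make both points concrete, here is a rough sketch that runs the two-word pattern (with the "+" after the second \s) against a few inputs. "Yosemite 70%" is a made-up single-word sample, not data from the thread.

```perl
use strict;
use warnings;

# Two-word pattern with "+" after the second \s.
my $re = qr/([A-Za-z]*\s[A-Za-z]*)\s+([0-9]{1,2}[%])/;

# "Yosemite 70%" is a hypothetical single-word sample.
my @samples = ( "Grand Canyon 70%", "Grand Canyon\t\t70%", "Yosemite 70%" );

my %matched;
for my $text (@samples) {
    $matched{$text} = $text =~ $re ? 1 : 0;
    ( my $shown = $text ) =~ s/\t/\\t/g;    # make tabs visible in the output
    printf "%-24s %s\n", $shown, $matched{$text} ? "matches" : "no match";
}
```

The tab-separated line matches because \s+ accepts any run of whitespace, while the single-word line fails because the pattern insists on one whitespace character inside the first capture and at least one more after it.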

        The point about the possibility of single word "text chunks" is well made... so long as the words "pretty specific regex" are not intended to deprecate specificity.

        IMO, specificity is *GOOD* in a regex unless ambiguity (or at least, specific generalization) is required, because a less-than-specific regex can lead to hard-to-find problems when the source data includes unexpected content.


        H2O     60%
        Grand Canyon3     70%
        Teller-Bose condensate     50%

        Does one want the "water" entry or the footnoted "Grand Canyon" in the output?
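As an illustration of that trade-off, the sketch below runs the three sample lines through two patterns. Both are assumptions written for this example: $strict is a hypothetical anchored, letters-only pattern, and $loose is the ".*\S" style pattern from earlier in the thread with anchors added.

```perl
use strict;
use warnings;

my @lines = (
    "H2O     60%",
    "Grand Canyon3     70%",
    "Teller-Bose condensate     50%",
);

# Hypothetical patterns for illustration only.
my $strict = qr/^([A-Za-z]+(?:\s[A-Za-z]+)*)\s+([0-9]{1,2}%)$/;    # letters and single spaces
my $loose  = qr/^(.*\S)\s+(\d\d%)$/;                               # anything up to the number

my %verdict;
for my $line (@lines) {
    $verdict{$line} = {
        strict => ( $line =~ $strict ? "matches" : "rejected" ),
        loose  => ( $line =~ $loose  ? "matches" : "rejected" ),
    };
    printf "%-32s strict: %-9s loose: %s\n",
        $line, $verdict{$line}{strict}, $verdict{$line}{loose};
}
```

The strict pattern rejects all three lines (a digit or a hyphen appears in the name), while the loose one accepts them all. Which behavior is right depends on whether such entries are wanted in the output.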
