PerlMonks  

Capturing string matched by regex

by tunafish (Beadle)
on Feb 17, 2012 at 03:33 UTC ( #954377=perlquestion )
tunafish has asked for the wisdom of the Perl Monks concerning the following question:

Wondering if there is a way to capture only the part of a string matched by regex? If I have the following string:

$string_to_search="This part of the string is what I'm looking for and I want to capture it. I want to capture all instances of this paart of the string. Paaaaart of the string!";

How can I capture all strings within that string that match the pattern /pa+rt/i? So what I want for my return is:

@results = ('part','paart','Paaaaart');

Replies are listed 'Best First'.
Re: Capturing string matched by regex
by dave_the_m (Prior) on Feb 17, 2012 at 03:38 UTC
    @results = $string_to_search =~ /(pa+rt)/ig;

    Dave.

      I feel a little dumb for not figuring that out on my own :)

      Thanks Dave and AnomalousMonk (the parentheses were indeed not needed).

Re: Capturing string matched by regex
by AnomalousMonk (Chancellor) on Feb 17, 2012 at 04:20 UTC

    ... and you don't even need capturing parentheses:

    >perl -wMstrict -le
    "my $string_to_search = 'This part of all of this paart of the string. Paaaaart!';
     ;;
     my $pattern = qr{ (?i) pa+rt }xms;
     my @results = $string_to_search =~ /$pattern/g;
     printf qq{'$_' } for @results;
    "
    'part' 'paart' 'Paaaaart'

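The two answers above differ only in whether the pattern has capturing parentheses. As a minimal sketch (the variable names and the shortened test string are illustrative, not from the original posts), the following shows what a /g match in list context returns with zero, one, and two capture groups:

```perl
use strict;
use warnings;

my $text = 'This part of all of this paart of the string. Paaaaart!';

# No capture groups: each element is the whole text of one match.
my @whole = $text =~ /pa+rt/ig;

# One capture group: each element is what that group captured.
my @inner = $text =~ /p(a+)rt/ig;

# Two capture groups: both groups of every match, flattened into one list.
my @pairs = $text =~ /(p)(a+)rt/ig;

print "whole: @whole\n";    # whole: part paart Paaaaart
print "inner: @inner\n";    # inner: a aa aaaaa
print "pairs: @pairs\n";    # pairs: p a p aa P aaaaa
```

So the parentheses are optional only when the whole match is what you want; as soon as the pattern contains any group, the list holds the groups instead.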
      my $pattern = qr{ (?i) pa+rt }xms;

      Leaving aside the merits or demerits of deploying 'x' and 'm' here, I'm just wondering why you have put one regex modifier inside the qr{ ... } and the other three outside. It would seem more consistent to do either

      my $pattern = qr{ pa+rt }xims;

      or

      my $pattern = qr{(?xims) pa+rt };

      Just a little puzzled :-s

      Cheers,

      JohnGG

        ... more consistent ...

        I haven't gone back to review in detail the rationale presented in Perl Best Practices (PBP), but off the top of my head...

        Of course, the reason for (Update: the PBP recommendation of) the unvarying use of the /xms regex modifier 'tail' (if that's the proper term) is to give the ^ $ . regex operators unvarying behaviors, and the programmer a few fewer things to worry about; because they're always there, their proper place is in the tail.
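For readers who have not met the three modifiers, here is a small sketch of what each one changes. The sample string and variable names are made up for illustration; the behavior shown is standard Perl.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without /m, ^ matches only at the very start of the string.
my @starts_plain = $text =~ /^(\w+)/g;     # ('first')

# With /m, ^ also matches just after each embedded newline.
my @starts_m = $text =~ /^(\w+)/mg;        # ('first', 'second')

# Without /s, . does not match a newline, so .* stops at the end of line one.
my ($dot_plain) = $text =~ /(first.*)/;    # 'first line'

# With /s, . matches a newline too, so .* runs to the end of the string.
my ($dot_s) = $text =~ /(first.*)/s;       # the entire string

# With /x, literal whitespace in the pattern is ignored; \s matches the space.
my $with_x = $text =~ / second \s line /x ? 1 : 0;    # 1

print "plain ^: @starts_plain\n";
print "/m ^:    @starts_m\n";
print "plain . matched ", length($dot_plain), " chars\n";
print "/s . matched ",    length($dot_s),     " chars\n";
print "/x matched: $with_x\n";
```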

        One thing that cannot be made invariant from regex to regex is case insensitivity. Where, then, to put the /i modifier? If in the modifier tail, it's in danger of being 'lost', and moreover has global effect upon the regex. If in the body of the regex, it's in your face, and has the added advantage of being more flexible: the effects of the (?i) and (?-i) extended pattern modifiers are dependent upon the 'scoping' of the regex capturing and non-capturing groups in which they may appear (Update: see docs linked below for details).

        I.e., the mixture of qr{pat}xms with qr{pat}xmsi (or m// or s///) regex definitions is actually less consistent! Moreover, the (?i) extended pattern allows one to precisely define and control the desired matching behavior.
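To make the 'scoping' point concrete, here is a hedged sketch. The patterns $partial and $whole and the three test words are invented for illustration; they are not from the posts above. It contrasts a bare (?i), which applies to the rest of its enclosing group, with (?i:...), which applies only inside its own parentheses:

```perl
use strict;
use warnings;

# (?i:...) confines case-insensitivity to the group:
# 'pa+' matches in any case, but 'rt' must be lower case.
my $partial = qr{ (?i: pa+ ) rt }xms;

# A bare (?i) lasts to the end of the enclosing group,
# which here is the rest of the pattern.
my $whole = qr{ (?i) pa+rt }xms;

my %matched;
for my $word (qw(part PAArt PAART)) {
    $matched{$word} = {
        partial => ($word =~ $partial ? 'yes' : 'no'),
        whole   => ($word =~ $whole   ? 'yes' : 'no'),
    };
    printf "%-6s partial=%-3s whole=%s\n",
        $word, $matched{$word}{partial}, $matched{$word}{whole};
}
# part   partial=yes whole=yes
# PAArt  partial=yes whole=yes
# PAART  partial=no  whole=yes
```

A trailing /i on the whole regex cannot express the $partial behavior, which is the flexibility argument made above.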

        Of course, the PBP recommendations are not without controversy. I will only repeat the words of a great Marxist philosopher (Groucho): "These are my principles. If you don't like them, I've got others."

        See Extended Patterns in perlre for detailed info on the behavior of "(?pimsx-imsx)" and "(?imsx-imsx:pattern)" patterns, especially on the 'scope' of their effect.

        Updates:

        1. Added link to docs.
        2. Qualified 2nd paragraph text per JohnGG.

      use v5.14;
      use re "/sixm";

      $_ = "This part is past all of this paaRTy of the string. Paaaaarty on!";

      my @matches = ();

      sub show_matches {
          printf "Got %d matches:\n", scalar @matches;
          my $i = 1;
          for (@matches) {
              printf " %2d\t$_\n", $i++, $_;
          }
      }

      @matches = / pa .* rt /g;
      show_matches();

      @matches = / pa .*? rt /g;
      show_matches();

      @matches = / (?= (pa .* rt) ) /g;
      show_matches();

      @matches = ();
      () = / (pa .* rt) (?{push @matches, $^N}) (*FAIL) /g;
      show_matches();

      Says:
      Got 1 matches:
        1    part is past all of this paaRTy of the string. Paaaaart
      Got 3 matches:
        1    part
        2    past all of this paaRT
        3    Paaaaart
      Got 4 matches:
        1    part is past all of this paaRTy of the string. Paaaaart
        2    past all of this paaRTy of the string. Paaaaart
        3    paaRTy of the string. Paaaaart
        4    Paaaaart
      Got 8 matches:
        1    part is past all of this paaRTy of the string. Paaaaart
        2    part is past all of this paaRT
        3    part
        4    past all of this paaRTy of the string. Paaaaart
        5    past all of this paaRT
        6    paaRTy of the string. Paaaaart
        7    paaRT
        8    Paaaaart

Node Type: perlquestion [id://954377]
Approved by NetWallah