Re: regex capture and quantifiers

by Laurent_R (Vicar)
on Apr 30, 2013 at 21:17 UTC


in reply to regex capture and quantifiers

Please provide the output that you got and the output that you expected. That way, I do not have to run your program and try to figure out why what you get is not what you wanted.

I think that your problem probably has to do with the greediness of the * and + quantifiers in matches: they try to match as much as possible.

Sometimes the earlier part of your pattern matches much more of the string than you expect, and you end up with the wrong capture.

For example, suppose that I want to match the second word of this sentence: "The quick brown fox jumps over the lazy dog." If I use this regexp: /.+ (\w+) /, I might think that the early part of the regexp will "eat" the first word up to the space and that (\w+) will capture "quick". In fact, '.+ ' will match as much as possible while still allowing '(\w+) ' to match something. So the first part matches "The quick brown fox jumps over the " and (\w+) matches "lazy", as can be seen in the following session under the Perl debugger:

  DB<5> $c = "The quick brown fox jumps over the lazy dog.";
  DB<6> print $1 if $c =~ /.+ (\w+) /;
lazy
  DB<7>
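To see exactly how much the greedy part consumed, one can capture it as well. This is only a small sketch as a standalone script; the extra parentheses around .+ and the variable names $prefix and $word are my additions for illustration:

```perl
use strict;
use warnings;

my $c = "The quick brown fox jumps over the lazy dog.";

# Capture the greedy prefix too, so we can see how far it extends.
my ($prefix, $word) = $c =~ /(.+) (\w+) /;

print "prefix: '$prefix'\n";   # everything .+ swallowed
print "word:   '$word'\n";     # what (\w+) was left with
```

The prefix comes out as "The quick brown fox jumps over the" and the word as "lazy": the regex engine lets .+ grab everything first, then backtracks only as far as needed for the rest of the pattern to succeed.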

To prevent this, you have to either use the non-greedy quantifiers (+? and *?) or be more specific in your regexp. For example, the following regexps all capture the second word as expected:

  DB<8> print $1 if $c =~ /[^ ]+ (\w+) /;
quick
  DB<9> print $1 if $c =~ /\w+ (\w+) /;
quick
  DB<10> print $1 if $c =~ /\S+ (\w+) /;
quick
  DB<11> print $1 if $c =~ /.+? (\w+) /;
quick
  DB<21>
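If you prefer a script to the debugger, the same comparison might be written roughly as follows. The labels and the %captured hash are just names I made up for this sketch:

```perl
use strict;
use warnings;

my $c = "The quick brown fox jumps over the lazy dog.";

# Label => pattern pairs, kept in an array to preserve the order.
my @tests = (
    [ 'greedy .+'      => qr/.+ (\w+) /    ],
    [ 'non-greedy .+?' => qr/.+? (\w+) /   ],
    [ 'negated class'  => qr/[^ ]+ (\w+) / ],
    [ 'word chars \w+' => qr/\w+ (\w+) /   ],
    [ 'non-space \S+'  => qr/\S+ (\w+) /   ],
);

my %captured;
for my $test (@tests) {
    my ($label, $re) = @$test;
    # In list context, a successful match returns the captured groups.
    my ($word) = $c =~ $re;
    $captured{$label} = $word;
    printf "%-15s captures '%s'\n", $label, $word // '(no match)';
}
```

Only the greedy version captures "lazy"; the four others capture "quick".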

