
Re: regex capture and quantifiers

by Laurent_R (Abbot)
on Apr 30, 2013 at 21:17 UTC (#1031477)

in reply to regex capture and quantifiers

Please provide the output that you got and the output that you expected. That way, I do not have to run your program and try to figure out why what you get is not what you wanted.

I think that your problem probably has to do with the greediness of the * and + quantifiers: they try to match as much as possible.

Sometimes the earlier part of your regexp matches much more of the string than you expect, and you end up with the wrong capture.

For example, suppose that I want to match the second word of this sentence: "The quick brown fox jumps over the lazy dog." If I use this regexp: /.+ (\w+) /, I might think that the early part of the regexp will "eat" the first word up to the space and that the (\w+) will capture "quick". In fact, the '.+ ' will match as much as possible while still letting the '(\w+) ' match something. So the first part will match "The quick brown fox jumps over the " and the (\w+) will match "lazy", as can be seen in the following session under the Perl debugger:

  DB<5> $c = "The quick brown fox jumps over the lazy dog.";
  DB<6> print $1 if $c =~ /.+ (\w+) /;
lazy
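To see exactly what the greedy part consumed, you can put it in its own capture group. This is only a small standalone sketch: the extra parentheses around .+ and the variable names $prefix and $word are my additions for illustration, not part of the original pattern.

```perl
use strict;
use warnings;

my $c = "The quick brown fox jumps over the lazy dog.";

# Same pattern as in the debugger session, with one extra group
# around the greedy .+ so that we can inspect what it swallowed.
my ($prefix, $word) = $c =~ /(.+) (\w+) /;

print "prefix: '$prefix'\n";   # everything .+ matched before the space
print "word:   '$word'\n";     # what (\w+) was left with
```

The prefix comes out as 'The quick brown fox jumps over the' and the word as 'lazy': the .+ only gives back as many characters as it must for the rest of the pattern to succeed.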

To prevent this, you have to either use the non-greedy quantifiers (+? and *?) or be more specific in your regexp. For example, the following regexps all match the second word as expected:

  DB<8> print $1 if $c =~ /[^ ]+ (\w+) /;
quick
  DB<9> print $1 if $c =~ /\w+ (\w+) /;
quick
  DB<10> print $1 if $c =~ /\S+ (\w+) /;
quick
  DB<11> print $1 if $c =~ /.+? (\w+) /;
quick
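If you prefer a script to the debugger, the same comparison can be sketched as follows. The labels, the @patterns list and the %captured hash are just names I made up for this illustration; the regexps themselves are the ones shown above.

```perl
use strict;
use warnings;

my $c = "The quick brown fox jumps over the lazy dog.";

# Each entry is [ label, compiled regexp ]. The first one is the greedy
# pattern that goes wrong; the others are the alternatives shown above.
my @patterns = (
    [ 'greedy .+', qr/.+ (\w+) /    ],
    [ '[^ ]+',     qr/[^ ]+ (\w+) / ],
    [ '\w+',       qr/\w+ (\w+) /   ],
    [ '\S+',       qr/\S+ (\w+) /   ],
    [ 'lazy .+?',  qr/.+? (\w+) /   ],
);

my %captured;    # label => captured word
for my $p (@patterns) {
    my ($label, $re) = @$p;
    my ($word) = $c =~ $re;    # list context returns the capture
    $captured{$label} = $word;
    printf "%-10s => %s\n", $label, defined $word ? $word : '(no match)';
}
```

Only the first pattern captures "lazy"; the four others capture "quick". Being more specific about what the first part may match is usually the safer choice, since it says what you mean rather than relying on how the engine backtracks.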
