
Re: Regex Greed

by jwkrahn (Monsignor)
on Aug 07, 2012 at 21:12 UTC

in reply to Regex Greed

$ perl -e'
my $test = "xTx\nxxTxxT";
my $rx = qr/(?=(x...T))/s;
my @matches = $test =~ /$rx/g;
print "match #", $_ + 1, ":\n$matches[$_]\n" for 0 .. $#matches;
'
match #1:
x
xxT
match #2:
xTxxT

Re^2: Regex Greed
by kennethk (Abbot) on Aug 07, 2012 at 21:41 UTC
    To add a little detail: jwkrahn is using a lookahead, so the actual match itself is zero-width. See "Looking ahead and looking behind" in perlretut.
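One way to see the zero-width behaviour is to record the offsets of each overall match. This is only a sketch, reusing the test string and pattern from the parent node; the names @rows, $offset, $width and $text are made up for illustration. It uses the built-in @- and @+ arrays, whose element 0 holds the start and end offsets of the whole match.

```perl
use strict;
use warnings;

my $test = "xTx\nxxTxxT";
my @rows;    # hypothetical: one [offset, width, captured text] triple per match

# The whole pattern is a single lookahead, so the overall match is empty
# and only the capture group $1 carries any text.
while ($test =~ /(?=(x...T))/sg) {
    my $width = $+[0] - $-[0];         # end minus start of the overall match
    push @rows, [ $-[0], $width, $1 ];
}

for my $row (@rows) {
    my ($offset, $width, $text) = @$row;
    $text =~ s/\n/\\n/g;               # make the embedded newline visible
    printf "offset %d, width %d, captured '%s'\n", $offset, $width, $text;
}
```

Every reported width should be 0 even though each capture is five characters long, which is why /g can go on to find a later match that overlaps the text already captured.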

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      Unfortunately, the referenced section does not discuss the zero-width-lookahead-to-a-capture trick of jwkrahn's solution. Does anyone know where this is covered in the standard docs (as opposed to a PerlMonks node)?

        A search on “overlapping matches” in perldoc doesn’t turn up anything relevant. However, I did find the following in the Camel Book (4th Edition, pages 247–8, underlining added):

        Lookahead assertions can be used to implement overlapping matches. For example,
        "0123456789" =~ /(\d{3})/g
        returns only three strings: 012, 345, and 678. By wrapping the capture group with a lookahead assertion:
        "0123456789" =~ /(?=(\d{3}))/g
        you now retrieve all of 012, 123, 234, 345, 456, 567, 678, and 789. This works because this tricky assertion does a stealthy sneakahead to run up and grab what’s there and stuff its capture group with it, but being a lookahead, it reneges and doesn’t technically consume any of it. When the engine sees that it should try again because of the /g, it steps one character past where last it tried.
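For anyone who wants to try the two quoted patterns side by side, here is a minimal runnable sketch. The variable names $digits, @plain and @overlapping are mine, not the book's.

```perl
use strict;
use warnings;

my $digits = "0123456789";

# Plain capture: each match consumes three digits, so matches cannot overlap.
my @plain = $digits =~ /(\d{3})/g;

# Capture wrapped in a lookahead: nothing is consumed, so /g retries
# one character further along each time.
my @overlapping = $digits =~ /(?=(\d{3}))/g;

print "plain:       @plain\n";
print "overlapping: @overlapping\n";
```

In list context the /g match returns the contents of the capture group for every match, so no explicit loop is needed here.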


        Athanasius <°(((><contra mundum
