http://www.perlmonks.org?node_id=986098


in reply to Re^2: Regex Greed
in thread Regex Greed

Unfortunately, the referenced section does not discuss the zero-width-lookahead-to-a-capture trick of jwkrahn's solution. Does anyone know where this is covered in the standard docs (as opposed to a PerlMonks node)?

Re^4: Regex Greed
by Athanasius (Archbishop) on Aug 08, 2012 at 02:42 UTC

    A search on “overlapping matches” in perldoc doesn’t turn up anything relevant. However, I did find the following in the Camel Book (4th Edition, pages 247–8, underlining added):

    Lookahead assertions can be used to implement overlapping matches. For example,
    "0123456789" =~ /(\d{3})/g
    returns only three strings: 012, 345, and 678. By wrapping the capture group with a lookahead assertion:
    "0123456789" =~ /(?=(\d{3}))/g
    you now retrieve all of 012, 123, 234, 345, 456, 567, 678, and 789. This works because this tricky assertion does a stealthy sneakahead to run up and grab what’s there and stuff its capture group with it, but being a lookahead, it reneges and doesn’t technically consume any of it. When the engine sees that it should try again because of the /g, it steps one character past where last it tried.
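
For anyone who wants to try this at the command line, here is a minimal sketch that runs both patterns from the quoted passage in list context. The variable names ($digits, @plain, @overlapping) are my own and not from the book.

```perl
use strict;
use warnings;

my $digits = "0123456789";

# Plain capture: each match consumes three characters,
# so successive matches cannot overlap.
my @plain = $digits =~ /(\d{3})/g;

# Capture inside a lookahead: the match itself is zero-width,
# so /g advances one character at a time and every
# three-digit window gets captured.
my @overlapping = $digits =~ /(?=(\d{3}))/g;

print "plain:       @plain\n";        # 012 345 678
print "overlapping: @overlapping\n";  # 012 123 234 345 456 567 678 789
```

In list context, m//g returns the contents of the capture groups for every match, which is why the second pattern yields the captured digits even though each match is an empty string.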

    HTH,

    Athanasius <°(((><contra mundum

Re^4: Regex Greed
by ig (Vicar) on Aug 08, 2012 at 03:16 UTC