PerlMonks
For those for whom '\G' is deep into 'executable line noise' country:

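The \G anchor forces the next match to start where the last match left off. Use \G analogously to ^ at the beginning of a string: ^ matches only at the beginning of the string, while \G matches only at the position where the previous m//g match on that string stopped (what pos() reports), or at the very beginning if there has been no previous match. In effect, \G marks the beginning of whatever the earlier matches have not yet chewed off.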
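A minimal sketch of the difference, using a made-up string "aa-a" that is not from the thread: a plain //g lets the engine skip ahead to the next match wherever it is, while \G pins every attempt to pos(), so the scan stops at the first gap.

```perl
use strict;
use warnings;

my $s = 'aa-a';   # hypothetical sample string

# Without \G, //g hops over the '-' and finds the last 'a' too.
my @anywhere = $s =~ /(a)/g;

# With \G, each match must start exactly where the previous one ended
# (pos($s)), so matching stops when it reaches the '-'.
my @contiguous = $s =~ /\G(a)/g;

printf "anywhere: %d, contiguous: %d\n", scalar @anywhere, scalar @contiguous;
# prints "anywhere: 3, contiguous: 2"
```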

perlfaq6 has more detail under "What good is \G in a regular expression?". (The internal hyperlink at perldoc.perl.org is broken; apparently the backslash discombobulated the escapeHTML routines. But this link will get you there.) The other piece of the puzzle is the '(?='. This handy construct, the 'zero-width positive lookahead', is explained in more detail in perlretut, along with its evil twin '(?!', the zero-width negative lookahead.
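A short sketch of both lookaheads on a made-up string. The key property for what follows is "zero width": the lookahead checks what comes next but does not consume it, so pos() ends right after 'foo' rather than after 'foobar'.

```perl
use strict;
use warnings;

my $text = 'foobar foobaz';   # hypothetical sample text

# (?=bar): a 'foo' only counts if 'bar' comes next (the first one).
my @followed = $text =~ /(foo)(?=bar)/g;

# (?!bar): a 'foo' only counts if 'bar' does NOT come next (the second one).
my @not_followed = $text =~ /(foo)(?!bar)/g;

# Zero width: 'bar' is checked but not consumed.
my $end;
if ($text =~ /foo(?=bar)/g) {
    $end = pos($text);   # 3, i.e. just after 'foo'
}

printf "followed: %d, not followed: %d, match ended at %d\n",
    scalar @followed, scalar @not_followed, $end;
```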

You may also want to review Non-capturing-groupings.
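A small illustration on a hypothetical string shaped like the one in this thread: both patterns repeat the same group, but only the plain (...) version spends a capture slot on it. It also shows why the //g trick is needed in the first place: a repeated capture group keeps only its last iteration.

```perl
use strict;
use warnings;

my $str = 'foo m 1 m 2 bar';   # hypothetical sample input

# Plain group: captured too, and it only remembers the LAST iteration.
my @with = $str =~ /^foo\s(m\s(\d+)\s)*bar$/;      # ('m 2 ', '2')

# Non-capturing group: same grouping, no extra capture slot.
my @without = $str =~ /^foo\s(?:m\s(\d+)\s)*bar$/; # ('2')

print scalar(@with), " captures vs ", scalar(@without), "\n";
```

Either way, the '1' from the first iteration is lost, which is exactly the problem the \G and lookahead combination works around.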

Let's take Sidhekin's piece of work apart and be a little less terse. As perlretut says:

Long regexps like this may impress your friends, but can be hard to decipher. In complex situations like this, the //x modifier for a match is invaluable. It allows one to put nearly arbitrary whitespace and comments into a regexp without affecting their meaning. Using it, we can rewrite our 'extended' regexp in the more pleasing form
So, using the x modifier, the heart of Sidhekin's code becomes:
    # We're hunting the (properly bracketed) 'm \d+' occurrences.
    # They must be
    $str =~
      / (?: ^foo\s          # - preceded by an initial foo
          |                 #   OR
            (?<!^)\G        # - the end of a previous successful match,
        )                   #   but not the beginning of the string
        m \s (\d+) \s       # Here's the guy we really want (and the
                            # space after him, so the lookahead starts
                            # on the next 'm').
                            # But he must be followed by the right stuff.
        (?=                 # Lookahead says he must be followed by:
          (?: m \s \d+ \s )*  # any number of 'm \d+' groups,
          bar               # finally terminated with bar (though not
        )                   # necessarily at the end of the string).
      /xg;
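A self-contained sketch that runs the same pattern over a few hypothetical inputs (the sample strings are assumptions, not from the thread). Note the \s after (\d+): the lookahead starts right where the capture ends, so the separating whitespace has to be consumed first, or the lookahead would begin on a space and fail.

```perl
use strict;
use warnings;

# Hypothetical inputs in the shape the regex expects.
my %samples = (
    good   => 'foo m 1 m 2 m 3 bar',
    no_foo => 'm 1 m 2 bar',
    no_bar => 'foo m 1 m 2 baz',
);

my %found;
for my $name (sort keys %samples) {
    my $str  = $samples{$name};
    my @nums = $str =~
      / (?: ^foo\s | (?<!^)\G )      # after the leading foo, or where the last match ended
        m \s (\d+) \s                # the number we want, plus its trailing space
        (?= (?: m \s \d+ \s )* bar ) # only if the rest is more 'm N' groups then bar
      /xg;
    $found{$name} = [@nums];
    printf "%-7s => (%s)\n", $name, join(', ', @nums);
}
```

The well-bracketed string yields 1, 2 and 3; the other two yield nothing, because either the first match never gets started (no foo) or the lookahead never finds bar.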
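Notice that whitespace is not significant under the //x modifier, so wherever Sidhekin used a single literal space, I had to use '\s'. (Strictly, \s matches any whitespace character, not just a space; an escaped space, '\ ', would match only a literal space.)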
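A quick sketch of that whitespace rule on a made-up string: under /x a bare space simply disappears from the pattern, while '\ ' and '\s' survive, and '\s' is the more permissive of the two.

```perl
use strict;
use warnings;

my $s = 'm 7';   # hypothetical sample

my $plain   = $s =~ /m 7/   ? 1 : 0;  # no /x: the space is literal, so this matches
my $x_space = $s =~ /m 7/x  ? 1 : 0;  # /x drops the space, leaving /m7/, which fails
my $x_esc   = $s =~ /m\ 7/x ? 1 : 0;  # an escaped space survives /x
my $x_ws    = $s =~ /m\s7/x ? 1 : 0;  # \s: any whitespace character

# \s is broader than a space: it also accepts a tab; '\ ' does not.
my $tab_ws  = "m\t7" =~ /m\s7/x  ? 1 : 0;
my $tab_esc = "m\t7" =~ /m\ 7/x  ? 1 : 0;

print "$plain $x_space $x_esc $x_ws $tab_ws $tab_esc\n";   # 1 0 1 1 1 0
```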

This is a straightforward way for a programmer to capture every occurrence in the middle of a string. Realize, though, that it is not the most straightforward way for the computer. For each 'm \d+' expression in the string, the regex engine:

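- starts at the current 'beginning' of the string (pos(), where the previous match left off)
- checks (fore) that this spot is either just after the leading foo or exactly where the previous match ended; thanks to \G, nothing before the current position is re-scanned
- matches the 'm \d+' at the current position
- matches (aft) all the remaining 'm \d+' groups and the final bar
- and THROWS AWAY the aft match (the lookahead is zero-width: it consumes nothing and captures nothing)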
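One way to see the aft cost, sketched under assumptions: the helper lookahead_steps and the generated input strings are hypothetical, and the counter uses Perl's (?{ ... }) code-block construct (which perlre has long flagged as experimental) purely as instrumentation. It counts how many 'm N' groups the lookahead walks over across all the //g matches.

```perl
use strict;
use warnings;

# Counter bumped once each time the lookahead steps over one more 'm N'
# group. Instrumentation only; it does not change what matches.
our $steps = 0;

# Hypothetical helper: build 'foo m 1 m 2 ... m N bar', run the
# lookahead-based regex over it, and return the total lookahead steps.
sub lookahead_steps {
    my ($n) = @_;
    my $str = 'foo ' . join('', map { "m $_ " } 1 .. $n) . 'bar';
    local $steps = 0;
    my @caught = $str =~
      / (?: ^foo\s | (?<!^)\G )
        m \s (\d+) \s
        (?= (?: m \s \d+ \s (?{ $steps++ }) )* bar )
      /xg;
    die "expected $n captures, got " . scalar(@caught) . "\n"
        unless @caught == $n;
    return $steps;
}

my $small = lookahead_steps(5);
my $large = lookahead_steps(50);

# Each match re-scans the whole remaining tail, so the total is about
# n*(n-1)/2: ten times the groups costs far more than ten times the work.
printf "5 groups: %d steps, 50 groups: %d steps\n", $small, $large;
```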
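This is a trivial amount of extra work on a single line. But because the lookahead re-scans the entire remaining tail for every item it captures, the total work grows roughly with the square of the number of items. If you are attempting to do something similar by, say, matching across line breaks and pattern searching a set of 120-page MS-Word documents, you may notice some performance problems.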
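One way to avoid that cost, sketched here as a swapped-in technique rather than Sidhekin's approach, is to split the job into two passes: check the foo ... bar bracketing once while capturing the middle section, then run a plain //g over just that middle. Each character is then examined a bounded number of times instead of once per remaining item. The sample string is hypothetical.

```perl
use strict;
use warnings;

my $str = 'foo m 1 m 2 m 3 bar';   # hypothetical sample input

my @nums;
# Pass 1: check the bracketing once and grab everything between foo and bar.
if ($str =~ /^foo\s((?:m\s\d+\s)*)bar/) {
    my $middle = $1;
    # Pass 2: a plain //g over the middle; no lookahead, no re-scanning.
    @nums = $middle =~ /m\s(\d+)/g;
}

print "@nums\n";   # 1 2 3
```

The trade-off is an extra copy of the middle section; for long inputs that is usually far cheaper than re-running the lookahead over the tail for every item.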

Update: added detail


In reply to Re^2: Arbitrary number of captures in a regular expression by throop
in thread Arbitrary number of captures in a regular expression by grinder
