PerlMonks  

'g' flag w/'qr'

by perl-diddler (Chaplain)
on May 29, 2016 at 02:25 UTC (#1164425=perlquestion)
perl-diddler has asked for the wisdom of the Perl Monks concerning the following question:

I was trying a few examples from the "Mastering Perl" (brian d foy) book relating to RE's. I noted w/interest, the 3 ways he used to match a "qr" expression on page 3.

    sub iregex { qr/ ... /ix }

    1) my $regex = iregex();
       if ( $isbn =~ m/$regex/ ) { print "Matched!\n" }

    2) my $regex = iregex();
       if ( $isbn =~ $regex ) { print "Matched!\n" }

    3) if ( $isbn =~ iregex() ) { print "Matched!\n" }

I especially liked methods 2 & 3 as they didn't need interpolation, with 3 being better as it didn't require an intermediate variable.

Seeing that, I wanted to try capturing all matching subexpressions by adding 'g' to my 'qr' expression, but I'm getting a deprecation error:

    > perl -we'use strict;use P;
    my $re = qr{ (\w+) }gx;
    my $dat = "Just another cats meow";
    my @matches = $dat =~ $re;
    P "#matches=%s, matches=%s", scalar(@matches), \@matches;
    exit scalar(@matches);'

    output:
    Having no space between pattern and following word is deprecated at -e line 2.
    Bareword found where operator expected at -e line 2, near "qr/ (\w+) /gx"
    syntax error at -e line 2, near "qr{ (\w+) }gx"
    Execution of -e aborted due to compilation errors.

If I move the 'g' option down to where the RE is used and use a *second* "RE" operator (i.e.: /.../g) and interpolate my expression into another RE (the slowest of the above 3 options), I get:

    > perl -we'use strict;use P;
    my $re = qr{ (\w+) }x;
    my $dat = "Just another cats meow";
    my @matches = $dat =~ /$re/g;
    P "#matches=%s, matches=%s", scalar(@matches), \@matches;
    exit scalar(@matches);'

    output:
    #matches=4, matches=["Just", "another", "cats", "meow"]

So why can't I add the global switch to the "qr" expression? FWIW, I also tried qr{ (?xg) }. Perl strips out the 'g' and tells me to add it to the end of the RE -- where it is deprecated:

    perl -we'use strict;use P;
    my $re = qr{ (?xg) (\w+) };
    my $dat = "Just another cats meow";
    my @matches = $dat =~ $re;
    P "#matches=%s, matches=%s", scalar(@matches), \@matches;
    exit scalar(@matches);'

    output:
    Useless (?g) - use /g modifier in regex; marked by <-- HERE in m/ (?xg <-- HERE ) (\w+) / at -e line 2.
    #matches=1, matches=["another"]

So how can I attach the "/g" modifier to my "qr" regex so I can use the direct match as in #2 or #3 above?

Thanks...

P.S. - I also just noticed that in addition to stripping out the 'g' option, the 'x' option doesn't seem to work in the regex's parens, i.e. - (?x).

Replies are listed 'Best First'.
Re: 'g' flag w/'qr'
by AnomalousMonk (Chancellor) on May 29, 2016 at 02:59 UTC

    The  /g modifier can simply not be used with  qr// to build a regex object. It can only be used with the  m// and  s/// operators; interpolation of a regex object into one of these (or literals) is your only hope. Please see the discussion of  qr// in Regexp Quote-Like Operators in perlop and note the available modifiers. See also the discussions of  m// and  s/// (and their available modifiers) in the same section.

    Update:

    ... I also just noticed that ... the 'x' option doesn't seem to work ...
    It works just fine:
    c:\@Work\Perl>perl -wMstrict -le
    "my $s = 'xxxfoooooyyy';
     ;;
     $s =~ /(?x) ( f o o + ) /;
     print qq{'$1'};
    "
    'fooooo'


    Give a man a fish:  <%-{-{-{-<

      Come now. You can't use the documentation, which is written after the code has been written, as a reason why the code works (or doesn't work) a certain way. The documentation describes behavior -- it is not a justification as to 'why' the code works that way -- or why the 'g' switch was *deprecated* (meaning that it used to be legal code).

      It implies that the decision was somewhat arbitrary and that it could still work the old way, except someone chose to add another *wart* to perl and create another exception -- that RE's work a certain way in some places, but another way when described by 'qr'.

      As for 'x' working -- i'd call this broken. Why not use the example I provided that shows the broken behavior rather than coming up with some different example where you could make it work?

      I.e.:

      perl -we'use strict;use P;
      my $re = qr{ (?x) (\w+) };
      my $dat = "Just another cats meow";
      my @matches = $dat =~ $re;
      P "#matches=%s, matches=%s", scalar(@matches), \@matches;
      exit scalar(@matches);'
      #matches=1, matches=["another"]

      the above doesn't work. It doesn't return the 1st match from the text line. As opposed to moving the (?x) to the end of the qr statement:

      perl -we'use strict;use P;
      my $re = qr{ (\w+) }x;
      my $dat = "Just another cats meow";
      my @matches = $dat =~ $re;
      P "#matches=%s, matches=%s", scalar(@matches), \@matches;
      exit scalar(@matches);'
      #matches=1, matches=["Just"]

      In the case where I am not using 'g', One would expect the re to match and return the 1st word in the list. With (?x), it doesn't return the 1st word, but returns the 2nd, vs. with 'x' as a suffix, it behaves as expected and only returns the 1st matching word.

      Then you claim the status quo makes sense, when the perl mastery book clearly shows that 'qr' can stand as an RE by itself without 'm' or 's' -- AND it is more efficient when it is used that way. Except it has been crippled by differences in how 'RE's work in 'qr' vs. ones that are "requoted", re-interpolated, and re-compiled in 'm' & 's'. Why is there a difference when 'qr' can be used "standalone"? "qr" wasn't meant to replace 's', but it was meant to replace or be equivalent to 'm'.

      Did you see the 3 ways Mastering perl uses 'qr'. The first way, where it is interpolated, seems to have little benefit over using q(string). then interpolating the string into the m{} statement. But if you can use 'qr' w/o the m{}, you only do the RE-compile once when you use 'qr' -- which can be used to match things directly in "=~" statements -- just like m{} statements are used.

      So again, I ask 'why' the artificial constraint when, from the deprecation warning, it seems apparent, that it used to work.

        I’m guessing you’re running a fairly old version of Perl? On 5.20.2, I get:

        Unknown regexp modifier "/g" at 1645_SoPW.pl line 18, near "= "

        — which is clear. But on 5.12.3, I get:

        Bareword found where operator expected at 1645_SoPW.pl line 18, near "qr{ (\w+) }gx"
        syntax error at 1645_SoPW.pl line 18, near "qr{ (\w+) }gx"

        — which isn’t. In any case, the “deprecated” message you’re seeing is related only indirectly to the presence of a /g modifier on a qr// term. The Perl compiler is simply confused as to what the syntax is supposed to mean, and its (incorrect) guess leads it to find a construct which you never intended and which happens to be deprecated.

        Actually, the qr// syntax was introduced in Perl 5.005 [1], which was released in 1998. The first edition of the Camel Book to document qr// was the third edition, published in July, 2000 (4 months after the release of Perl 5.6). The section “Pattern-Matching Operators” in Chapter 5 draws a distinction between those modifiers which apply to a regex (and are therefore applicable to qr//) and those which apply to an operator (and are therefore applicable only to m// and s///) [2]. The /g modifier falls in the second category. So, AFAICT, putting a /g modifier on a qr// term is not “deprecated,” as it was never allowed in the first place.

        [1] Update (May 30, 2016): See perl5005delta#New qr// operator.
        [2] Compare the tables on pages 147 and 150.

        Hope that helps,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

        You're switching on ?x too late in the regular expression:

        #!perl -w
        use strict;
        use Data::Dumper;

        my $dat = "Just another cats meow";

        sub print_matches {
            my( $re ) = @_;
            my @matches = $dat =~ /$re/;
            print "Using $re\n";
            print "#matches=%s, matches=%s", scalar(@matches), Dumper \@matches;
        };

        print_matches( qr{ (?x) (\w+) } );
        print_matches( qr{(?x) (\w+) } );

        __END__
        Using (?^: (?x) (\w+) )
        #matches=%s, matches=%s1$VAR1 = [
                  'another'
                ];
        Using (?^:(?x) (\w+) )
        #matches=%s, matches=%s1$VAR1 = [
                  'Just'
                ];

        The first whitespace is not governed by ?x.
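A minimal sketch of the same point without Data::Dumper, using the data string from the original question. The variable names ($late, $early and so on) are made up for illustration; the only difference between the two patterns is whether a space precedes (?x).

```perl
use strict;
use warnings;

my $dat = "Just another cats meow";

# The space before (?x) is seen before /x is switched on,
# so it is a literal space that the match has to consume.
my $late  = qr{ (?x) (\w+) };

# Here (?x) comes first, so every following space is insignificant.
my $early = qr{(?x) (\w+) };

my ($late_word)  = $dat =~ $late;
my ($early_word) = $dat =~ $early;

print "late:  $late_word\n";    # another
print "early: $early_word\n";   # Just
```

The first pattern can only match where a space is followed by a word, which is why it skips "Just".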

        how now brown cow

        did you try splain?

        Having no space between pattern and following word is deprecated at -e line 2.
        Having no space between pattern and following word is deprecated at -e line 2 (#1)
            (D syntax) You had a word that isn't a regex modifier immediately
            following a pattern without an intervening space. If you are trying
            to use the /le flags on a substitution, use /el instead. Otherwise,
            add white space between the pattern and following word to eliminate
            the warning. As an example of the latter, the two constructs:

                $a =~ m/$foo/sand $bar
                $a =~ m/$foo/s and $bar

            both currently mean the same thing, but it is planned to disallow
            the first form in Perl 5.18. And,

                $a =~ m/$foo/and $bar

            will be disallowed too.

        Do you get it?

        That is in addition to Bareword found where operator expected

        None of that means qr//g ever worked or was meant to work, the g was always ignored by qr

Re: 'g' flag w/ 'qr'
by Athanasius (Chancellor) on May 29, 2016 at 03:05 UTC

    Hello perl-diddler,

    If you look at the documentation for qr//, you’ll see that the /g modifier is not supported:

    qr/STRING/msixpodualn
    perlop#Regexp-Quote-Like-Operators

    Which makes sense: qr turns STRING into a regular expression, which may then be used in any number of m{...} and s{...}{...} constructs. The appropriate place to add a /g modifier is at the point of use:

    use strict;
    use warnings;
    use P;

    my $re = qr{ (\w+) }x;
    my $dat = "Just another cats meow";
    my @matches = $dat =~ /$re/g;
    P "#matches=%s, matches=%s", scalar(@matches), \@matches;
    exit scalar(@matches);

    Output:

    12:53 >perl 1645_SoPW.pl
    #matches=4, matches=["Just", "another", "cats", "meow"]
    12:54 >

    Update:

    P.S. - I also just noticed that in addition to stripping out the 'g' option, the 'x' option doesn't seem to work in the regex's parens, i.e. - (?x).

    I don’t understand what you’re saying here. Can you give some example code?

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      In regards to your question: " P.S. - I also just noticed that in addition to stripping out the 'g' option, the 'x' option doesn't seem to work in the regex's parens, i.e. - (?x).

      I don’t understand what you’re saying here. Can you give some example code?"

      Please see my response to the 1st response above. Using the original data from the original question, (?x) doesn't parse the same as when it is used as a suffix. It misses the 1st match, which at the least, is counter-intuitive, if not broken, no?

Re: 'g' flag w/'qr'
by kcott (Chancellor) on May 29, 2016 at 09:53 UTC

    G'day perl-diddler,

    "So how can I attach the "/g" modifier to my "qr" regex ... ?"
    1. Short answer: you can't so stop trying.
    2. Longer answer: read on ...

    The 'g' modifier is used by m// and s/// to direct how a regex is to be used (single match, global substitution, etc.); it does not affect the regex itself. qr// has no 'g' modifier. Here's links to all three (note the modifier lists):

    The 'g' modifier is not part of qr//'s syntax and, if used, syntax errors are raised (as expected).

    $ perl -wE 'my $re = qr{A}g'
    Unknown regexp modifier "/g" at -e line 1, near "= "
    Execution of -e aborted due to compilation errors.

    You also can't do it with the re pragma's '/flags' mode:

    $ perl -wE 'use re "/g"'
    Unknown regular expression flag "g" at -e line 1.

    See also:

    On a side note — related to what you're doing but not the current problem at hand — are you familiar with the '(?<flags>:<pattern>)' regex construct described in perlre: Extended Patterns: (?adluimnsx-imnsx:pattern)? This construct, and qr//'s interpolating, allows you to write something like this:

    $ perl -wE 'my ($p, $f) = (A => "x"); my $re = qr{(?$f: $p )}; say $re'
    (?^u:(?x: A ))

    Now you control the available flags and don't have to worry about qr//'s modifiers.

    By the way, you can't add a 'g' modifier using this method either.  :-)

    $ perl -wE 'qr{(?g:)}'
    Useless (?g) - use /g modifier in regex; marked by <-- HERE in m/(?g <-- HERE :)/ at -e line 1.

    — Ken

      You said " are you familiar with the '(?<flags>:<pattern>)' regex construct described in perlre: Extended Patterns: (?adluimnsx-imnsx:pattern)? "

      Yes, but I seem to be finding "holes" in my memory...i.e. I only remembered the m{ (?i) <pattern> } form... I take it that the ":<pattern>" part allows the flags to apply only to the pattern after the colon and before the end-of-parens...

      I readily admit not to knowing how to use every feature in perl's "RE's"... some of which I might use in some odd case, but many of which I put in my "tmp" memory because they are experimental.

      Too many features have been "experimental" in perl for too long, and it really gets confusing -- since my conception of "experimental" was something that was being introduced but might not be stable yet -- however -- when something that was intro'd as Xperimental, but then was stable for more than 2 major releases, that really doesn't fall into the category of experimental, but more of "some developer's personal 'pet feature'", that got introduced, but was never removed when the experiment was "over".

      Some of those features were introduced to fill in "holes" in perl (case statement). The "Switch" module that was part of core before 5.8 was deprecated with the introduction of "given/when", and its documentation was changed to recommend its usage. Trouble is, you had a Core module deprecated that in the deprecation notes told you to use "given/when" instead -- when, w/5.18, many years after 5.8, anything that never had the experimental label removed generates sometimes fatal diagnostics (if you follow advice in most languages to get rid of all warnings, and then make all warnings "fatal").

      I tended to think that advice applied to computer languages and good-programming practices, in general -- but 5.18 made it clear that those rules didn't apply to perl. ;^/ (*sigh*)

        ... Extended Patterns: (?adluimnsx-imnsx:pattern) ...

        This is the very useful and highly non-experimental "non-capturing group" construct. It is most commonly found in its modest  (?:pattern) form, with no modifiers present. If you haven't already, I suggest you seek the company of this potentially stalwart and faithful regex companion.
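A small sketch of the construct, assuming nothing beyond core Perl; the pattern and test strings are invented for illustration. The flag inside (?i:...) applies only within that group, and the group itself does not capture.

```perl
use strict;
use warnings;

# (?i:...) switches on case-insensitivity for "perl" only;
# the capture group that follows stays case-sensitive.
my $re = qr{(?i:perl)\s+(monks)};

my @hit  = "PERL monks" =~ $re;   # one capture: "monks"
my @miss = "perl MONKS" =~ $re;   # no match, empty list

print "hit:  @hit\n";
print "miss: ", scalar(@miss), " captures\n";
```

Only one value comes back on a match because the (?i:...) group is non-capturing.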


        Give a man a fish:  <%-{-{-{-<

Re: 'g' flag w/'qr'
by haukex (Abbot) on May 29, 2016 at 11:12 UTC

    Hi perl-diddler,

    So why can't I add the global switch to the "qr" expression?

    Here are some of my thoughts about the "why" part of the question: the way I like to think of this is that /g doesn't modify the regular expression (the inside of the /.../ construct), it modifies how m// and s/// work. A thorough reading of Regexp Quote Like Operators shows that this behavior is pretty complex, and for example the return values of m// depend on context, whether or not the regular expression contains capturing groups, and whether the /g modifier is present. Consider that when you construct a regular expression with qr//, it is not yet known what kind of regular expression operator it will be used in - it can be used in both an m// and an s///. Lastly, consider this hypothetical situation:

    my $foo = qr/foo/g; # hypothetical, won't work
    my $bar = qr/bar(\d+)/;
    while (<>) {
        while (/$foo$bar/) {...}  # endless loop or not?
        my @x = /$foo$bar/;       # what's the return value?
        s/$bar$foo/quz$1/;        # replaces first or all matches?
        # ... etc. ...
    }

    Hope this helps,
    -- Hauke D

Re: 'g' flag with 'qr' (different commands) (updated)
by LanX (Bishop) on May 29, 2016 at 12:23 UTC
    The short answer for why is that m// is a different command compared to m//g (resp. s/// vs s///g ).

    The /g variants are for repetitive matches / looping !

    It's comparable to switching between if and while , which are obviously different beasts.
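To make the if/while comparison concrete, here is a minimal sketch using the data string from the original question. The same qr// object is used both ways; only the operator at the point of use differs.

```perl
use strict;
use warnings;

my $re  = qr{(\w+)};
my $dat = "Just another cats meow";

# Without /g: a single match attempt, used like an "if".
my $first;
if ($dat =~ $re) { $first = $1 }

# With /g in scalar context: each test resumes where the last
# match ended, so it drives a "while" loop.
my @all;
while ($dat =~ /$re/g) { push @all, $1 }

print "first: $first\n";   # Just
print "all:   @all\n";     # Just another cats meow
```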

    If you look at other languages like eg javascript¹ you'll notice that they are implemented using different keywords.

    ( I wished the perldocs were clearer here because it's causing confusion)

    Furthermore, regular expressions are meant to be combinable/nestable, but a sub-expression with /g doesn't make any sense.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

    ¹) couldn't reproduce this easily, JS is very close to Perl in many aspects,

    I had probably another language in mind... Python, PHP?

      You said: "short answer for why is that m// is a different command compared to m//g".

      I was noting the similarity between these forms:

      my $foo="regexp";
      my @matches = $foo =~ /(re)ge(xp)/;
      #and
      my $regex=qr{(re)ge(xp)};
      my @matches = $foo =~ $regex;
      In both of the above, "@matches" will get 2 entries "re" and "xp". I find it unfortunate that if your regexp was doubled, there is no way to directly use qr w/'g' i.e. one must use the m{} or // type form as in:
      > perl -we'use strict;use P;
      my $regex=qr{(re)ge(xp)};
      my @matches0 = "regexpregexp" =~ /(re)ge(xp)/g;
      my @matches = "regexpregexp" =~ $regex;
      my @matches1 = "regexpregexp" =~ /$regex/g;
      P "matches=%s, 0=%s, 1=%s", \@matches, \@matches0, \@matches1;
      '
      matches=["re", "xp"], 0=["re", "xp", "re", "xp"], 1=["re", "xp", "re", "xp"]
        I find it unfortunate ...

        Sometimes life's like that.

        So far, you've only advocated the utility and desirability of code like
            my $regex = qr{...}g;
            ...;
            ... $string =~ $regex;
        in which
            ... $string =~ $regex;
        would be equivalent to the current usage
            ... $string =~ /$regex/g;
        given that  qr//g is not now supported.

        I think you need to consider the implications of the cases raised by haukex here. These (and I'm sure many other) cases would need to be given defined behaviors. Surely, you do not intend the  qr//g feature to apply only to the  $string =~ $regex case? If so, this feature would seem to fall squarely into the "some developer's personal 'pet feature'" category.


        Give a man a fish:  <%-{-{-{-<

        > I find it unfortunate that. ..

        I don't.

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Je suis Charlie!

Re: 'g' flag w/'qr'
by ikegami (Pope) on May 30, 2016 at 04:52 UTC

    The g flag tells the match operator and the substitution operator to match repeatedly. It makes no sense to use it on other operators that do not perform any matching (such as q, qq, qr, qx and tr).

    Option   Pertains to                                (?:)  qr//  m     s     tr
    m        meaning of regular expression pattern      Yes   Yes   Yes   Yes
    s        meaning of regular expression pattern      Yes   Yes   Yes   Yes
    i        meaning of regular expression pattern      Yes   Yes   Yes   Yes
    x        meaning of regular expression pattern      Yes   Yes   Yes   Yes
    p        meaning of regular expression pattern      Yes   Yes   Yes   Yes
    a/d/l/u  meaning of regular expression pattern      Yes   Yes   Yes   Yes
    n        meaning of regular expression pattern      Yes   Yes   Yes   Yes

    o        compiling of regular expression patterns         Yes   Yes   Yes

    c        matching of regular expression patterns                Yes   Yes
    g        matching of regular expression patterns                Yes   Yes

    e        replacement expression                                       Yes
    ee       replacement expression                                       Yes

    r        input modification                                           Yes   Yes

    c        transliteration                                                    Yes
    d        transliteration                                                    Yes
    s        transliteration                                                    Yes

    The options that pertain to the meaning of regular expression pattern are documented in perlre. The others are documented as part of the documentation of the operators to which they pertain.

      I remember the table -- but bdf's comment in Mastering Perl was that 'qr' forms a "Regular Expression". It's an object of class Regexp and type REGEXP:
      perl -we'use strict;use P;
      use Types::Core qw(typ);
      my $re = qr{ab};
      P "re(=%s), has ref %s, and type %s", $re, ref $re, typ $re;
      '
      re(=(?^:ab)), has ref Regexp, and type REGEXP
      'perlop' says (*emphasis mine*):
      Binary "=~" binds a *scalar expression* to a *pattern match*.
      Going by:
      perl -we'use strict;use P;
      my $str="part";
      my @match_m = $str =~ m{^(.).*?(.)};
      my @match_qr = $str =~ qr{^(.).*?(.)};
      sub p_Ar($) { P "#=%d, content=(%s)", scalar(@{$_[0]}), $_[0]};
      P "res1:%s\nres2:%s", p_Ar \@match_m, p_Ar \@match_qr;
      '
      res1:#=2, content=(['p', 'a'])
      res2:#=2, content=(['p', 'a'])
      It must be the case that both m{} and qr{} are *both* pattern matches. Isn't a "pattern match" a "regular expression pattern"?

      The fact that =~ treats them the same, but the 'g' flag only works on one of them, seems counter-intuitive. I realize that "behind the scenes", documentation says they "don't", but why might the 'g' flag not apply to a sub-pattern so that, at least, things like "\G" would work inside "qr"? (Note, \G is defined as a legal zero-width assertion that appears "usable" in a "qr" pattern, but I'm not sure to what effect w/o "(?g)".)

      After more than a bit of experimenting, I found '\G' is usable, but a bit awkward to use inside 'qr' op, since it only will work when wrapped with an 'm{}', though even there, for some reason, it returns an extra pair of matches that contain undef:

      > perl -we'use strict; use P qw(:undef="(undef)");
      my $qr_string = q((?:\G(\w)\W{2}(\w))*);
      my $qr = qr[$qr_string];
      my $base_pat=q(p--t);
      sub p_Ar($) { P "#=%d, content=(%s)", scalar(@{$_[0]}), $_[0]};
      our (@match_mstr , @match_mqr , @match_qr , @match_qr2 , @tst_names);
      @tst_names = (qw(mstr mqr qr qr2));
      local * p_matches;
      *p_matches = sub ($) {
        no strict "refs";
        $_ = $base_pat x $_[0];
        @match_mstr = $_ =~ m{$qr_string}g;
        @match_mqr = $_ =~ m{$qr}g;
        @match_qr = $_ =~ $qr;
        @match_qr2 = $_ =~ qr{$qr};
        P qq(For str="%s:\n).(qq(%10s:%s\n) x @tst_names), $_ ,
          (map { ("res_".$_, p_Ar(\@{"match_".$_})) } @tst_names);
        0;
      };
      my $c=1;
      while (3>=$c) { p_matches($c++) }
      '
      For str="p--t:
        res_mstr:#=4, content=(['p', 't', (undef), (undef)])
         res_mqr:#=4, content=(['p', 't', (undef), (undef)])
          res_qr:#=2, content=(['p', 't'])
         res_qr2:#=2, content=(['p', 't'])
      For str="p--tp--t:
        res_mstr:#=6, content=(['p', 't', 'p', 't', (undef), (undef)])
         res_mqr:#=6, content=(['p', 't', 'p', 't', (undef), (undef)])
          res_qr:#=2, content=(['p', 't'])
         res_qr2:#=2, content=(['p', 't'])
      For str="p--tp--tp--t:
        res_mstr:#=8, content=(['p', 't', 'p', 't', 'p', 't', (undef), (undef)])
         res_mqr:#=8, content=(['p', 't', 'p', 't', 'p', 't', (undef), (undef)])
          res_qr:#=2, content=(['p', 't'])
         res_qr2:#=2, content=(['p', 't'])

      ARG!!!... I'm getting a headache.

      p.s. -- how does one get unicode characters to display in '<code>'?
      Those &#8708;'s (∄), above, are ugly (supposed to be symbol for "There does not exist", i.e.: undef. --- which seems to display ok in normal text, but not inside a '<code>' block. *sigh*

      UPDATE: removed/replaced version of code that used the default undef (∄) symbol to use "(undef)" instead.

        It must be the case that both m{} and qr{} are *both* pattern matches.

        No. Read qr's documentation. It does not perform any matching. It simply compiles a pattern.

        Isn't a "pattern match" a "regular expression pattern"?

        Matching is the action of checking if something is consistent with a definition.

        A pattern is a definition of a set of strings.

        Binary "=~" binds a *scalar expression* to a *pattern match*.

        Indeed it does. These matching operators are m//, s/// and tr///. If you don't use one of these explicitly, you are using m// implicitly.

        $x =~ m//    =>  $x =~ m//
        $x =~ s///   =>  $x =~ s///
        $x =~ tr///  =>  $x =~ tr///
        $x =~ EXPR   ~>  $x =~ do { my $anon = EXPR; m/$anon/ }

        This means that all of the following are functionally equivalent:

        $x =~ m/abc/
        $x =~ qr/abc/
        $x =~ q/abc/
        $x =~ qq/abc/
        $x =~ qx/echo abc/
        $x =~ 'abc'
        $x =~ "abc"
        $x =~ sub { "abc" }->()

        What it doesn't mean is that all operators perform regex pattern matching. qr/abc/g makes no more sense than qq/abc/g.
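A short sketch of that equivalence, with an invented test string and variable names. Whatever appears on the right of =~ ends up inside an implicit m//, and /g can only be written on that operator.

```perl
use strict;
use warnings;

my $x = "xxabcxx";

# Three spellings of the same match; each returns the capture "b".
my @via_m      = $x =~ m/a(b)c/;
my @via_qr     = $x =~ qr/a(b)c/;
my @via_string = $x =~ 'a(b)c';    # a plain string is compiled as a pattern at run time

# /g belongs to the match operator, so it is written there,
# with the compiled regex interpolated into it.
my $re    = qr/(x)/;
my @all_x = $x =~ /$re/g;          # four "x" captures

print "m: @via_m, qr: @via_qr, string: @via_string\n";
print "all x: ", scalar(@all_x), "\n";
```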

        ... I found '\G' is usable, but ... it returns an extra pair of matches that contain undef ...

        This has nothing to do with the  \G assertion, but is a facet of the way unmatched capture groups behave in list context when allowed to match zero times. Consider:

        c:\@Work\Perl>perl -wMstrict -le
        "use Data::Dump qw(pp);
         ;;
         my $qr_string = q((?:\G(\w)\W{2}(\w))*);
         my $qr = qr[$qr_string];
         ;;
         for my $s ('', qw(a--b c--dE--F g--hI--Jk--l)) {
           my @caps = $s =~ /$qr/g;
           print qq{'$s' -> }, pp \@caps;
           }
        "
        '' -> [undef, undef]
        'a--b' -> ["a", "b", undef, undef]
        'c--dE--F' -> ["c", "d", "E", "F", undef, undef]
        'g--hI--Jk--l' -> ["g", "h", "I", "J", "k", "l", undef, undef]

        c:\@Work\Perl>perl -wMstrict -le
        "use Data::Dump qw(pp);
         ;;
         my $qr_string = q((?:(\w)\W{2}(\w))*);
         my $qr = qr[$qr_string];
         ;;
         for my $s ('', qw(a--b c--dE--F g--hI--Jk--l)) {
           my @caps = $s =~ /$qr/g;
           print qq{'$s' -> }, pp \@caps;
           }
        "
        '' -> [undef, undef]
        'a--b' -> ["a", "b", undef, undef]
        'c--dE--F' -> ["E", "F", undef, undef]
        'g--hI--Jk--l' -> ["k", "l", undef, undef]
        Both of the variations above, with and without the  \G assertion
            q((?:\G(\w)\W{2}(\w))*)
        and
            q((?:(\w)\W{2}(\w))*)
        but with a  * quantifier on the  (?:...) group containing the capture groups, produce pairs of spurious undef values, although the other values generated are different. Versions of the regex eliminating the  * quantifier (or using a  + quantifier, but no example of this is given) do not produce spurious undefs:
        c:\@Work\Perl>perl -wMstrict -le
        "use Data::Dump qw(pp);
         ;;
         my $qr_string = q((?:\G(\w)\W{2}(\w)));
         my $qr = qr[$qr_string];
         ;;
         for my $s ('', qw(a--b c--dE--F g--hI--Jk--l)) {
           my @caps = $s =~ /$qr/g;
           print qq{'$s' -> }, pp \@caps;
           }
        "
        '' -> []
        'a--b' -> ["a", "b"]
        'c--dE--F' -> ["c", "d", "E", "F"]
        'g--hI--Jk--l' -> ["g", "h", "I", "J", "k", "l"]

        c:\@Work\Perl>perl -wMstrict -le
        "use Data::Dump qw(pp);
         ;;
         my $qr_string = q((?:(\w)\W{2}(\w)));
         my $qr = qr[$qr_string];
         ;;
         for my $s ('', qw(a--b c--dE--F g--hI--Jk--l)) {
           my @caps = $s =~ /$qr/g;
           print qq{'$s' -> }, pp \@caps;
           }
        "
        '' -> []
        'a--b' -> ["a", "b"]
        'c--dE--F' -> ["c", "d", "E", "F"]
        'g--hI--Jk--l' -> ["g", "h", "I", "J", "k", "l"]
        Don'cha just love regexes? Play with variations of these patterns (including  qr[$qr_string*] and  qr[$qr_string+]) for deeper confu... um, greater enlightenment.

        So what's going on? Here's how I would describe it: If the  (?:...(...)...(...)) group containing two capture groups is allowed to match zero times at some point, e.g., the end of the string, it will! However, the capture groups inside it don't actually capture anything, so they return undef.
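As a hedged illustration of that description, the sketch below compares the * and + quantifiers on one of the test strings used above (without \G). It uses only core Perl; the variable names are invented.

```perl
use strict;
use warnings;

my $s = 'c--dE--F';

# "*" lets the group match zero times, so /g finds one more (empty)
# match at the end of the string, whose capture groups are undef.
my @star = $s =~ /(?:(\w)\W{2}(\w))*/g;

# "+" requires at least one repetition, so no empty match is possible.
my @plus = $s =~ /(?:(\w)\W{2}(\w))+/g;

print scalar(@star), " values with *\n";   # 4: "E", "F", undef, undef
print scalar(@plus), " values with +\n";   # 2: "E", "F"
```

In both cases the first match spans the whole string and the captures hold the last repetition, which is why "c" and "d" do not appear.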

        Compare that behavior to unmatched capture groups in an alternation:

        c:\@Work\Perl>perl -wMstrict -le
        "use Data::Dump qw(dd);
         ;;
         my $s = 'aBcDeFg';
         ;;
         my @captures = $s =~ m{ (B) | (D) | (F) }xmsg;
         dd \@captures;
        "
        ["B", undef, undef, undef, "D", undef, undef, undef, "F"]

        Also consider:

        c:\@Work\Perl>perl -wMstrict -le
        "use Data::Dump qw(dd);
         ;;
         my $qr_string = q((?:(\w)\W{2}(\w))*);
         my $qr = qr[$qr_string+];
         ;;
         my $s = '%%%%';
         print 'MATCH!!!' if $s =~ /$qr/g;
         dd \@-;
         ;;
         my @captures = $s =~ /$qr/g;
         dd \@captures;
        "
        MATCH!!!
        [0]
        [undef, undef, undef, undef, undef, undef, undef, undef]

        Update: In place of the last example, consider instead:

        c:\@Work\Perl>perl -wMstrict -le
        "use Data::Dump qw(pp);
         ;;
         my $qr_string = q((?:(\w)\W{2}(\w))*);
         my $qr = qr[$qr_string];
         ;;
         my $s = '%%%%';
         ;;
         print 'match @ offset ', $-[0], ' ($1, $2)==', pp $1, $2 while $s =~ /$qr/g;
         ;;
         my @captures = $s =~ /$qr/g;
         pp \@captures;
        "
        match @ offset 0 ($1, $2)==(undef, undef)
        match @ offset 1 ($1, $2)==(undef, undef)
        match @ offset 2 ($1, $2)==(undef, undef)
        match @ offset 3 ($1, $2)==(undef, undef)
        match @ offset 4 ($1, $2)==(undef, undef)
        [undef, undef, undef, undef, undef, undef, undef, undef, undef, undef]
        For discussion of $-[0], please see @- in perlvar. Also note that the definition
            my $qr = qr[$qr_string];
        was changed from the previous example to remove the  + quantifier, which was included accidentally and only served to obscure the example.


        Give a man a fish:  <%-{-{-{-<

        > perlop says (*emphasis mine*):

        > Binary "=~" binds a * scalar expression * to a * pattern match * .

        perlop also says

        If the right argument is an expression rather than a search pattern, substitution, or transliteration, it is interpreted as a search pattern at run time.

        this means there is a DWIM behaviour to fall back to match m// and the following are equivalent:

        DB<100> "abc" =~ m/a/
         => 1
        DB<101> "abc" =~ "a"
         => 1
        DB<102> "abc" =~ qr(a)
         => 1

        please note that you could also use a plain string (line 101), but still without /g.

        AGAIN: /g transforms m// and s/// into different commands with different contextual behaviour!

        for instance

        The "/g" modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.

        In scalar context, each execution of "m//g" finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the "pos()" function; see "pos" in perlfunc. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the "/c" modifier (e.g. "m//gc"). Modifying the target string also resets the search position.
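The scalar-context behaviour quoted above can be sketched as follows, assuming only core Perl; the string and variable names are invented for illustration.

```perl
use strict;
use warnings;

my $s = "aXbXc";

# Scalar-context m//g: each call resumes where the previous match ended.
my @ends;
while ($s =~ /X/g) { push @ends, pos($s) }   # 2, then 4

# The failed match that ended the loop reset the search position.
my $after_loop = pos($s);                    # undef

# With /c, a failed match leaves the position alone.
my $found  = $s =~ /X/g;                     # true, pos($s) is now 2
my $missed = $s =~ /Z/gc;                    # false, pos($s) is still 2
my $kept   = pos($s);

print "match ends: @ends\n";
print "kept position: $kept\n";
```

The position is stored on the target string, not on the regex, which is one more reason /g is a property of the match operation and not of a qr// object.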

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Je suis Charlie!

Re: 'g' flag w/'qr'
by Don Coyote (Pilgrim) on May 29, 2016 at 11:34 UTC

    Hello perl-diddler

    Keep reading...

    Global Matching Anchors section, beginning on page 5, lighteth a way for Seekers of Perl Wisdom.


    Ahh... the catacombs
Re: 'g' flag w/'qr'
by Laurent_R (Canon) on May 29, 2016 at 21:59 UTC
    What several monks have been trying to explain might be clearer in the Perl 6 documentation on regexes and adverbs (https://doc.perl6.org/language/regexes#Adverbs). Perl 6 adverbs, in regex context, are essentially the equivalent of Perl 5 regex modifiers. This documentation says:

    Adverbs modify how regexes work and give very convenient shortcuts for certain kinds of recurring tasks.

    There are two kinds of adverbs: regex adverbs apply at the point where a regex is defined and matching adverbs apply at the point that a regex matches against a string.

    (...)

    Adverbs that appear at the time of a regex declaration are part of the actual regex and influence how the Perl 6 compiler translates the regex into binary code.

    (...)

    In contrast to regex adverbs, which are tied to the declaration of a regex, matching adverbs only make sense while matching a string against a regex.

    OK, I know you are talking of Perl 5, but I think that these explanations clarify the distinction between modifiers which apply to a regex definition and modifiers which apply to the matching process.

    If you look at the two lists of adverbs, you will see that the equivalent of the g modifier fits into the matching adverbs category, whereas :i, the equivalent of the i modifier, is a regex adverb, and so is :sigspace, the symmetrical counterpart of the x modifier.

    I hope this makes sense.

    Update; s/^What several months have/What several monks have/. Thanks to AnomalousMonk and LanX for pointing out the typo.

      Makes sense, but given that they are usually mixed together at the end of an /RE/<options>, it's not readily apparent that they are different. Too bad perl5 couldn't have directly grown into perl6 (somehow?!)...
