 
PerlMonks  

Re: 'g' flag w/'qr'

by ikegami (Patriarch)
on May 30, 2016 at 04:52 UTC ( [id://1164481]=note )


in reply to 'g' flag w/'qr'

The g flag tells the match operator and the substitution operator to match repeatedly. It makes no sense to use it on operators that do not perform any matching (such as q, qq, qr, qx and tr).

Option    Pertains to                               (?:)  qr//  m//  s///  tr///
m         meaning of regular expression pattern     Yes   Yes   Yes  Yes
s         meaning of regular expression pattern     Yes   Yes   Yes  Yes
i         meaning of regular expression pattern     Yes   Yes   Yes  Yes
x         meaning of regular expression pattern     Yes   Yes   Yes  Yes
p         meaning of regular expression pattern     Yes   Yes   Yes  Yes
a/d/l/u   meaning of regular expression pattern     Yes   Yes   Yes  Yes
n         meaning of regular expression pattern     Yes   Yes   Yes  Yes

o         compiling of regular expression patterns        Yes   Yes  Yes

c         matching of regular expression patterns               Yes  Yes
g         matching of regular expression patterns               Yes  Yes

e         replacement expression                                     Yes
ee        replacement expression                                     Yes

r         input modification                                         Yes   Yes

c         transliteration                                                  Yes
d         transliteration                                                  Yes
s         transliteration                                                  Yes

The options that pertain to the meaning of regular expression pattern are documented in perlre. The others are documented as part of the documentation of the operators to which they pertain.
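
As a minimal sketch of the point above (the variable names are illustrative, not from the original post): qr// only compiles the pattern and accepts flags that change its meaning, while the g flag goes on the match operator that interpolates it.

```perl
use strict;
use warnings;

# qr// only compiles a pattern; it accepts pattern-meaning flags such as /i.
my $word = qr/(\w+)/i;

# The repetition flag /g belongs to the match operator that uses the pattern.
my @words = "one two three" =~ /$word/g;

print join(",", @words), "\n";   # prints "one,two,three"
```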

Replies are listed 'Best First'.
Re^2: 'g' flag w/'qr'
by perl-diddler (Chaplain) on May 31, 2016 at 21:25 UTC
    I remember the table -- but bdf's comment in Mastering Perl was that 'qr' forms a "Regular Expression". It's an object of class Regexp and type REGEXP:
        perl -we'use strict;use P; use Types::Core qw(typ);
        my $re = qr{ab};
        P "re(=%s), has ref %s, and type %s", $re, ref $re, typ $re;
        '
        re(=(?^:ab)), has ref Regexp, and type REGEXP
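
For readers without the P and Types::Core modules, here is a rough core-only equivalent of the check above. It is a sketch that assumes a reasonably recent perl (older releases report a different underlying type and stringify the pattern differently); the variable names $class and $type are illustrative.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

my $re = qr{ab};

# ref() reports the class the compiled pattern is blessed into;
# reftype() reports the underlying type of the referenced value.
my $class = ref $re;
my $type  = reftype $re;

printf "re(=%s), has ref %s, and type %s\n", $re, $class, $type;
```

Neither call performs a match; both only inspect the value that qr returned.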
    'perlop' says (*emphasis mine*):
    Binary "=~" binds a *scalar expression* to a *pattern match*.
    Going by:
        perl -we'use strict;use P;
        my $str="part";
        my @match_m = $str =~ m{^(.).*?(.)};
        my @match_qr = $str =~ qr{^(.).*?(.)};
        sub p_Ar($) { P "#=%d, content=(%s)", scalar(@{$_[0]}), $_[0]};
        P "res1:%s\nres2:%s", p_Ar \@match_m, p_Ar \@match_qr;
        '
        res1:#=2, content=(['p', 'a'])
        res2:#=2, content=(['p', 'a'])
    It must be the case that both m{} and qr{} are *both* pattern matches. Isn't a "pattern match" a "regular expression pattern"?

    The fact that =~ treats them the same, but the 'g' flag only works on one of them, seems counter-intuitive. I realize that, "behind the scenes", the documentation says they "don't", but why might the 'g' flag not apply to a sub-pattern so that, at least, things like "\G" would work inside "qr"? (Note: \G is defined as a legal zero-width assertion that appears "usable" in a "qr" pattern, but I'm not sure to what effect without "(?g)".)

    After more than a bit of experimenting, I found '\G' is usable, but a bit awkward to use inside 'qr' op, since it only will work when wrapped with an 'm{}', though even there, for some reason, it returns an extra pair of matches that contain undef:

        > perl -we'use strict; use P qw(:undef="(undef)");
        my $qr_string = q((?:\G(\w)\W{2}(\w))*);
        my $qr = qr[$qr_string];
        my $base_pat=q(p--t);
        sub p_Ar($) { P "#=%d, content=(%s)", scalar(@{$_[0]}), $_[0]};
        our (@match_mstr , @match_mqr , @match_qr , @match_qr2 , @tst_names);
        @tst_names = (qw(mstr mqr qr qr2));
        local * p_matches;
        *p_matches = sub ($) {
          no strict "refs";
          $_ = $base_pat x $_[0];
          @match_mstr = $_ =~ m{$qr_string}g;
          @match_mqr = $_ =~ m{$qr}g;
          @match_qr = $_ =~ $qr;
          @match_qr2 = $_ =~ qr{$qr};
          P qq(For str="%s:\n).(qq(%10s:%s\n) x @tst_names), $_ ,
            (map { ("res_".$_, p_Ar(\@{"match_".$_})) } @tst_names);
          0;
        };
        my $c=1;
        while (3>=$c) { p_matches($c++) }
        '
        For str="p--t:
        res_mstr:#=4, content=(['p', 't', (undef), (undef)])
        res_mqr:#=4, content=(['p', 't', (undef), (undef)])
        res_qr:#=2, content=(['p', 't'])
        res_qr2:#=2, content=(['p', 't'])
        For str="p--tp--t:
        res_mstr:#=6, content=(['p', 't', 'p', 't', (undef), (undef)])
        res_mqr:#=6, content=(['p', 't', 'p', 't', (undef), (undef)])
        res_qr:#=2, content=(['p', 't'])
        res_qr2:#=2, content=(['p', 't'])
        For str="p--tp--tp--t:
        res_mstr:#=8, content=(['p', 't', 'p', 't', 'p', 't', (undef), (undef)])
        res_mqr:#=8, content=(['p', 't', 'p', 't', 'p', 't', (undef), (undef)])
        res_qr:#=2, content=(['p', 't'])
        res_qr2:#=2, content=(['p', 't'])

    ARG!!!... I'm getting a headache.

    p.s. -- how does one get unicode characters to display in '<code>'?
    Those &#8708;'s (∄), above, are ugly (it is supposed to be the symbol for "There does not exist", i.e. undef), which seems to display OK in normal text, but not inside a '<code>' block. *sigh*

    UPDATE: removed/replaced version of code that used the default undef (∄) symbol to use "(undef)" instead.

      It must be the case that both m{} and qr{} are *both* pattern matches.

      No. Read qr's documentation. It does not perform any matching. It simply compiles a pattern.

      Isn't a "pattern match" a "regular expression pattern"?

      Matching is the action of checking if something is consistent with a definition.

      A pattern is a definition of a set of strings.

      Binary "=~" binds a *scalar expression* to a *pattern match*.

      Indeed it does. These matching operators are m//, s/// and tr///. If you don't use one of these explicitly, you are using m// implicitly.

          $x =~ m//    =>  $x =~ m//
          $x =~ s///   =>  $x =~ s///
          $x =~ tr///  =>  $x =~ tr///
          $x =~ EXPR   ~>  $x =~ do { my $anon = EXPR; m/$anon/ }

      This means that all of the following are functionally equivalent:

          $x =~ m/abc/
          $x =~ qr/abc/
          $x =~ q/abc/
          $x =~ qq/abc/
          $x =~ qx/echo abc/
          $x =~ 'abc'
          $x =~ "abc"
          $x =~ sub { "abc" }->()

      What it doesn't mean is that all operators perform regex pattern matching. qr/abc/g makes no more sense than qq/abc/g.
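
The distinction can be checked with a short sketch (the test string and variable names here are illustrative, not from the thread). It assumes only that an expression on the right of =~ is used as the pattern of an implicit m//, and that qr//g fails to compile, which is why it is wrapped in a string eval.

```perl
use strict;
use warnings;

my $x  = "xxabcxx";
my $re = qr/abc/;          # compiles a pattern; no matching happens here

# Anything on the right of =~ that is not m//, s/// or tr/// is used
# as the pattern of an implicit m//.
my @results = (
    scalar($x =~ m/abc/),
    scalar($x =~ $re),
    scalar($x =~ 'abc'),
    scalar($x =~ sub { "abc" }->()),
);
print "all matched\n" if 4 == grep { $_ } @results;

# qr//g is rejected when the code is compiled, so try it in a string eval.
my $compiled   = eval q{ qr/abc/g };
my $qr_g_error = $@;
print "qr//g did not compile\n" if !defined $compiled && $qr_g_error;
```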

      ... I found '\G' is usable, but ... it returns an extra pair of matches that contain undef ...

      This has nothing to do with the \G assertion, but is a facet of the way unmatched capture groups behave in list context when allowed to match zero times. Consider:

          c:\@Work\Perl>perl -wMstrict -le
          "use Data::Dump qw(pp);
           ;;
           my $qr_string = q((?:\G(\w)\W{2}(\w))*);
           my $qr = qr[$qr_string];
           ;;
           for my $s ('', qw(a--b c--dE--F g--hI--Jk--l)) {
               my @caps = $s =~ /$qr/g;
               print qq{'$s' -> }, pp \@caps;
               }
           "
          '' -> [undef, undef]
          'a--b' -> ["a", "b", undef, undef]
          'c--dE--F' -> ["c", "d", "E", "F", undef, undef]
          'g--hI--Jk--l' -> ["g", "h", "I", "J", "k", "l", undef, undef]

          c:\@Work\Perl>perl -wMstrict -le
          "use Data::Dump qw(pp);
           ;;
           my $qr_string = q((?:(\w)\W{2}(\w))*);
           my $qr = qr[$qr_string];
           ;;
           for my $s ('', qw(a--b c--dE--F g--hI--Jk--l)) {
               my @caps = $s =~ /$qr/g;
               print qq{'$s' -> }, pp \@caps;
               }
           "
          '' -> [undef, undef]
          'a--b' -> ["a", "b", undef, undef]
          'c--dE--F' -> ["E", "F", undef, undef]
          'g--hI--Jk--l' -> ["k", "l", undef, undef]
      Both of the variations above, with and without the \G assertion
          q((?:\G(\w)\W{2}(\w))*)
      and
          q((?:(\w)\W{2}(\w))*)
      but with a * quantifier on the (?:...) group containing the capture groups, produce pairs of spurious undef values, although the other values generated are different. Versions of the regex eliminating the * quantifier (or using a + quantifier, but no example of this is given) do not produce spurious undefs:
          c:\@Work\Perl>perl -wMstrict -le
          "use Data::Dump qw(pp);
           ;;
           my $qr_string = q((?:\G(\w)\W{2}(\w)));
           my $qr = qr[$qr_string];
           ;;
           for my $s ('', qw(a--b c--dE--F g--hI--Jk--l)) {
               my @caps = $s =~ /$qr/g;
               print qq{'$s' -> }, pp \@caps;
               }
           "
          '' -> []
          'a--b' -> ["a", "b"]
          'c--dE--F' -> ["c", "d", "E", "F"]
          'g--hI--Jk--l' -> ["g", "h", "I", "J", "k", "l"]

          c:\@Work\Perl>perl -wMstrict -le
          "use Data::Dump qw(pp);
           ;;
           my $qr_string = q((?:(\w)\W{2}(\w)));
           my $qr = qr[$qr_string];
           ;;
           for my $s ('', qw(a--b c--dE--F g--hI--Jk--l)) {
               my @caps = $s =~ /$qr/g;
               print qq{'$s' -> }, pp \@caps;
               }
           "
          '' -> []
          'a--b' -> ["a", "b"]
          'c--dE--F' -> ["c", "d", "E", "F"]
          'g--hI--Jk--l' -> ["g", "h", "I", "J", "k", "l"]
      Don'cha just love regexes? Play with variations of these patterns (including qr[$qr_string*] and qr[$qr_string+]) for deeper confu... um, greater enlightenment.

      So what's going on? Here's how I would describe it: If the (?:...(...)...(...)) group containing two capture groups is allowed to match zero times at some point, e.g., at the end of the string, it will! That zero-length match still counts as a match under /g, but the capture groups inside it don't actually capture anything, so they return undef.
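
The description above can be reproduced without Data::Dump. The following is a small sketch (the show helper is made up for illustration) that also covers the + quantifier case mentioned earlier without an example.

```perl
use strict;
use warnings;

# Render a capture list, showing undef explicitly.
sub show { join ", ", map { defined $_ ? "'$_'" : "undef" } @_ }

my $s = "a--b";

# With *, the group may match zero times, so /g finds one extra
# empty match at the end of the string whose captures are undef.
my @star = $s =~ /(?:(\w)\W{2}(\w))*/g;

# With +, the group must match at least once, so there is no empty match.
my @plus = $s =~ /(?:(\w)\W{2}(\w))+/g;

print "star: ", show(@star), "\n";   # star: 'a', 'b', undef, undef
print "plus: ", show(@plus), "\n";   # plus: 'a', 'b'
```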

      Compare that behavior to unmatched capture groups in an alternation:

          c:\@Work\Perl>perl -wMstrict -le
          "use Data::Dump qw(dd);
           ;;
           my $s = 'aBcDeFg';
           ;;
           my @captures = $s =~ m{ (B) | (D) | (F) }xmsg;
           dd \@captures;
           "
          ["B", undef, undef, undef, "D", undef, undef, undef, "F"]

      Also consider:

          c:\@Work\Perl>perl -wMstrict -le
          "use Data::Dump qw(dd);
           ;;
           my $qr_string = q((?:(\w)\W{2}(\w))*);
           my $qr = qr[$qr_string+];
           ;;
           my $s = '%%%%';
           print 'MATCH!!!' if $s =~ /$qr/g;
           dd \@-;
           ;;
           my @captures = $s =~ /$qr/g;
           dd \@captures;
           "
          MATCH!!!
          [0]
          [undef, undef, undef, undef, undef, undef, undef, undef]

      Update: In place of the last example, consider instead:

          c:\@Work\Perl>perl -wMstrict -le
          "use Data::Dump qw(pp);
           ;;
           my $qr_string = q((?:(\w)\W{2}(\w))*);
           my $qr = qr[$qr_string];
           ;;
           my $s = '%%%%';
           ;;
           print 'match @ offset ', $-[0], ' ($1, $2)==', pp $1, $2
               while $s =~ /$qr/g;
           ;;
           my @captures = $s =~ /$qr/g;
           pp \@captures;
           "
          match @ offset 0 ($1, $2)==(undef, undef)
          match @ offset 1 ($1, $2)==(undef, undef)
          match @ offset 2 ($1, $2)==(undef, undef)
          match @ offset 3 ($1, $2)==(undef, undef)
          match @ offset 4 ($1, $2)==(undef, undef)
          [undef, undef, undef, undef, undef, undef, undef, undef, undef, undef]
      For discussion of $-[0], please see @- in perlvar. Also note that the definition
          my $qr = qr[$qr_string];
      was changed from the previous example to remove the + quantifier, which was included accidentally and only served to obscure the example.


      Give a man a fish:  <%-{-{-{-<

      > perlop says (*emphasis mine*):

      > Binary "=~" binds a *scalar expression* to a *pattern match*.

      perlop also says

      If the right argument is an expression rather than a search pattern, substitution, or transliteration, it is interpreted as a search pattern at run time.

      This means there is a DWIM behaviour that falls back to the match operator m//, so the following are equivalent:

          DB<100> "abc" =~ m/a/
           => 1
          DB<101> "abc" =~ "a"
           => 1
          DB<102> "abc" =~ qr(a)
           => 1

      Please note that you can also use a plain string (line 101), but still without /g.

      Again, /g transforms m// and s/// into different commands with different contextual behaviour!

      for instance

      The "/g" modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.

      In scalar context, each execution of "m//g" finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the "pos()" function; see "pos" in perlfunc. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the "/c" modifier (e.g. "m//gc"). Modifying the target string also resets the search position.
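
The two quoted paragraphs can be illustrated with a short sketch. The sample string and variable names are made up for the example; the behaviour shown is only what the perlop text above describes.

```perl
use strict;
use warnings;

my $str = "a1b2c3";

# List context: every capture from every match comes back at once.
my @pairs = $str =~ /([a-z])(\d)/g;      # ('a', 1, 'b', 2, 'c', 3)

# Scalar context: each evaluation finds the next match and updates pos().
my @offsets;
while ($str =~ /\d/g) {
    push @offsets, pos($str);            # offset just past each digit
}
# The loop ended on a failed match, which reset the search position.

# With /c, a failed match leaves pos() where it was.
$str =~ /\d/gc;                          # matches "1", pos is now 2
$str =~ /z/gc;                           # fails, pos stays at 2
my $pos_after_failure = pos($str);

print "@pairs | @offsets | $pos_after_failure\n";   # a 1 b 2 c 3 | 2 4 6 | 2
```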

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!
