Re: 'g' flag w/'qr'

Re^2: 'g' flag w/'qr' by perl-diddler (Chaplain) on May 31, 2016 at 21:25 UTC
I remember the table, but brian d foy's comment in Mastering Perl was that 'qr' forms a "Regular Expression". It's an object of class Regexp and type REGEXP:

```
perl -we'use strict; use P; use Types::Core qw(typ);
  my $re = qr{ab};
  P "re(=%s), has ref %s, and type %s", $re, ref $re, typ $re;
'
re(=(?^:ab)), has ref Regexp, and type REGEXP
```

'perlop' says (emphasis mine):

> Binary "=~" binds a scalar expression *to a pattern match*.

Going by:

```
perl -we'use strict; use P;
  my $str = "part";
  my @match_m  = $str =~ m{^(.).*?(.)};
  my @match_qr = $str =~ qr{^(.).*?(.)};
  sub p_Ar($) { P "#=%d, content=(%s)", scalar(@{$_[0]}), $_[0] }
  P "res1:%s\nres2:%s", p_Ar \@match_m, p_Ar \@match_qr;
'
res1:#=2, content=(['p', 'a'])
res2:#=2, content=(['p', 'a'])
```

It must be the case that m{} and qr{} are both pattern matches. Isn't a "pattern match" a "regular expression pattern"? The fact that =~ treats them the same, but the 'g' flag only works on one of them, seems counter-intuitive. I realize the documentation says that "behind the scenes" they don't, but why might the 'g' flag not apply to a sub-pattern, so that at least things like "\G" would work inside "qr"? (Note: \G is defined as a legal zero-width assertion that appears "usable" in a "qr" pattern, but I'm not sure to what effect without "(?g)".)
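For readers following along without the `P` and `Types::Core` modules, here is a small core-Perl sketch of the same checks. It assumes `ref`, `Scalar::Util::reftype` and `re::is_regexp` are acceptable stand-ins for the post's helpers. It shows that `qr` only returns a compiled `Regexp` object, that `qr//g` does not even compile, and that `/g` goes on the `m//` that uses the object:

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);
use re qw(is_regexp);

# qr// compiles a pattern and returns a Regexp object; it does not match anything.
my $re    = qr{ab};
my $class = ref $re;                  # 'Regexp'
my $type  = reftype $re;              # 'REGEXP'
my $is_re = is_regexp($re) ? 1 : 0;   # 1

# qr//g is rejected when the code is compiled, so try it inside a string eval.
my $qr_g_compiles = eval 'my $r = qr{ab}g; 1' ? 1 : 0;   # 0
my $qr_g_error    = $@;

# /g belongs on the operator that actually matches: an m// built from $re.
my @all = 'abxab' =~ /$re/g;          # ('ab', 'ab')

printf "ref=%s reftype=%s is_regexp=%d qr//g compiles=%d\n",
    $class, $type, $is_re, $qr_g_compiles;
print "qr//g error: $qr_g_error";
print "matches: @all\n";
```

The exact wording of the compile error varies between Perl versions, which is why the sketch only checks that the `eval` failed.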
After more than a bit of experimenting, I found '\G' is usable, but a bit awkward to use inside the 'qr' op, since it only works when wrapped in an 'm{}'. Even there, for some reason, it returns an extra pair of matches that contain undef:

```
> perl -we'use strict; use P qw(:undef="(undef)");
  my $qr_string = q((?:\G(\w)\W{2}(\w))*);
  my $qr = qr[$qr_string];
  my $base_pat = q(p--t);
  sub p_Ar($) { P "#=%d, content=(%s)", scalar(@{$_[0]}), $_[0] }
  our (@match_mstr, @match_mqr, @match_qr, @match_qr2, @tst_names);
  @tst_names = (qw(mstr mqr qr qr2));
  local *p_matches;
  *p_matches = sub ($) {
    no strict "refs";
    $_ = $base_pat x $_[0];
    @match_mstr = $_ =~ m{$qr_string}g;
    @match_mqr  = $_ =~ m{$qr}g;
    @match_qr   = $_ =~ $qr;
    @match_qr2  = $_ =~ qr{$qr};
    P qq(For str="%s:\n).(qq(%10s:%s\n) x @tst_names), $_,
      (map { ("res_".$_, p_Ar(\@{"match_".$_})) } @tst_names);
    0;
  };
  my $c = 1;
  while (3 >= $c) { p_matches($c++) }
'
For str="p--t:
  res_mstr:#=4, content=(['p', 't', (undef), (undef)])
   res_mqr:#=4, content=(['p', 't', (undef), (undef)])
    res_qr:#=2, content=(['p', 't'])
   res_qr2:#=2, content=(['p', 't'])
For str="p--tp--t:
  res_mstr:#=6, content=(['p', 't', 'p', 't', (undef), (undef)])
   res_mqr:#=6, content=(['p', 't', 'p', 't', (undef), (undef)])
    res_qr:#=2, content=(['p', 't'])
   res_qr2:#=2, content=(['p', 't'])
For str="p--tp--tp--t:
  res_mstr:#=8, content=(['p', 't', 'p', 't', 'p', 't', (undef), (undef)])
   res_mqr:#=8, content=(['p', 't', 'p', 't', 'p', 't', (undef), (undef)])
    res_qr:#=2, content=(['p', 't'])
   res_qr2:#=2, content=(['p', 't'])
```

ARG!!!... I'm getting a headache.

P.S.: how does one get Unicode characters to display in '<code>'? Those ∄'s (∄) above are ugly (∄ is supposed to be the symbol for "There does not exist", i.e., undef). It seems to display OK in normal text, but not inside a '<code>' block. *sigh*

UPDATE: replaced the version of the code that used the default undef symbol (∄) with one that uses "(undef)" instead.
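A minimal sketch (core Perl only; the helper `pairs_from` is hypothetical) of a case where `\G` inside a `qr` does pay off: a scalar-context `m//g` loop advances `pos()`, so each pass must start exactly where the previous one stopped. For contrast, the same pattern without `\G` is shown skipping over a gap:

```perl
use strict;
use warnings;

# \G asserts "this match starts at pos() of the target". Inside a qr// it
# only does something useful once an m//g is driving pos() forward.
my $pair_at_pos = qr{\G(\w)\W{2}(\w)};

# Hypothetical helper: walk $str pair by pair with a scalar-context m//g.
sub pairs_from {
    my ($str) = @_;
    my @pairs;
    while ($str =~ /$pair_at_pos/g) {    # each pass resumes at pos($str)
        push @pairs, "$1$2";
    }
    return @pairs;
}

my @contiguous = pairs_from('p--tp--tp--t');   # ('pt', 'pt', 'pt')
my @stopped    = pairs_from('p--tXp--t');      # ('pt'): \G fails at the 'X'

# Without \G the engine may skip ahead, so the second pair is still found.
my @skipping = 'p--tXp--t' =~ /(\w)\W{2}(\w)/g;   # ('p', 't', 'p', 't')

print "contiguous: @contiguous\n";
print "stopped:    @stopped\n";
print "skipping:   @skipping\n";
```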
Re^3: 'g' flag w/'qr' by ikegami (Patriarch) on Jun 01, 2016 at 00:17 UTC
> It must be the case that m{} and qr{} are both pattern matches.

No. Read `qr`'s documentation. It does not perform any matching. It simply compiles a pattern.

> Isn't a "pattern match" a "regular expression pattern"?

Matching is the action of checking whether something is consistent with a definition. A pattern is a definition of a set of strings.

> Binary "=~" binds a scalar expression *to a pattern match*.

Indeed it does. These matching operators are m//, s/// and tr///. If you don't use one of these explicitly, you are using m// implicitly.

```
$x =~ m//   => $x =~ m//
$x =~ s///  => $x =~ s///
$x =~ tr/// => $x =~ tr///
$x =~ EXPR  ~> $x =~ do { my $anon = EXPR; m/$anon/ }
```

This means that all of the following are functionally equivalent:

```
$x =~ m/abc/
$x =~ qr/abc/
$x =~ q/abc/
$x =~ qq/abc/
$x =~ qx/echo abc/
$x =~ 'abc'
$x =~ "abc"
$x =~ sub { "abc" }->()
```

What it doesn't mean is that all operators perform regex pattern matching. `qr/abc/g` makes no more sense than `qq/abc/g`.
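The implicit-`m//` rule can be checked directly. This sketch (core Perl; the variable names are illustrative) confirms that several right-hand-side forms all match, that a plain string is compiled as a pattern rather than compared literally, and where `/g` belongs:

```perl
use strict;
use warnings;

my $x = 'xabcx';

# None of these right-hand sides (except the first) is m//, s/// or tr///,
# so =~ uses each value as a pattern in an implicit m//.
my %hit = (
    'm//'  => ($x =~ m/abc/)             ? 1 : 0,
    'qr//' => ($x =~ qr/abc/)            ? 1 : 0,
    'q//'  => ($x =~ q/abc/)             ? 1 : 0,
    'sub'  => ($x =~ sub { 'abc' }->())  ? 1 : 0,
);

# A plain string is compiled as a pattern, so its metacharacters are live.
my $dot_is_wild = ('abc' =~ 'a.c') ? 1 : 0;       # 1: '.' matched 'b'
my $literal     = quotemeta 'a.c';                # 'a\.c'
my $dot_literal = ('abc' =~ $literal) ? 1 : 0;    # 0

# The /g flag goes on the matching operator, never on qr// itself.
my $word  = qr/(\w+)/;
my @words = 'one two three' =~ /$word/g;          # ('one', 'two', 'three')

print join(' ', map { "$_=$hit{$_}" } sort keys %hit), "\n";
print "dot_is_wild=$dot_is_wild dot_literal=$dot_literal\n";
print "words: @words\n";
```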
Re^3: 'g' flag w/'qr' by AnomalousMonk (Archbishop) on Jun 01, 2016 at 01:14 UTC
> ... I found '\G' is usable, but ... it returns an extra pair of matches that contain undef ...

This has nothing to do with the `\G` assertion, but is a facet of the way unmatched capture groups behave in list context when allowed to match zero times. Consider:

```
c:\@Work\Perl>perl -wMstrict -le "use Data::Dump qw(pp); ;; my $qr_string = q((?:\G(\w)\W{2}(\w))*); my $qr = qr[$qr_string]; ;; for my $s ('', qw(a--b c--dE--F g--hI--Jk--l)) { my @caps = $s =~ /$qr/g; print qq{'$s' -> }, pp \@caps; } "
'' -> [undef, undef]
'a--b' -> ["a", "b", undef, undef]
'c--dE--F' -> ["c", "d", "E", "F", undef, undef]
'g--hI--Jk--l' -> ["g", "h", "I", "J", "k", "l", undef, undef]

c:\@Work\Perl>perl -wMstrict -le "use Data::Dump qw(pp); ;; my $qr_string = q((?:(\w)\W{2}(\w))*); my $qr = qr[$qr_string]; ;; for my $s ('', qw(a--b c--dE--F g--hI--Jk--l)) { my @caps = $s =~ /$qr/g; print qq{'$s' -> }, pp \@caps; } "
'' -> [undef, undef]
'a--b' -> ["a", "b", undef, undef]
'c--dE--F' -> ["E", "F", undef, undef]
'g--hI--Jk--l' -> ["k", "l", undef, undef]
```

Both of the variations above, with and without the `\G` assertion (`q((?:\G(\w)\W{2}(\w))*)` and `q((?:(\w)\W{2}(\w))*)`), but with a `*` quantifier on the `(?:...)` group containing the capture groups, produce pairs of spurious `undef` values, although the other values generated are different.
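The same comparison can be reproduced without `Data::Dump`; this core-Perl sketch also includes a `+` variant. The helper `caps` and the hash `%pat` are hypothetical names for illustration; `caps` prints `undef` explicitly so the spurious slots are visible:

```perl
use strict;
use warnings;

# Hypothetical helper: list-context m//g, with undef shown explicitly.
sub caps {
    my ($re, $s) = @_;
    my @caps = $s =~ /$re/g;
    return join ',', map { defined $_ ? $_ : 'undef' } @caps;
}

my %pat = (
    star => qr/(?:(\w)\W{2}(\w))*/,   # may match zero times: empty match at the end
    plus => qr/(?:(\w)\W{2}(\w))+/,   # must consume at least one pair
    none => qr/(?:(\w)\W{2}(\w))/,    # exactly one pair per match
);

for my $s ('', 'a--b', 'c--dE--F') {
    printf "%-10s star=[%s] plus=[%s] none=[%s]\n",
        "'$s'", map { caps($pat{$_}, $s) } qw(star plus none);
}
```

With `*` or `+`, a single match can swallow several pairs, and the capture groups keep only the last iteration (`E,F`); only `*` additionally allows the zero-length match at the end of the string that yields the `undef,undef` pair.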
Versions of the regex eliminating the `*` quantifier (or using a `+` quantifier, but no example of this is given) do not produce spurious `undef`s:

```
c:\@Work\Perl>perl -wMstrict -le "use Data::Dump qw(pp); ;; my $qr_string = q((?:\G(\w)\W{2}(\w))); my $qr = qr[$qr_string]; ;; for my $s ('', qw(a--b c--dE--F g--hI--Jk--l)) { my @caps = $s =~ /$qr/g; print qq{'$s' -> }, pp \@caps; } "
'' -> []
'a--b' -> ["a", "b"]
'c--dE--F' -> ["c", "d", "E", "F"]
'g--hI--Jk--l' -> ["g", "h", "I", "J", "k", "l"]

c:\@Work\Perl>perl -wMstrict -le "use Data::Dump qw(pp); ;; my $qr_string = q((?:(\w)\W{2}(\w))); my $qr = qr[$qr_string]; ;; for my $s ('', qw(a--b c--dE--F g--hI--Jk--l)) { my @caps = $s =~ /$qr/g; print qq{'$s' -> }, pp \@caps; } "
'' -> []
'a--b' -> ["a", "b"]
'c--dE--F' -> ["c", "d", "E", "F"]
'g--hI--Jk--l' -> ["g", "h", "I", "J", "k", "l"]
```

Don'cha just love regexes? Play with variations of these patterns (including `qr[$qr_string*]` and `qr[$qr_string+]`) for deeper confu... um, greater enlightenment.

So what's going on? Here's how I would describe it: if the `(?:...(...)...(...))` group containing two capture groups is allowed to match *zero* times at some point, e.g., at the end of the string, it will! However, the capture groups inside it don't actually capture anything, so they return `undef`. Compare that behavior to unmatched capture groups in an alternation:

```
c:\@Work\Perl>perl -wMstrict -le "use Data::Dump qw(dd); ;; my $s = 'aBcDeFg'; ;; my @captures = $s =~ m{ (B) | (D) | (F) }xmsg; dd \@captures; "
["B", undef, undef, undef, "D", undef, undef, undef, "F"]
```

Also consider:

```
c:\@Work\Perl>perl -wMstrict -le "use Data::Dump qw(dd); ;; my $qr_string = q((?:(\w)\W{2}(\w))*); my $qr = qr[$qr_string+]; ;; my $s = '%%%%'; print 'MATCH!!!' if $s =~ /$qr/g; dd \@-; ;; my @captures = $s =~ /$qr/g; dd \@captures; "
MATCH!!!
[0]
[undef, undef, undef, undef, undef, undef, undef, undef]
```

Update: In place of the last example, consider instead:

```
c:\@Work\Perl>perl -wMstrict -le "use Data::Dump qw(pp); ;; my $qr_string = q((?:(\w)\W{2}(\w))*); my $qr = qr[$qr_string]; ;; my $s = '%%%%'; ;; print 'match @ offset ', $-[0], ' ($1, $2)==', pp $1, $2 while $s =~ /$qr/g; ;; my @captures = $s =~ /$qr/g; pp \@captures; "
match @ offset 0 ($1, $2)==(undef, undef)
match @ offset 1 ($1, $2)==(undef, undef)
match @ offset 2 ($1, $2)==(undef, undef)
match @ offset 3 ($1, $2)==(undef, undef)
match @ offset 4 ($1, $2)==(undef, undef)
[undef, undef, undef, undef, undef, undef, undef, undef, undef, undef]
```

For discussion of `$-[0]`, please see `@-` in perlvar. Also note that the definition `my $qr = qr[$qr_string];` was changed from the previous example to remove the `+` quantifier, which was included accidentally and only served to obscure the example.

Give a man a fish: `<%-{-{-{-<`
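Two common ways to avoid the `undef` slots from an alternation, plus the zero-length-match offsets from the update, in a core-Perl sketch: filtering with `grep { defined }`, or a branch-reset group `(?|...)` (available since Perl 5.10) so the alternatives share one capture number. Variable names are illustrative:

```perl
use strict;
use warnings;

my $s = 'aBcDeFg';

# Each alternative owns its own group, so every match returns 3 slots,
# two of which are undef.
my @raw = $s =~ m{ (B) | (D) | (F) }xg;

# Option 1: drop the unset groups afterwards.
my @filtered = grep { defined } @raw;             # ('B', 'D', 'F')

# Option 2: a branch-reset group makes the alternatives share group 1,
# so each match yields exactly one capture.
my @reset = $s =~ m{ (?| (B) | (D) | (F) ) }xg;   # ('B', 'D', 'F')

# A pattern that can match the empty string matches once at every offset;
# $-[0] (the start of each match) makes that visible.
my $empty_ok = qr/(?:(\w)\W{2}(\w))*/;
my $t = '%%%%';
my @offsets;
push @offsets, $-[0] while $t =~ /$empty_ok/g;    # (0, 1, 2, 3, 4)

print scalar(@raw), " raw slots; filtered: @filtered; reset: @reset\n";
print "zero-length match offsets: @offsets\n";
```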
Re^3: 'g' flag w/'qr' by LanX (Saint) on May 31, 2016 at 21:53 UTC
> `perlop` says (emphasis mine):
> Binary "=~" binds a scalar expression *to a pattern match*.

`perlop` also says:

> If the right argument is an expression rather than a search pattern, substitution, or transliteration, it is interpreted as a search pattern at run time.

This means there is a DWIM behaviour that falls back to a match `m//`, and the following are equivalent:

```
DB<100> "abc" =~ m/a/
=> 1
DB<101> "abc" =~ "a"
=> 1
DB<102> "abc" =~ qr(a)
=> 1
```

Please note that you could also use a plain string (line 101), but still without /g.

AGAIN: /g transforms `m//` and `s///` into different commands with different contextual behaviour! For instance:

> The "/g" modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern. In scalar context, each execution of "m//g" finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the "pos()" function; see "pos" in perlfunc. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the "/c" modifier (e.g. "m//gc"). Modifying the target string also resets the search position.

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!
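A small sketch of the contextual behaviour described in that `perlop` passage (scalar vs. list context, `pos()`, and `/c`), using core Perl only; the string and variable names are arbitrary:

```perl
use strict;
use warnings;

my $s = 'a1b2c3';

# Scalar context: each m//g finds the next match; pos() records where it ended.
my @ends;
while ($s =~ /\d/g) {
    push @ends, pos $s;               # (2, 4, 6)
}
# The final, failed match reset pos() to undef.
my $after_loop = defined pos($s) ? pos($s) : 'undef';

# /c keeps pos() after a failed match instead of resetting it.
$s =~ /b/g;                           # matches at offset 2, so pos() becomes 3
$s =~ /z/gc;                          # fails, but pos() stays at 3
my $kept = pos $s;
$s =~ /z/g;                           # fails without /c, so pos() is reset
my $after_fail = defined pos($s) ? pos($s) : 'undef';

# List context: all captures, or all whole matches when there are no parentheses.
my @caps  = $s =~ /(\d)/g;            # ('1', '2', '3')
my @whole = $s =~ /[a-z]\d/g;         # ('a1', 'b2', 'c3')

print "ends: @ends; after loop: $after_loop\n";
print "kept with /gc: $kept; after plain failed /g: $after_fail\n";
print "captures: @caps; whole matches: @whole\n";
```

The list-context matches are done last on purpose, so they start from a `pos()` that is known to be undef.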