The 'g' modifier in compiled regex

by naikonta (Curate)
on Apr 09, 2007 at 03:11 UTC (#608925=perlquestion)

naikonta has asked for the wisdom of the Perl Monks concerning the following question:

I wrote a compiled regex as follows:
$pat = '(' . join('|', @patterns) . ')';
$pat = qr/$pat/i;
I later added the 'g' modifier, since this regex will be used in a substitution that is expected to replace all occurrences, as in
$pat = qr/$pat/gi;
s/$pat/something/;
But I got a syntax error near "$pat = qr/$pat/gi", so I moved the 'g' to the substitution construct:
$pat = qr/$pat/i;
s/$pat/something/g;
which works just fine. I wonder if this might be because 'g' is a property of a substitution, not of a pattern match. I looked at perlop and saw the entry "qr/STRING/imosx"; there's no 'g' there. However, this doesn't fit my current understanding of how 'g' is used, so I tried the ordinary matching construct,
perl -e 'print if /perl/gi'
It went OK, which did not surprise me, since I have used this kind of construct many times before:
$str = 'nothing but perl can parse Perl';
@match = $str =~ /(perl)/gi;
print for @match;
I'm aware that qr// only compiles, while m// compiles and executes at once. But why doesn't 'g' work in qr//? Does it have anything to do with the way the regex is precompiled?

I thought that when perl precompiled a qr/regex/g, it noticed the g and planned (if this term is correct) to do the multimatching whenever the regex is later used, either in normal matching or in substitution. I may have overlooked an explanation of this in perlop, perlre, perlfaq6, or elsewhere, so where can I find it? I expected something in perlop that says something like "you can't use g here because bla bla ....".

What I need is to compile a multimatch regex once and use it in many substitution constructs without ever worrying about the 'g' modifier again.

Replies are listed 'Best First'.
Re: The 'g' modifier in compiled regex
by ikegami (Patriarch) on Apr 09, 2007 at 05:27 UTC

    Basically, it boils down to: g affects the operation (match or substitution), not the regexp. There are going to be problems if you associate it with the regexp. Read on for some problems.

    xism options are toggles. Their absence signifies the opposite of their presence. If g were an option, its absence would mean "don't loop".
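A minimal sketch of what "toggle" means for a pattern flag, assuming nothing beyond core Perl (the strings and variable names are made up for illustration): /i is turned on for the whole compiled pattern and turned off again for one group with (?-i:...). There is no comparable way to turn "looping" on or off for part of a pattern.

```perl
use strict;
use warnings;

# /i is switched on for the whole pattern, then switched off
# again for the group wrapped in (?-i:...).
my $re = qr/foo(?-i:bar)/i;

my @results;
for my $s ('FOObar', 'FOOBAR', 'foobar') {
    # 'foo' is matched case-insensitively, 'bar' case-sensitively
    push @results, ($s =~ $re ? 1 : 0);
}
print "@results\n";
```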

    g must apply to the entire match or substitution operation; it makes no sense for it to affect only part of it, so
    $re = qr/.../g; /$re/; and
    $re = qr/.../; /$re/g; would make no sense.
    You can't both loop and not loop.

    $re = qr/.../g; /$re/g; and
    $re = qr/.../; /$re/; are the only combinations that would make sense, so
    nothing would be gained.

    There's another subtle, but ugly problem. Consider

    $str = 'Nothing but perl can parse Perl';
    print($str =~ /perl/gi ?1:0,"\n");  # Prints '1'
    print($str =~ /noth/gi ?1:0,"\n");  # Prints '0'
    print($str =~ /perl/i ?1:0,"\n");   # Prints '1'
    print($str =~ /noth/i ?1:0,"\n");   # Prints '1'

    Also consider

    $str = 'Nothing but perl can parse Perl';
    while ($str =~ /perl/gi) {
        # Loops twice
    }
    while ($str =~ /perl/i) {
        # Loops forever
    }

    A match in scalar context using g and one not using it are simply not interchangeable. If g were to be a modifier on compiled regexps, allowing compiled regexps to be used in a match operator in scalar context would severely weaken that code.

    Obviously, disallowing compiled regexps in a match in scalar context is not acceptable. So what's the solution? An operator that compiles a regexp for scalar context and one that compiles a regexp for list context? Yuck! It's simpler to let the user handle g.
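The difference comes from pos(): a scalar-context match with g resumes where the previous one stopped, while a match without g always starts from the beginning. A small sketch, using the same string as above (the variable names are illustrative only):

```perl
use strict;
use warnings;

my $str = 'Nothing but perl can parse Perl';

# With /g in scalar context, each successful match records where it
# ended in pos($str), and the next attempt resumes from there.
my @positions;
while ($str =~ /perl/gi) {
    push @positions, pos($str);    # offset just past each match
}

# The final, failed attempt resets pos($str) to undef.
my $pos_after_loop = pos($str);

# Without /g the match ignores pos() and starts at offset 0 every time,
# which is why "while (/perl/i)" never terminates.
my $first_only = ($str =~ /perl/i) ? 1 : 0;

print "matches ended at: @positions\n";
```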

    my @matches = ($g ? /$re/g : /$re/);


    for (...) {
        my ($re, $g) = @$_;
        my @matches = ($g ? /$re/g : /$re/);
        ...
    }
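One way the loop above might be filled in, as a sketch only: the @rules table, the $text variable and the flag values are hypothetical, not part of the original reply. Each rule carries its own "match globally?" flag next to the compiled regexp, and the match operator decides whether to use g.

```perl
use strict;
use warnings;

my $text = 'Nothing but perl can parse Perl';

# Hypothetical rule table: [ compiled regexp, "use /g?" flag ]
my @rules = (
    [ qr/(perl)/i, 1 ],
    [ qr/(perl)/i, 0 ],
);

my @counts;
for my $rule (@rules) {
    my ($re, $g) = @$rule;
    # The g flag is applied on the match operator, not on the qr// object
    my @matches = $g ? ($text =~ /$re/g) : ($text =~ /$re/);
    push @counts, scalar @matches;
}
print "@counts\n";
```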

    Finally, consider

    >perl -le"print qr/.../s"
    (?s-xim:...)

    Notice how the s doesn't affect the (?:...), only what's in it? That means $re = qr/.../g; @matches = /$re/; makes no sense since part of the regexp doesn't loop ((?:...)) and part of it does (...). Again, we'd need to do $re = qr/.../g; @matches = /$re/g;, gaining nothing.
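The scoping described above can be checked with a short sketch (the patterns and test strings are invented for illustration): the s flag compiled into $inner travels with it when it is interpolated, but it does not leak into the surrounding pattern. The exact stringification of a qr// object differs between Perl versions, so the sketch tests behaviour instead of printing the pattern.

```perl
use strict;
use warnings;

my $inner = qr/b.c/s;        # inside $inner, '.' may match a newline
my $outer = qr/a.$inner/;    # the '.' written here was compiled without /s

# Newline falls inside the part governed by $inner's own /s flag
my $nl_inside_inner = ("a-b\nc" =~ $outer) ? 1 : 0;

# Newline falls in the outer part, where /s is not in effect
my $nl_in_outer_part = ("a\nb-c" =~ $outer) ? 1 : 0;

print "$nl_inside_inner $nl_in_outer_part\n";
```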

      Basically, it boils down to: g affects the operation (match or substitution), not the regexp. There are going to be problems if you associate it with the regexp. Read on for some problems.

      Well, I am aware of this issue. I was wrong about wondering whether /g was for substitution or for matching. I was wrong because /g is for both, for the action (as also pointed out by japhy), but I never thought that /g was for the regex.

      When you said xism are toggles, I looked again at perlre and there under "(?imsx-imsx)" entry it says, "One or more embedded pattern-match modifiers, to be turned on (or turned off, if preceded by "-")". I should have thought that they were toggles in the first place if I read more carefully earlier. Thanks for this :-) But in this case, /g is also a toggle for the normal m// or s///, right?

      Notice how the s doesn't affect the (?:...), only what's in it? That means $re = qr/.../g; @matches = /$re/; makes no sense since part of the regexp doesn't loop ((?:...)) and part of it does (...). Again, we'd need to do $re = qr/.../g; @matches = /$re/g;, gaining nothing.

      Well, that's for the current implementation. But what if I *know* that I will use the qr/pat/g in list context for the rest of the code? And if I use it in scalar context, then it's my fault. If /g were allowed in qr//, it would surely be accompanied by a note such as "only use it in list context, otherwise there will be a mandatory warning, or your code will be ignored", or even "..., otherwise it's a compilation error".

      I haven't dived into the source code or the p5p archive. I don't know exactly how regex precompilation is implemented, or how it would be affected should /g be implemented for precompiled regexes (if that's possible at all). Now you've just tempted me to request this feature: allow the global matching modifier in qr// so that the compiled regex always matches globally wherever it is used. :-)

        Please don't request this feature. One of the big advantages of Perl is that, to a very large extent, it does what most programmers expect, when they think about it a little. As japhy's and ikegami's comments indicate, attaching looping behavior to a pre-compiled regex is difficult to make sense of, and it would prevent the same pre-compiled regex from being used in a looping and non-looping context, as your suggested error messages indicate.

        I'm confident in the Perl architects, and so I doubt they would add this type of feature, but I also think there is significant overhead to considering a suggestion. The way I see it, the more we exercise self-restraint in our suggestions, in response to feedback here in PM, the sooner we'll see a release of Perl 6.

        You won't see /g being supported by qr//. I think I can say that as a definitive statement. Part of the reason is that, with the exception of the /o modifier, the only modifiers allowed on a qr// are those that can apply to a subsection of a pattern. You have to ask yourself what would happen in a case like:

        my $qr_g=qr/foo/g;
        my $other=qr/balh $qr_g/;

        What would it mean to have half of a pattern be /g and the other half not? What sense would it make if somebody used $other specifically without /g?

        Also, the /g modifier is a property of the PMOP that executes a pattern-like operator, whereas msix are properties of the resulting compiled pattern. In short, they aren't even stored in the same place.

        Forget the idea of qr//g; it's not going to happen.
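For the original goal (compile once, substitute globally everywhere, and stop thinking about g), one possible workaround is to keep the g in a single place by wrapping the substitution in a small sub. This is only a sketch: replace_all is a hypothetical helper name, the pattern list is made up, and the replacement is treated as a literal string (it will not expand $1).

```perl
use strict;
use warnings;

# Hypothetical helper: the /g lives here, once, on the s/// operator.
sub replace_all {
    my ($text, $re, $replacement) = @_;
    (my $copy = $text) =~ s/$re/$replacement/g;
    return $copy;
}

# Build the alternation once, escaping metacharacters in each entry
my @patterns    = ('perl', 'parse');
my $alternation = join '|', map { quotemeta($_) } @patterns;
my $pat         = qr/($alternation)/i;

my $out = replace_all('Nothing but perl can parse Perl', $pat, 'X');
print "$out\n";
```

The compiled $pat stays usable in non-looping matches as well, since the looping behaviour belongs to the helper, not to the regexp.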


Re: The 'g' modifier in compiled regex
by japhy (Canon) on Apr 09, 2007 at 03:24 UTC
    The /g modifier is an attribute of the pattern matching action, not the regex itself.

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
      Likewise, the /e modifier is a modifier for the substitution step of s///, which is even one step further away from the regex (qr//) than /g, so that even m// can't use it.

      In JavaScript, which borrowed a lot from Perl, the /g is indeed attached to the regex for replace, but IMnsHO, that was a design mistake. It should have been a property, or a separate parameter, of replace.
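A short sketch of that division of labour (the input string is invented for illustration): the qr// object carries only the pattern, while both g and e are written on the s/// operator that uses it.

```perl
use strict;
use warnings;

my $num = qr/(\d+)/;    # only the pattern is compiled here

# /g (repeat) and /e (evaluate the replacement as code) belong to s///
(my $doubled = 'a1 b22 c3') =~ s/$num/$1 * 2/ge;

print "$doubled\n";
```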

Node Type: perlquestion [id://608925]
Approved by friedo
Front-paged by friedo