in reply to Re^3: /g option not making s// find all matches (updated)
in thread /g option not making s// find all matches

The following all work identically:
    \G (?! \A)
    \G (?<! \A)
    (?! \A) \G
    (?<! \A) \G
That's brilliant and kind of hurts my brain... :-)
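For anyone else whose brain hurts, here is a minimal sketch of the four spellings side by side. The task, the sample string and the helper name mask_ids are all hypothetical (the parent node's actual regex isn't reproduced in this post), and \K assumes perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical task: mask every digit that directly follows 'id=',
# and nothing else. $anchor is one of the spellings quoted above.
sub mask_ids {
    my ($anchor, $text) = @_;
    $text =~ s{ (?: id= | $anchor ) \K \d }{#}xmsg;
    return $text;
}

my $input = '12 id=345 67';

# The four spellings: each should leave the leading '12' alone.
for my $anchor ('\G (?! \A)', '\G (?<! \A)', '(?! \A) \G', '(?<! \A) \G') {
    printf "%-12s => %s\n", $anchor, mask_ids($anchor, $input);
}

# A bare \G is also true at the start of the string on the first attempt,
# so here the leading '12' gets masked as well.
printf "%-12s => %s\n", '\G', mask_ids('\G', $input);
```

The only job of the extra assertion is to stop \G from succeeding at position 0 before anything has matched; which side of \G it sits on, and which way it "looks", makes no difference.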
In line with TheDamian's regex Perl Best Practices, I always use an /xms tail on every qr// m// s/// expression I write.
Thanks for the pointer to that. I'll look more into it. In general, I like to use default behaviors unless I need to do something that the default can't accomplish. Then the mechanism to override the default (appending /xms in this case) becomes part of the code's self-documentation, alerting the reader that something outside the norm is happening. (That philosophy fails if some program or interface's defaults are insane, but in my experience, perl's are pretty solid.)

Also, from a readability standpoint, if you have ten regexes, nine of which end in / and the tenth ending in /m, it's easy to see at a glance that the tenth is doing something outside the default. But if you define your default to be /xms, in code with nine regexes ending in /xms and a tenth ending in /xs, the reader is much more likely to overlook the fact that the tenth instance is overriding the local default.

But, again, I say all this without having digested the rationale for TheDamian's recommendations, so it's all FWIW.

Re^5: /g option not making s// find all matches (updated)
by hippo (Chancellor) on May 31, 2018 at 08:55 UTC
    I like to use default behaviors unless I need to do something that the default can't accomplish.

    IMHO this is a good philosophy and I wholeheartedly condone it.

    Caveats about house coding rules aside, when working on previously unseen code, if I come across a regex with, say, /xms and there's no whitespace, or no dots, or no anchors in it, that can cause confusion. What has happened to it? Did it have such things previously and they were edited out? Does the coder not know or understand what the modifiers mean? Is there something in the regex which might change at runtime to make the modifiers useful?
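    As a reminder of which modifier pairs with which construct, here is a small sketch. The helper count_matches and the sample strings are made up for illustration; the point is only that each flag changes the meaning of exactly one kind of thing, and is inert when that thing is absent.

```perl
use strict;
use warnings;

# Count how many times a compiled regex matches in a string.
sub count_matches {
    my ($rx, $text) = @_;
    my $count = () = $text =~ /$rx/g;
    return $count;
}

my $lines = "one\ntwo";

# /m only matters when the pattern contains ^ or $.
print "anchors without /m: ", count_matches(qr/^\w+$/,  $lines), "\n";
print "anchors with /m:    ", count_matches(qr/^\w+$/m, $lines), "\n";

# /s only matters when the pattern contains a dot.
print "dot without /s:     ", count_matches(qr/one.two/,  $lines), "\n";
print "dot with /s:        ", count_matches(qr/one.two/s, $lines), "\n";

# /x only matters when the pattern contains whitespace (or a #).
print "space without /x:   ", count_matches(qr/a b/,  'a b'), "\n";
print "space with /x:      ", count_matches(qr/a b/x, 'a b'), "\n";

# No anchors, dots or whitespace: the tail changes nothing.
print "plain:              ", count_matches(qr/\w+/,    $lines), "\n";
print "with /xms:          ", count_matches(qr/\w+/xms, $lines), "\n";
```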

    Having read and understood TheDamian's rationale for this I respectfully disagree with it. The beauty of TIMTOWTDI is that everyone can code in the way that they think best. So let's embrace the diversity where it exists for such good reasons.

Re^5: /g option not making s// find all matches (updated)
by AnomalousMonk (Bishop) on May 31, 2018 at 14:10 UTC
    ... if you define your default to be /xms, in code with nine regexes ending in /xms and a tenth ending in /xs ...

    But that's the point: you never use anything other than an /xms tail in your own code. If you're dealing with someone else's code, you're on your own, and you may have much bigger problems than just regexes to contend with; that's the way of the world.


    Give a man a fish:  <%-{-{-{-<

      But that's the point: you never use anything other than an /xms tail in your own code.
      I dunno, that seems unnecessarily rigid. A "best practice" should mean "do this unless there's a good reason to do otherwise," not "always blindly do this no matter what." All three of those options modify the regex behavior. What if you need the unmodified behavior?

      I realize "need" might be too strong a word; with /x, for instance, you can always just escape any literal space characters your regex needs. But if there are several of them, and your regex is otherwise simple, all those escapes clutter the code more than just omitting the /x. And someone else encountering m/a\ b\ c/xms in your code will wonder what tricky thing you're trying to do by telling the regex engine to ignore whitespace and then escaping all your whitespace.

      I'm not saying your system is wrong — I'm certain it serves you well — but I don't think I'm sold on it yet.
        What if you need the unmodified behavior?

        But what unmodified behavior would you need that you would not be able to access in a fairly clear and simple manner? E.g., if you're using /s and you want the "match anything except a newline" behavior, does not  [^\n] best express this? (One of the driving motivations of TheDamian's regex PBPs is clarity of expression.)

        Conversely, in the absence of /s, how would you best express "match anything at all, including a newline" (the most common use-case, IMO)? Something like  [\s\S] would do the trick, but is that anyone's idea of clarity? (And don't get me started on (?s:.) ...)
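        To make the comparison concrete, here is a small sketch of the spellings mentioned above. The helper count_chars and the sample string are invented for illustration only.

```perl
use strict;
use warnings;

# Count the single characters a one-character pattern can match.
sub count_chars {
    my ($rx, $text) = @_;
    my $count = () = $text =~ /$rx/g;
    return $count;
}

my $text = "a\nb";    # three characters, the middle one a newline

# With /s in force, dot means "anything at all" ...
print "dot with /s:       ", count_chars(qr{ .     }xms, $text), "\n";
# ... and "anything except a newline" is spelled out explicitly.
print "[^\\n] with /s:     ", count_chars(qr{ [^\n] }xms, $text), "\n";

# Without /s, dot skips the newline ...
print "dot without /s:    ", count_chars(qr/./, $text), "\n";
# ... so "anything at all" needs one of these workarounds.
print "[\\s\\S] without /s: ", count_chars(qr/[\s\S]/, $text), "\n";
print "(?s:.) without /s: ", count_chars(qr/(?s:.)/, $text), "\n";
```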

        ... with /x ... you can always just escape any literal space characters your regex needs. But ... all those escapes clutter the code ...

        I must admit that the need for special handling of spaces can be annoying. But even this can advance clarity: [ ] (I would not use "\ ", a backslash-escaped space) matches a blank space exactly; [ \t] matches a blank space or a tab exactly; \s matches any whitespace exactly; etc. (You may object that \s is too general, but there've been several occasions on which I've been saved from a 3 AM phone call by having used \s rather than a more specific [ ], or even "\ ".) And \Q...\E can help out, too.
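        A short sketch of those options under /x, using a made-up target string. The sub name space_patterns and the hash keys are just labels for this example.

```perl
use strict;
use warnings;

# Several ways to ask for 'a b c' when /x is in force.
sub space_patterns {
    return (
        bare    => qr{ a b c }xms,            # whitespace ignored: really 'abc'
        class   => qr{ a [ ] b [ ] c }xms,    # bracketed literal space
        escaped => qr{ a\ b\ c }xms,          # backslash-escaped space
        quoted  => qr{ \Qa b c\E }xms,        # \Q...\E escapes the spaces
        general => qr{ a \s b \s c }xms,      # any single whitespace character
    );
}

my %rx = space_patterns();
for my $name (sort keys %rx) {
    printf "%-8s matches 'a b c': %s\n", $name,
        ('a b c' =~ $rx{$name} ? 'yes' : 'no');
}

# \s is the forgiving one: it also accepts a tab or a newline.
print "general matches tab/newline: ",
    ("a\tb\nc" =~ $rx{general} ? 'yes' : 'no'), "\n";
```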

        Anyway, TheDamian explains it all more completely and clearly than I can, so maybe pursue that avenue.


        Give a man a fish:  <%-{-{-{-<

Re^5: /g option not making s// find all matches (updated)
by Your Mother (Archbishop) on May 31, 2018 at 09:06 UTC

    AnomalousMonk is probably a better Perl hacker than I am and TheDamian most certainly is but I side with hippo here; hey look another hacker who is better than I. :P I take the regex/substitution flags to be meaningful in the context of the code. If they are not, it might be confusing or waste my time trying to confirm why they are not.

Re^5: /g option not making s// find all matches (updated)
by AnomalousMonk (Bishop) on Jun 01, 2018 at 17:31 UTC
    \G (?<! \A)   (?! \A) \G ... kind of hurts my brain ...

    I got to wondering about all that and thought I might try to clarify it a bit, if only for my own benefit. Say we have the problem "match (and capture) the first  \w character that is not at the start of the string and that is also on a  \b boundary." From the foregoing discussion,  m{ (?! \A) \b (\w) }xms does the trick:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "print qq{'$1'} if 'ab-cd' =~ m{ (?! \A) (\w) }xms;
     print qq{'$1'} if 'ab-cd' =~ m{ \b (\w) }xms;
     print qq{'$1'} if 'ab-cd' =~ m{ (?! \A) \b (\w) }xms;
     print qq{'$1'} if 'ab-cd' =~ m{ \b (?! \A) (\w) }xms;
    "
    'b'
    'a'
    'c'
    'c'
    Leaving out either zero-width assertion makes the match incorrect. The order of the two assertions doesn't matter because it's a logical conjunction, and if there are no side-effects (and there aren't: we're just examining match position and not matching and consuming any characters, i.e., not changing the match position), then "A and B" and "B and A" are equivalent expressions.

    So what about the  (?! \A) versus  (?<! \A) look-ahead/behind business? Here's how I think of it: If you're at the North Pole, in which direction do you have to go to get to the North Pole? The question is moot: You can go exactly zero meters in any direction because you're at the North Pole! Similarly, if your match position is at the start of a string, in which direction do you have to "look" to "see" that you are at the start of the string? All you have to do is examine the match position; "direction" is meaningless. For the  \A zero-width assertion,  \A  (?= \A)  (?<= \A) are all exactly equivalent. The same reasoning applies to negated assertions:  (?! \A)  (?<! \A) are equivalent. Indeed, I think the same reasoning applies to all zero-width assertions. Here's a Test::More demo to bolster your confidence:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use Test::More 'no_plan';
     use Test::NoWarnings;
     ;;
     my @regexes = (
       'negative look-ahead to \A',
       qr{ (?! \A) \b (\w) }xms,
       qr{ \b (?! \A) (\w) }xms,
       qr{ (?! \A) (?! \B) (\w) }xms,
       qr{ (?! \B) (?! \A) (\w) }xms,
       qr{ (?! \A) (?<! \B) (\w) }xms,
       qr{ (?<! \B) (?! \A) (\w) }xms,
       'negative look-behind to \A',
       qr{ (?<! \A) \b (\w) }xms,
       qr{ \b (?<! \A) (\w) }xms,
       qr{ (?<! \A) (?! \B) (\w) }xms,
       qr{ (?! \B) (?<! \A) (\w) }xms,
       qr{ (?<! \A) (?<! \B) (\w) }xms,
       qr{ (?<! \B) (?<! \A) (\w) }xms,
       'all together now',
       qr{ \b (?! \A) (?! \B) (?<! \A) (?<! \B) (\w) }xms,
       );
     ;;
     REGEX:
     for my $rx (@regexes) {
       if (ref $rx ne 'Regexp') {
         note $rx;
         next REGEX;
         }
       'ab-cd' =~ $rx;
       ok $1 eq 'c', qq{$rx works};
       }
     ;;
     done_testing;
    "
    # negative look-ahead to \A
    ok 1 - (?msx-i: (?! \A) \b (\w) ) works
    ok 2 - (?msx-i: \b (?! \A) (\w) ) works
    ok 3 - (?msx-i: (?! \A) (?! \B) (\w) ) works
    ok 4 - (?msx-i: (?! \B) (?! \A) (\w) ) works
    ok 5 - (?msx-i: (?! \A) (?<! \B) (\w) ) works
    ok 6 - (?msx-i: (?<! \B) (?! \A) (\w) ) works
    # negative look-behind to \A
    ok 7 - (?msx-i: (?<! \A) \b (\w) ) works
    ok 8 - (?msx-i: \b (?<! \A) (\w) ) works
    ok 9 - (?msx-i: (?<! \A) (?! \B) (\w) ) works
    ok 10 - (?msx-i: (?! \B) (?<! \A) (\w) ) works
    ok 11 - (?msx-i: (?<! \A) (?<! \B) (\w) ) works
    ok 12 - (?msx-i: (?<! \B) (?<! \A) (\w) ) works
    # all together now
    ok 13 - (?msx-i: \b (?! \A) (?! \B) (?<! \A) (?<! \B) (\w) ) works
    1..13
    ok 14 - no warnings
    1..14
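    The demo above exercises only the negated forms. A complementary sketch for the positive forms of \A follows, with one contrast added: the "direction is meaningless" argument relies on the thing inside the look-around being zero-width itself. The helper all_captures is a made-up name for this example.

```perl
use strict;
use warnings;

# Return every $1 captured by repeated matching against the string.
sub all_captures {
    my ($rx, $text) = @_;
    return [ $text =~ /$rx/g ];
}

# Positive forms: each can only ever match at the start of the string.
for my $rx ( qr{ \A (\w) }xms, qr{ (?= \A) (\w) }xms, qr{ (?<= \A) (\w) }xms ) {
    print "$rx => @{ all_captures($rx, 'ab-cd') }\n";
}

# Contrast: once the inner pattern consumes a character, direction matters.
# Look-ahead for 'a' captures the 'a' itself; look-behind captures what follows it.
print "look-ahead  => @{ all_captures(qr{ (?= a)  (\w) }xms, 'ab-cd') }\n";
print "look-behind => @{ all_captures(qr{ (?<= a) (\w) }xms, 'ab-cd') }\n";
```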


    Give a man a fish:  <%-{-{-{-<
