Best Practice: Order of regex modifiers?

by LanX (Saint)
on Feb 01, 2017 at 15:14 UTC ( [id://1180762]=perlquestion )

LanX has asked for the wisdom of the Perl Monks concerning the following question:

Maybe a non-problem

perlre#Modifiers lists many modifiers for regexes, and at the moment I'm adding case insensitivity to a number of regexes which look like

s ~ match ~ replace ~ xegis;

so I was thinking about a "default order" to improve readability.

One approach could be linguistic (my colleague proposed segxi in this case ;)

Another could be ordering by importance. Something like

  • /g creates a totally new looping command, changing context behaviour
  • /e is security relevant and s/// only
  • /x allows multiline syntax
  • ...
I'm aware that there are approaches to make xms default and facilitate readability.

Suggestions? Ideas? Links?

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

Replies are listed 'Best First'.
Re: Best Practice: Order of regex modifiers?
by haukex (Archbishop) on Feb 01, 2017 at 15:28 UTC

    Hi Rolf,

    Most regexes I write only have a few modifiers, so it's hard to get confused no matter what order they are in, and in those cases I don't see order as a problem. Although I don't have a strong opinion on this, if I did have to settle on a standard, I might consider the order that Perl uses when regexes are stringified, e.g.

    $ perl -wMstrict -le 'print qr/abc/msixpoun'
    (?upmsixn:abc)

    Although unfortunately, it seems this order doesn't match with the documentation, qr/STRING/msixpodualn, which is another possible ordering...
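A minimal sketch of the stringification point, assuming any reasonably modern perl: the order in which the modifiers are typed does not survive compilation, so the same pattern with the same flags should stringify identically however it was written. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# Same pattern, same modifiers, written in three different orders.
my @variants = (qr/abc/msix, qr/abc/xism, qr/abc/ixms);

# Stringify each compiled regex; perl picks the flag order itself.
my @strings = map { "$_" } @variants;
print "$_\n" for @strings;

# Count how many distinct stringified forms came out.
my %seen     = map { $_ => 1 } @strings;
my $distinct = keys %seen;
print "distinct forms: $distinct\n";
```

Whatever order a team settles on is therefore purely a source-code convention; perl itself does not care.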

    Update: Also, I often place those modifiers that change the behavior of the regex, like /gc, first, so they're immediately obvious.

    Regards,
    -- Hauke D

      Hi Hauke,

      Thanks, I ignored the "natural order" of perldocs ;-)

      > Update: Also, I often place those modifiers that change the behavior of the regex, like /gc, first, so they're immediately obvious.

      Well all modifiers change the behaviour of a regex, don't you think?

      (I think that's why they are called modifiers ;-)

      This leads to my suggestion to order (or at least group) by importance...

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!

        Hi Rolf,

        > Well all modifiers change the behaviour of a regex, don't you think?

        Well yes, but: some change how the pattern of the regex is treated, like /xmsialud, whereas some change how the regex operator, like m//, behaves. For example, the return values of m// are quite different from those of m//g, and /g doesn't affect how the pattern is treated.
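To illustrate the difference described above, here is a small sketch (the sample string and variable names are invented): the pattern is the same in the first two matches, only /g changes what the operator returns.

```perl
use strict;
use warnings;

my $text = 'a1 b2 c3';

# Without /g, list context returns the captures of the first match only.
my @first = $text =~ /([a-z])(\d)/;

# With /g, list context returns the captures of every match.
my @all = $text =~ /([a-z])(\d)/g;

# With /g in scalar context, each call resumes where the previous one stopped.
my @positions;
while ($text =~ /\d/g) {
    push @positions, pos $text;    # offset just past each matched digit
}

print "first:     @first\n";       # first:     a 1
print "all:       @all\n";         # all:       a 1 b 2 c 3
print "positions: @positions\n";   # positions: 2 5 8
```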

        Regards,
        -- Hauke D

Re: [OT - Separator character]: Best Practice: Order of regex modifiers?
by AnomalousMonk (Archbishop) on Feb 01, 2017 at 17:43 UTC

    I've often thought that a separator character would be useful in regex modifier strings to improve readability. Literal numbers have the  _ (underscore) for this reason, and I don't see why this separator cannot be "overloaded" for use in regexes.

    E.g., rather than
        m{ ... }xmsgco
    or
        s{ ... }{...}xmsgeepo
    one might write
        m{ ... }xms_gc_o
    or
        s{ ... }{...}xms_geep_o
    (just to fabricate some extreme cases). Of course, my own personal practice is always to use an  /xms modifier tail, so a separator would always fall after this mandatory group if there were additional modifiers.


    Give a man a fish:  <%-{-{-{-<

        ... the use of o seems to be discouraged.

        I only latched onto  /o because I was casting about for something to use in a manufactured example.

        AFAIU, the  /o modifier is only useful now in those very limited cases in which one wishes to prevent recompilation of a  qr// m// s/// even when interpolated Regexp objects or strings have changed. My understanding is that these operators will not now recompile on each execution unless an interpolated regex/string has changed.
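A small sketch of that behaviour, with made-up names: the same interpolated pattern is matched once with /o and once without, while the interpolated variable changes between iterations.

```perl
use strict;
use warnings;

my (@with_o, @without_o);
for my $word (qw(foo bar)) {
    # /o: compiled on first use; later changes to $word are not picked up.
    push @with_o, ('foo' =~ /\A$word\z/o ? 1 : 0);

    # No /o: the pattern follows the current value of $word.
    push @without_o, ('foo' =~ /\A$word\z/ ? 1 : 0);
}

print "with /o:    @with_o\n";      # with /o:    1 1
print "without /o: @without_o\n";   # without /o: 1 0
```

The second iteration under /o still matches 'foo' because the pattern was frozen with the first value of $word, which is exactly the kind of surprise that makes the modifier discouraged.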


        Give a man a fish:  <%-{-{-{-<

Re: Best Practice: Order of regex modifiers?
by hippo (Bishop) on Feb 02, 2017 at 09:47 UTC

    Alphabetical. Surely if you are trying to eye-parse some code and want to know if a particular modifier has been applied, this is the clearest and fastest approach to use.

    Perhaps this question could be submitted to the poll ideas quest 2017?

      > Alphabetical. Surely if you are trying to eye-parse some code and want to know if a particular modifier has been applied, this is the clearest and fastest approach to use.

      I disagree.

      First one should separate modifiers which are s/// only from standard m// modifiers (the latter (most?) can also be pre-compiled into the regex using qr// )

      Then ordering by (and/or)

      • category
      • seniority (new vs established)
      • frequency²
      • memorizing
      makes sense.

      For instance /a /d /l /u are perlre#Character-set-modifiers ° but are mostly listed as /dual for obvious reasons: the word "dual" is far easier to remember. (I'd even argue that /i belongs to the same category, but with much higher frequency.)

      So I'd say divide and conquer, humans can grasp sets with 5 to 7 elements far more easily, so 5 categories with at most 5 elements should fit

      (... because of connectivity problems the rest of the post got lost :/ ... TL; don't want to rewrite and posting by tethering thru mobile)

      so my bet at the moment is the following order by categories, respecting frequency and memorization

      Categories

      • Syntax x
      • Line m,s
      • Matching n,p
      • Character i,d,u,a,l
      • Operation g,c,(r)
      • Substitution-only r, e,ee, o
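The category order above could be applied mechanically. The following is only a sketch: order_modifiers is a hypothetical helper, not an existing module, /ee is treated as two e flags, and r is ranked once, after c, since it appears in both of the last two categories.

```perl
use strict;
use warnings;

# Rank of each modifier, following the category list above (assumption:
# r is placed after c, ahead of e and o).
my @CATEGORY_ORDER = qw(x m s n p i d u a l g c r e o);
my %RANK;
@RANK{@CATEGORY_ORDER} = 0 .. $#CATEGORY_ORDER;

# Hypothetical helper: reorder a modifier string by category rank.
sub order_modifiers {
    my ($mods) = @_;
    my @flags   = split //, $mods;
    my @unknown = grep { !exists $RANK{$_} } @flags;
    die "unknown modifier(s): @unknown\n" if @unknown;
    return join '', sort { $RANK{$a} <=> $RANK{$b} } @flags;
}

print order_modifiers('xegis'),    "\n";   # xsige
print order_modifiers('xmsgeepo'), "\n";   # xmspgeeo
```

A helper like this could back a lint rule or a one-off cleanup script; it only reorders the flag string and makes no attempt to parse Perl source.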

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!

      ° not sure why the deep linking doesn't work (for me) seems like the anchor is missing.

      ² in 5.10 perlre only listed 7 modifiers and already did a categorization: "g and c: Unlike i, m, s and x, these two flags affect the way the regex is used"

Re: Best Practice: Order of regex modifiers?
by kcott (Archbishop) on Feb 02, 2017 at 15:07 UTC

    G'day Rolf,

    Purely for readability, I generally try to keep the modifiers in alphabetical order. In that respect, I concur with ++hippo's response.

    I'm pretty sure that the "xms default" came about because that was the order in which they were introduced in the book "Perl Best Practices".

    1. Always use the /x flag. (pp. 236-237)
    2. Always use the /m flag. (pp. 237-239)
    3. Always use the /s flag. (pp. 240-241)

    Some modifiers can be applied to the regex itself (e.g. /x); others, to any operation the regex is involved in (e.g. /g); and others to only a specific operation (e.g. /e). It's a fatal error to use them in the wrong places:

    $ perl -E 'say qr{}x'
    (?^ux:)
    $ perl -E 'say qr{}g'
    Unknown regexp modifier "/g" at -e line 1, near "say "
    Execution of -e aborted due to compilation errors.
    $ perl -E 'say m{}g'
    $ perl -E 'say m{}e'
    Unknown regexp modifier "/e" at -e line 1, near "say "
    Execution of -e aborted due to compilation errors.
    $ perl -E 'say s{}{}e'
    1
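The same checks can be run from inside one script with a string eval, which traps the compile-time error instead of aborting. This is a rough sketch; compile_error is a hypothetical helper, and each snippet is wrapped in an anonymous sub so that it is compiled but never executed.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a snippet via string eval and return the
# error text, or an empty string if it compiled cleanly.
sub compile_error {
    my ($code) = @_;
    local $@;
    my $sub = eval "sub { $code }";   # compile only; the sub is never called
    return defined $sub ? '' : $@;
}

my @snippets = ('qr{}x', 'qr{}g', 'm{}g', 'm{}e', 's{}{}e');
my %error    = map { $_ => compile_error($_) } @snippets;

for my $snippet (@snippets) {
    my $status = $error{$snippet} eq '' ? 'compiles' : 'rejected';
    printf "%-8s %s\n", $snippet, $status;
}
```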

    I can see some benefit in keeping those grouped together: your initial example of xegis would become isxge.

    I'm also not averse to ++AnomalousMonk's suggestion of using a separator. In which case, xegis would become isx_g_e.

    Overall, I'm not too bothered by personal preferences regarding modifier ordering: deciding upon a single style, and using it consistently, is far more important, in my opinion.

    — Ken

Re: Best Practice: Order of regex modifiers?
by choroba (Cardinal) on Feb 02, 2017 at 16:55 UTC
Re: Best Practice: Order of regex modifiers? ( s///gexis s///mexig )
by Anonymous Monk on Feb 02, 2017 at 02:48 UTC

    Hi,

    I mostly don't think about it too much, but some things are memorable, like

    s///gexis

    s///mexig

    s///gimx

    s///gmix

    s///gsix

    s///gesr

Node Type: perlquestion [id://1180762]
Front-paged by Arunbear