LanX has asked for the wisdom of the Perl Monks concerning the following question:
Maybe a non-problem.
perlre#Modifiers lists many modifiers for regexes, and at the moment I'm adding case insensitivity to a number of regexes which look like
s ~ match
~ replace
~ xegis;
so I was thinking about a "default order" to improve readability.
One approach could be linguistic (my colleague proposed segxi in this case ;)
Another could be importance. Something like
- /g creates a totally new looping command, changing context behaviour
- /e is security relevant and s/// only
- /x allows multiline syntax
- ...
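A minimal sketch of the three behaviours listed above, using a made-up sample string. It also illustrates that Perl itself does not care about modifier order, so any ordering convention is purely for human readers.

```perl
use strict;
use warnings;

my $text = "a1 b22 c333";

# Without /g, a list-context match returns only the first capture.
my @first = $text =~ /(\d+)/;

# With /g, the same pattern returns every match in list context.
my @all = $text =~ /(\d+)/g;

# /e evaluates the replacement as Perl code (s/// only).
(my $doubled = $text) =~ s/(\d+)/$1 * 2/ge;

# /x lets the pattern span lines and carry comments.
my @letters = $text =~ /
    ([a-z])   # one lowercase letter
    \d+       # followed by digits
/xg;

# Modifier order does not change behaviour: "ge" and "eg" are equivalent.
(my $same = $text) =~ s/(\d+)/$1 * 2/eg;

print "first:   @first\n";
print "all:     @all\n";
print "doubled: $doubled\n";
print "letters: @letters\n";
print "order is irrelevant to Perl\n" if $same eq $doubled;
```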
I'm aware that there are approaches to make xms default and facilitate readability.
Suggestions? Ideas? Links?
Re: Best Practice: Order of regex modifiers?
by haukex (Archbishop) on Feb 01, 2017 at 15:28 UTC
Hi Rolf,
Most regexes I write only have a few modifiers, so it's hard to get confused no matter what order they are in, and in those cases I don't see order as a problem. Although I don't have a strong opinion on this, if I did have to settle on a standard, I might consider the order that Perl uses when regexes are stringified, e.g.
$ perl -wMstrict -le 'print qr/abc/msixpoun'
(?upmsixn:abc)
Unfortunately, it seems this order doesn't match the one in the documentation, qr/STRING/msixpodualn, which is another possible ordering...
Update: Also, I often place those modifiers that change the behavior of the regex, like /gc, first, so they're immediately obvious.
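As a small illustration of the stringification point, the sketch below compiles the same pattern with its modifiers written in two orders and compares the resulting strings. The variable names are only for this example; the exact string printed depends on the Perl version, so no output is shown.

```perl
use strict;
use warnings;

# Same modifiers, written in two different orders.
my $re_a = qr/abc/ix;
my $re_b = qr/abc/xi;

# Stringifying a compiled regex emits the flags in Perl's own fixed order,
# so both spellings produce the same string.
my $str_a = "$re_a";
my $str_b = "$re_b";

print "$str_a\n";
print "identical\n" if $str_a eq $str_b;
```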
Regards, -- Hauke D
Hi Hauke,
Thanks, I ignored the "natural order" of perldocs ;-)
> Update: Also, I often place those modifiers that change the behavior of the regex, like /gc, first, so they're immediately obvious.
Well all modifiers change the behaviour of a regex, don't you think?
(I think that's why they are called modifiers ;-)
This leads to my suggestion to order (or at least group) by importance...
Hi Rolf,
> Well all modifiers change the behaviour of a regex, don't you think?
Well yes, but: some change how the pattern of the regex is treated, like /xmsialud, whereas some change how the regex operator, like m//, behaves. For example, the return values of m// are quite different from those of m//g, and /g doesn't affect how the pattern is treated.
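A short sketch of that distinction, with an invented sample string: /g changes what the match operator returns and how it iterates, while /i only changes how the pattern is interpreted.

```perl
use strict;
use warnings;

my $str = "x=1 y=2";

# List context, no /g: returns the capture groups of the first match.
my @pair = $str =~ /(\w)=(\d)/;

# List context with /g: returns the captures of every match, flattened.
my @pairs = $str =~ /(\w)=(\d)/g;

# Scalar context with /g: each evaluation resumes where the last match ended.
my @positions;
while ($str =~ /=/g) {
    push @positions, pos($str);   # offset just after each "="
}

# /i changes how the pattern is treated; the shape of the result is unchanged.
my @upper = "X=1" =~ /(x)=(\d)/i;

print "pair:      @pair\n";
print "pairs:     @pairs\n";
print "positions: @positions\n";
print "upper:     @upper\n";
```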
Regards, -- Hauke D
Re: [OT - Separator character]: Best Practice: Order of regex modifiers?
by AnomalousMonk (Archbishop) on Feb 01, 2017 at 17:43 UTC
I've often thought that a separator character would be useful in regex modifier strings to improve readability. Literal numbers have the _ (underscore) for this reason, and I don't see why this separator cannot be "overloaded" for use in regexes.
E.g., rather than
m{ ... }xmsgco
or
s{ ... }{...}xmsgeepo
one might write
m{ ... }xms_gc_o
or
s{ ... }{...}xms_geep_o
(just to fabricate some extreme cases). Of course, my own personal practice is always to use an /xms modifier tail, so a separator would always fall after this mandatory group if there were additional modifiers.
Give a man a fish: <%-{-{-{-<
> ... the use of o seems to be discouraged.
I only latched onto /o because I was casting about for something to use in a manufactured example.
AFAIU, the /o modifier is only useful now in those very limited cases in which one wishes to prevent recompilation of a qr//, m// or s/// even when interpolated Regexp objects or strings have changed. My understanding is that these operators will not now recompile on each execution unless an interpolated regex/string has changed.
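A minimal sketch of that remaining effect of /o, with invented pattern strings: the same interpolated pattern is matched with and without /o while the interpolated variable changes between iterations.

```perl
use strict;
use warnings;

my (@with_o, @without_o);

for my $pat ('foo', 'bar') {
    # With /o the pattern is compiled on first use and then frozen,
    # so the second iteration still matches against /foo/.
    push @with_o, ('foo' =~ /$pat/o ? 1 : 0);

    # Without /o the pattern follows the current value of $pat.
    push @without_o, ('foo' =~ /$pat/ ? 1 : 0);
}

print "with /o:    @with_o\n";
print "without /o: @without_o\n";
```

The frozen pattern is easy to overlook in real code, which is one reason to keep /o visually separate from the other modifiers if it is used at all.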
Give a man a fish: <%-{-{-{-<
Re: Best Practice: Order of regex modifiers?
by hippo (Bishop) on Feb 02, 2017 at 09:47 UTC
Alphabetical. Surely if you are trying to eye-parse some code and want to know if a particular modifier has been applied, this is the clearest and fastest approach to use.
Perhaps this question could be submitted to the poll ideas quest 2017?
> Alphabetical. Surely if you are trying to eye-parse some code and want to know if a particular modifier has been applied, this is the clearest and fastest approach to use.
I disagree.
First, one should separate modifiers which are s/// only from standard m// modifiers (the latter (most?) can also be pre-compiled into the regex using qr// )
Then ordering by (and/or)
- category
- seniority (new vs established)
- frequency²
- memorizing
makes sense.
For instance /a /d /l /u are perlre#Character-set-modifiers °
but are mostly listed as /dual for obvious reasons: the word "dual" is far easier to remember.
(I'd even argue that /i belongs to the same category, but with much higher frequency)
So I'd say divide and conquer: humans can grasp sets with 5 to 7 elements far more easily, so 5 categories with at most 5 elements should fit
(... because of connectivity problems the rest of the post got lost :/ ... TL; don't want to rewrite and posting by tethering thru mobile)
so my bet at the moment is the following order by categories, respecting frequency and memorization
Categories
- Syntax x
- Line m,s
- Matching n,p
- Character i,d,u,a,l
- Operation g,c,(r)
- Substitution-only r, e,ee, o
° not sure why the deep linking doesn't work (for me) seems like the anchor is missing.
² in 5.10 perlre only listed 7 modifiers and already did a categorization: "g and c: Unlike i, m, s and x, these two flags affect the way the regex is used"
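The category list above could be turned into a small normalizer along these lines. This is only a sketch: order_by_category is a hypothetical helper, and the position of each letter inside its category (and of r, which appears in two categories) is an assumption made for the example.

```perl
use strict;
use warnings;

# Category order taken from the list above; the order of letters inside
# each category is an assumption made for this sketch.
my @CATEGORY_ORDER = qw(x m s n p i d u a l g c r e o);

# Hypothetical helper: reorder a modifier string by category rank.
# Repeated letters such as "ee" are kept; unknown letters go last.
sub order_by_category {
    my ($mods) = @_;
    my %rank;
    @rank{@CATEGORY_ORDER} = (0 .. $#CATEGORY_ORDER);
    my $last = scalar @CATEGORY_ORDER;
    return join '',
        sort { ($rank{$a} // $last) <=> ($rank{$b} // $last) or $a cmp $b }
        split //, $mods;
}

print order_by_category('xegis'), "\n";   # prints "xsige"
```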
Re: Best Practice: Order of regex modifiers?
by kcott (Archbishop) on Feb 02, 2017 at 15:07 UTC
G'day Rolf,
Purely for readability, I generally try to keep the modifiers in alphabetical order.
In that respect, I concur with ++hippo's response.
I'm pretty sure that the "xms default" came about because that was the order they were introduced in the book "Perl Best Practices".
- Always use the /x flag. (pp. 236-237)
- Always use the /m flag. (pp. 237-239)
- Always use the /s flag. (pp. 240-241)
Some modifiers can be applied to the regex itself (e.g. /x);
others, to any operation the regex is involved in (e.g. /g);
and others to only a specific operation (e.g. /e).
It's a fatal error to use them in the wrong places:
$ perl -E 'say qr{}x'
(?^ux:)
$ perl -E 'say qr{}g'
Unknown regexp modifier "/g" at -e line 1, near "say "
Execution of -e aborted due to compilation errors.
$ perl -E 'say m{}g'
$ perl -E 'say m{}e'
Unknown regexp modifier "/e" at -e line 1, near "say "
Execution of -e aborted due to compilation errors.
$ perl -E 'say s{}{}e'
1
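The same checks can be scripted rather than run one by one from the shell. The sketch below uses a hypothetical helper, compiles, which wraps each snippet in an anonymous sub inside a string eval so that it is compiled but never executed.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a snippet with string eval and report
# whether Perl accepted it. The snippets below are fixed literals;
# never pass untrusted input to string eval.
sub compiles {
    my ($code) = @_;
    local $@;
    my $ok = eval "my \$unused = sub { $code }; 1";
    return $ok ? 1 : 0;
}

my %accepted;
$accepted{$_} = compiles($_) for 'qr{}x', 'qr{}g', 'm{}g', 'm{}e', 's{}{}e';

printf "%-8s %s\n", $_, $accepted{$_} ? 'ok' : 'rejected'
    for sort keys %accepted;
```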
I can see some benefit in keeping those grouped together:
your initial example of xegis would become isxge.
I'm also not averse to ++AnomalousMonk's suggestion of using a separator.
In which case, xegis would become isx_g_e.
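A sketch of that grouping as a helper for comments or style checks. grouped_label is hypothetical, and which letters count as operator-level or substitution-only is an assumption of this example. As far as I know Perl does not accept "_" inside a modifier string, so the result is a label for humans, not something to paste after a regex.

```perl
use strict;
use warnings;

# Hypothetical helper: group modifiers as pattern-level, operator-level
# (/g, /c) and substitution-only (/e, /r), sort each group alphabetically,
# and join the non-empty groups with "_".
sub grouped_label {
    my ($mods) = @_;
    my (@pattern, @operator, @subst);
    for my $m (split //, $mods) {
        if    ($m =~ /[gc]/) { push @operator, $m }
        elsif ($m =~ /[er]/) { push @subst,    $m }
        else                 { push @pattern,  $m }
    }
    return join '_',
        map  { join '', sort @$_ }
        grep { scalar @$_ } \@pattern, \@operator, \@subst;
}

print grouped_label('xegis'), "\n";   # prints "isx_g_e"
```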
Overall, I'm not too bothered by personal preferences regarding modifier ordering:
deciding upon a single style, and using it consistently, is far more important, in my opinion.
Re: Best Practice: Order of regex modifiers?
by choroba (Cardinal) on Feb 02, 2017 at 16:55 UTC
Re: Best Practice: Order of regex modifiers? ( s///gexis s///mexig )
by Anonymous Monk on Feb 02, 2017 at 02:48 UTC
Hi,
I mostly don't think about it too much, but some things are memorable like
s///gexis
s///mexig
s///gimx
s///gmix
s///gsix
s///gesr