Re: Demarcate Regexes with Unicode
by moritz (Cardinal) on Sep 16, 2011 at 07:55 UTC
|
On one of my machines, two of the characters you proposed aren't displayed correctly, because there's no font installed that contains them.
As a maintainer of code like that I would be unhappy to be faced with characters that I don't know how to produce with the keyboard.
In my humble opinion, the real problem with regex readability is that people tend to not reuse regexes, so everything is pieced together from the primitives.
I find
use Regexp::Common qw /URI/;
if ($string =~/$RE{URI}{HTTP}/) {
...
}
more readable than any of the alternatives you have offered, and there are no "weird" characters involved.
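The same reuse idea also works with core Perl alone: give small pieces a name with qr// and compose them. This is a minimal sketch, not anything from Regexp::Common; the names $octet, $ipv4 and is_ipv4 are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical building blocks; the names are illustrative only.
my $octet = qr/(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])/;
my $ipv4  = qr/$octet(?:\.$octet){3}/;

# Reuse the named pattern instead of re-deriving it at each call site.
sub is_ipv4 {
    my ($string) = @_;
    return $string =~ /\A$ipv4\z/ ? 1 : 0;
}

print is_ipv4('192.168.0.1') ? "match\n" : "no match\n";
```

The call site reads almost like the Regexp::Common version: the intent is in the name, and only ASCII is involved.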
|
I hadn't heard of Regexp::Common. Awesome! You've just saved me a lot of time moritz, thank you.
Re: Demarcate Regexes with Unicode
by Anonymous Monk on Sep 16, 2011 at 07:07 UTC
|
One lamented aspect of Perl is that regexes get really hard to read. ....
If you learn to regex, they're really easy to read :)
why not use a sigil ...
Because it's not on the standard keyboard!
Also, it's a delimiter, not a sigil.
Which is why I prefer to use @ for quoting like s@@@ </sarcasm>
But seriously, this is exactly why I prefer \ or «
$ perl -MO=Deparse -e " s\\\g "
s///g;
-e syntax OK
$ perl -MO=Deparse -e " s«««g "
s///g;
-e syntax OK
or even
</sarcasm>
But seriously, between balanced delimiters like
perl -MO=Deparse -e " s {}//g "
perl -MO=Deparse -e " s {}\\g "
perl -MO=Deparse -e " s {}vvg "
perl -MO=Deparse -e " s {}()g "
perl -MO=Deparse -e " s {}[]g "
perl -MO=Deparse -e " s {}<>g "
perl -MO=Deparse -e " s<><>g "
I stick to keyboard characters
s///x
s===x
s,,,x
s!!!x
s~~~x
s>>>x
s}}}x
and the special case s'''x
The x means magic
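For readers unsure what the "magic" is: the /x modifier makes Perl ignore unescaped whitespace in the pattern and allows # comments, so a regex can be spread over several lines. A small sketch with ASCII-only delimiters; the variable names and the sample date are made up for illustration.

```perl
use strict;
use warnings;

my $date = '2011-09-16';   # sample input, chosen for illustration

# With /x, whitespace and comments inside the pattern are ignored.
(my $swapped = $date) =~ s{
    \A (\d{4}) - (\d{2}) - (\d{2}) \z   # year, month, day
}{$3/$2/$1}x;

print "$swapped\n";
```

Only the pattern half is affected by /x; the replacement half is still an ordinary interpolated string.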
|
it's a delimiter, not a sigil
Well, it's a section sign, but lawyers sometimes call it Sigil. (I know that namespace is already occupied in this circle.)
Because it's not on the standard keyboard!
If you're on a Mac it's quite easy to type, and on Ubuntu it's pretty easy too. On Windows as well: I still remember Alt + Num0141 from typing Spanish on a US keyboard.
Anyway, I admit this approach is not for everybody. I like your suggestions (but I don't have a key for «).
Re: Demarcate Regexes with Unicode
by JavaFan (Canon) on Sep 16, 2011 at 17:55 UTC
|
What you find easier to read, I see more or less as:
+---+
|01D|
|31C|
+---+
I have my own set of layout rules. For most rules there may be exceptions in which I break them. Two of them, I never break:
- No line shall exceed 80 characters in length.
- No non-ASCII character shall appear in the source code.
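Both rules are easy to check mechanically. The following is a sketch of such a check, not JavaFan's actual tooling; check_layout is a hypothetical helper, and it assumes each line is passed in as a string (on undecoded UTF-8 input, length counts bytes rather than characters).

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of complaints for the given source lines.
sub check_layout {
    my @lines = @_;
    my @problems;
    my $n = 0;
    for my $line (@lines) {
        $n++;
        (my $text = $line) =~ s/\r?\n\z//;      # ignore the line ending
        push @problems, "line $n: longer than 80 characters"
            if length($text) > 80;
        push @problems, "line $n: non-ASCII character"
            if $text =~ /[^\x00-\x7F]/;
    }
    return @problems;
}

# Example input built with escapes so this file itself stays ASCII.
my @sample = ("my \$x = 1;\n", "s\x{A7}foo\x{A7}bar\x{A7};\n", 'x' x 81);
print "$_\n" for check_layout(@sample);
```

For the sample above it reports line 2 for the non-ASCII delimiter and line 3 for its length.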
|
You wouldn't use the bullet operator to form a list in a long comment? Say there was a key for it in your system (e.g. you're using a Mac).
|
Eh, no. An asterisk will do fine, although I will usually use a dash or numbers in such a case. (I have a MacBook, and I haven't seen a key for a "bullet operator".)
Re: Demarcate Regexes with Unicode
by hardburn (Abbot) on Sep 16, 2011 at 13:10 UTC
|
My radical idea for regex readability is to integrate tablets into development. Touchscreens can be a convenient way to manipulate a Finite State Machine (which is what regexen are). Once saved, it would create a compiled form of the NFA that can be accessed using some API call (similar to Regexp::Common).
The main difficulty is that Perl's regexen go way beyond just FSMs. How would you represent capturing on the tablet, for instance? I'm also not sure if Perl5's regex engine can be easily given a serialized input.
"There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.
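On the serialization question, one partial answer: a compiled qr// object stringifies to a pattern with its flags embedded, and that string can be stored and compiled again later. This round-trips the pattern text only, not the engine's internal compiled program. A minimal sketch with made-up variable names:

```perl
use strict;
use warnings;

my $compiled   = qr/ab+c/i;      # a compiled regex object
my $serialized = "$compiled";    # plain string, flags embedded as (?...)

# Later, possibly in another process: rebuild a regex from the stored string.
my $restored = qr/$serialized/;

print "restored regex still matches\n" if 'xABBCx' =~ $restored;
```

A tablet front end could in principle emit such a pattern string, though how it would draw captures remains the open question raised above.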
|
Aw, dammit... Coke, keyboard.... :-)
Re: Demarcate Regexes with Unicode
by DrHyde (Prior) on Sep 20, 2011 at 10:41 UTC
|
NEVER use non-ASCII characters in your source code, not even in quoted text. Why? Several reasons:
- any given machine may not be configured to understand your character set;
- any given machine may not have an appropriate font;
- any given editor may not know how to handle that character set;
- for some characters, users may not be able to see the differences easily (this is no doubt a function of familiarity)
If you need to spit out non-ASCII characters, then they should live in a language-specific resource file. This even applies to code that is only for your own consumption where the bizarro-characters are for your own language, to protect you from the pain of editors that don't know your character set on other people's machines, or on mobile devices, or ...
Any use of non-ASCII characters in code is a bug, and any support for non-ASCII characters in code is also a bug because it encourages the writing of buggy code.
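A different technique from the resource file suggested above, but one that also keeps the source file pure ASCII, is to write such characters as escapes. A minimal sketch, using the section sign from this thread as the example; the variable names and sample text are made up.

```perl
use strict;
use warnings;

# The section sign is written as an escape, so this file stays pure ASCII.
my $section = "\x{A7}";
my $text    = "see ${section}12 and ${section}14";

# The same escape works inside a pattern.
my @numbers = $text =~ /\x{A7}(\d+)/g;

print join(',', @numbers), "\n";   # digits only, so the output is ASCII too
```

Any editor, font or terminal can display this file, whatever happens to the data at run time.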
Re: Demarcate Regexes with Unicode
by misterwhipple (Monk) on Sep 22, 2011 at 17:27 UTC
|
I like using {}, for two reasons:
- The delimiters act balanced, so they might as well look like it.
- It makes the two parts of an s/// very distinct, visually.
m{rancho}
s{cucamonga}{dressing}
-- Any sufficiently interesting Perl project will depend upon at least one module that doesn't run on Windows.
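Two further properties of {} delimiters are worth a quick sketch: slashes inside the pattern need no escaping, and balanced braces nest, so quantifiers such as {2,} still work. The sample path and variable names below are made up for illustration.

```perl
use strict;
use warnings;

my $path = '/usr/local/bin/perl';   # sample value, chosen for illustration

# Slashes need no escaping inside {} delimiters.
(my $name = $path) =~ s{\A/usr/(?:local/)?bin/}{};

# Balanced braces may nest, so a quantifier like {2,} is fine as well.
my $has_double_s = 'dressing' =~ m{s{2,}} ? 'yes' : 'no';

print "$name $has_double_s\n";
```

An unbalanced literal brace inside the pattern would still need a backslash.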
Re: Demarcate Regexes with Unicode
by tweetiepooh (Hermit) on Sep 26, 2011 at 13:41 UTC
|
Hmm! I don't think I'll have these on a Solaris console. Even if they were displayable, I don't know how I would type them on a Sun keyboard or in a console session on a PC via serial cable.
Not everyone has graphical interfaces.