PerlMonks  

Re: About \d \w and \s

by mirod (Canon)
on Oct 18, 2009 at 16:17 UTC (#801881)


in reply to About \d \w and \s

I think you're right, \d should be strictly equivalent to [0-9]. That's the way it worked pre-Unicode, and I suspect a lot of code still relies on it. Authors would be quite surprised to see that their regexp actually matches non-traditional digits, and it could be a security problem.
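
To make the concern concrete, here is a minimal sketch, assuming a Unicode-aware perl (5.8 or later) and the default matching rules for character strings. The sample input and the helper name is_ascii_number are made up for illustration and do not come from the thread. It shows \d accepting ARABIC-INDIC digits that [0-9] rejects:

```perl
use strict;
use warnings;

# Hypothetical input: the ARABIC-INDIC digits for "123" (U+0661..U+0663).
my $input = "\x{0661}\x{0662}\x{0663}";

# On a character string, \d follows Unicode rules, so this succeeds.
my $matches_d = $input =~ /^\d+$/ ? 1 : 0;

# An explicit ASCII range rejects the same string.
my $matches_ascii = $input =~ /^[0-9]+$/ ? 1 : 0;

# Illustrative helper: validate with the explicit range instead of \d.
sub is_ascii_number {
    my ($str) = @_;
    return $str =~ /\A[0-9]+\z/ ? 1 : 0;
}

print "\\d matched: $matches_d\n";
print "[0-9] matched: $matches_ascii\n";
```

Code that validates input with \d and then treats the result as an ordinary number is where the surprise would bite; spelling out [0-9] avoids it without needing any new modifier.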

I don't really like the /b, for broken, modifier. Maybe /t (traditional?) or /c (classical) if they aren't already used (I don't believe they are).


Re^2: About \d \w and \s
by demerphq (Chancellor) on Oct 18, 2009 at 16:27 UTC

    Unfortunately /c is taken for /gc matches. My problem with "traditional" is that it is the term I have been using for the /a variant, which makes things match the way perl did before it supported Unicode. But maybe the difference is that one is "Perl-traditional" and the other is "Perl-with-unicode-traditional". I don't know. Got any other ideas?

    BTW you can see the ones that are taken below:

    /* chars and strings used as regex pattern modifiers
     * Singlular is a 'c'har, plural is a "string"
     *
     * NOTE, KEEPCOPY was originally 'k', but was changed to 'p' for preserve
     * for compatibility reasons with Regexp::Common which highjacked (?k:...)
     * for its own uses. So 'k' is out as well.
     */
    #define EXEC_PAT_MOD         'e'
    #define KEEPCOPY_PAT_MOD     'p'
    #define ONCE_PAT_MOD         'o'
    #define GLOBAL_PAT_MOD       'g'
    #define CONTINUE_PAT_MOD     'c'
    #define MULTILINE_PAT_MOD    'm'
    #define SINGLE_PAT_MOD       's'
    #define IGNORE_PAT_MOD       'i'
    #define XTENDED_PAT_MOD      'x'
    #define BROKEN_SEM_PAT_MOD   'b'
    #define LOCALE_SEM_PAT_MOD   'l'
    #define PERL_SEM_PAT_MOD     'a'
    #define UNI_SEM_PAT_MOD      'u'

    #define ONCE_PAT_MODS        "o"
    #define KEEPCOPY_PAT_MODS    "p"
    #define EXEC_PAT_MODS        "e"
    #define LOOP_PAT_MODS        "gc"
    #define STD_PAT_MODS         "msix"
    #define SEM_PAT_MODS         "blau"

    #define INT_PAT_MODS    STD_PAT_MODS  KEEPCOPY_PAT_MODS
    #define EXT_PAT_MODS    ONCE_PAT_MODS KEEPCOPY_PAT_MODS
    #define QR_PAT_MODS     STD_PAT_MODS  EXT_PAT_MODS SEM_PAT_MODS
    #define M_PAT_MODS      QR_PAT_MODS   LOOP_PAT_MODS
    #define S_PAT_MODS      M_PAT_MODS    EXEC_PAT_MODS
    ---
    $world=~s/war/peace/g

      Since legacy is the default, I'd expect the flag to be explicitly named only rarely. Use "L" for legacy. If you can't stand the use of the shift key (even rarely), perhaps "h" for "historical."

      (I agree with mirod, the old behavior is not "broken.")

      Update: Maybe describing what it does is too hard. Just call it "d" for "default"!

        I'm more in favour of "h" or "t", as 'l' is used for locale. I'm not sure we want "L" and "l" at the same time.

        But the color of the bikeshed wasn't quite the point, I'm more interested in whether people like the shelves....

        ---
        $world=~s/war/peace/g

Re^2: About \d \w and \s
by demerphq (Chancellor) on Oct 18, 2009 at 17:19 UTC

    I changed it to /t for "traditional" in the source now.

    ---
    $world=~s/war/peace/g
