PerlMonks  


I am currently working on fixing some problems with the current rules for what \d, \s and \w should match. It turns out that the current definitions lead to logical inconsistencies in the regex engine which cannot be resolved without changing the definitions, and thus breaking something out there.

Unfortunately, the current behaviour is really close to what people expect: almost all of the time the rules DWIM nicely. Things fall down only on edge cases and under certain consistency checks. This means that any "fixing" of the default rules causes a lot of stuff to break, which in turn means that we have to make do by adding new modifier flags to control things, and pretty much leave the defaults alone.
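To make "edge case" concrete, here is a small sketch of one well-known inconsistency under the legacy rules. It is my own illustration and may not be the exact case meant above: the same character can match or fail \w depending only on how the string happens to be stored internally. The helper name is_word_char is hypothetical, and the script assumes neither "use locale" nor the unicode_strings feature is in effect.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the whole string is exactly one \w character.
sub is_word_char {
    my ($str) = @_;
    return $str =~ /\A\w\z/ ? 1 : 0;
}

my $native   = "\xE9";      # LATIN SMALL LETTER E WITH ACUTE, stored as one byte
my $upgraded = "\xE9";
utf8::upgrade($upgraded);   # same character, now stored internally as UTF-8

# The two strings compare equal, yet \w treats them differently
# under the legacy (non-locale, non-unicode_strings) rules.
print "strings equal:  ", ($native eq $upgraded ? "yes" : "no"), "\n";
print "native   =~ \\w: ", is_word_char($native),   "\n";
print "upgraded =~ \\w: ", is_word_char($upgraded), "\n";
```

Two strings that are eq but disagree about \w is the kind of consistency check that cannot be satisfied without redefining what \w means for one of them.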

I am currently working on adding the following set of mutually exclusive flags and behaviour.

Modifier  Semantics      \w              \s            \d
/u        Unicode        \p{IsWord}      \p{IsSpace}   [0-9]
/a        ASCII/Perl     [A-Za-z0-9_]    [ \t\r\n]     [0-9]
/b        Broken/Legacy  same as perl 5.8              [0-9]
/l        "use locale"   same semantics as under use locale in 5.8.x

Most of this is pretty much a given. The main question is \d under the /b modifier (which will likely be the default). I think it makes a lot of sense to change the default of \d to match only the "computing digits" [0-9] and not "any digit in Unicode". I think it is likely to fix more things than it will break. For those of you out there working in non-English, non-Latin environments: how much do you depend on \d matching your native digits?
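For anyone unsure what is at stake, this minimal sketch shows the existing behaviour that the proposal would change: on a Unicode string, \d matches digits from other scripts, while an explicit [0-9] does not. The helper names and sample characters are my own choices for illustration, and the script does not use any of the proposed modifiers.

```perl
use strict;
use warnings;

# Hypothetical helpers: 1 if the whole string is a single matching character.
sub matches_d_escape    { my ($s) = @_; return $s =~ /\A\d\z/    ? 1 : 0 }
sub matches_ascii_digit { my ($s) = @_; return $s =~ /\A[0-9]\z/ ? 1 : 0 }

# Sample characters, keyed by their Unicode names.
my %samples = (
    'DIGIT THREE'              => "3",
    'ARABIC-INDIC DIGIT THREE' => "\x{0663}",
    'DEVANAGARI DIGIT THREE'   => "\x{0969}",
);

for my $name (sort keys %samples) {
    my $char = $samples{$name};
    printf "%-26s \\d=%d  [0-9]=%d\n",
        $name, matches_d_escape($char), matches_ascii_digit($char);
}
```

Code that validates input with \d and then does arithmetic on it is the usual place where the difference bites, since the native digits pass the check but are not numbers as far as Perl is concerned.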

Relevant links: Regarding the new \w regexp escape in 5.11

---
$world=~s/war/peace/g


In reply to About \d \w and \s by demerphq
