PerlMonks  

Re: Regex Matching Unicode and Regex Classes

by moritz (Cardinal)
on Nov 02, 2011 at 14:08 UTC


in reply to Regex Matching Unicode and Regex Classes

The default Unicode semantics just check the Unicode properties of a codepoint. "ä" is U+00E4 LATIN SMALL LETTER A WITH DIAERESIS and classified as a letter, so \w matches it.
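
As a minimal sketch of that property lookup (not code from the original question): the string is obtained by decoding the UTF-8 bytes of "ä", which is an assumption about how the data arrives, and the variable names are purely illustrative. Neither `use locale` nor `use feature 'unicode_strings'` is in effect here.

```perl
use strict;
use warnings;
use Encode qw(decode);
use charnames ();    # gives access to charnames::viacode

# The UTF-8 encoding of U+00E4 is the two bytes 0xC3 0xA4.
# Decoding turns those bytes into a one-character string.
my $char = decode('UTF-8', "\xC3\xA4");

my $codepoint = sprintf 'U+%04X', ord $char;
my $name      = charnames::viacode(ord $char);
my $is_letter = $char =~ /\A\p{L}\z/ ? 1 : 0;   # general category "Letter"
my $is_word   = $char =~ /\A\w\z/    ? 1 : 0;   # word character

print "$codepoint $name: letter=$is_letter word=$is_word\n";
```

Both flags should come out as 1, because the match is decided by the codepoint's Unicode properties.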

Re^2: Regex Matching Unicode and Regex Classes
by McA (Priest) on Nov 02, 2011 at 14:27 UTC
    Hi Moritz,

    but what, then, is the difference from the third case? Do the "default Unicode semantics" change to something different when locale is enabled?

    Why, under locale, is "U+00E4 LATIN SMALL LETTER A WITH DIAERESIS" treated as something other than a letter that is part of a word?

    Best regards
    Andreas

      Short answer: because Unicode and locales don't mix.

      Long answer: Perl's support for locales comes from a time before the whole encoding/decoding business and Unicode support. So if locales are active, the locale-sensitive parts expect to act on bytes, not on decoded strings.

      Since the locale is not ISO-8859-1 but UTF-8, encoding to Latin-1 doesn't fix it for you.

      If anything, you'd need to encode to UTF-8 to see the \w matching ä, but even then I don't see it matching. So either my understanding of locales is very wrong, or perl is broken (or a mixture thereof).
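
      To make the bytes-versus-decoded-strings distinction concrete, here is a rough sketch. It deliberately does not enable `use locale`, since results under a locale depend on the platform and the locales installed there. It only shows that the same "ä" is one character when decoded and two bytes when encoded, and that \w treats the two forms differently under Perl's default rules. The variable names are illustrative.

      ```perl
      use strict;
      use warnings;
      use Encode qw(decode encode);

      my $chars = decode('UTF-8', "\xC3\xA4");   # decoded: one character, U+00E4
      my $bytes = encode('UTF-8', $chars);       # encoded: two bytes, 0xC3 0xA4

      my $char_len = length $chars;
      my $byte_len = length $bytes;

      # Decoded string: Unicode properties apply, so \w matches.
      my $chars_match = $chars =~ /\A\w+\z/ ? 1 : 0;

      # Byte string with no locale and no unicode_strings feature:
      # only ASCII word characters count, so neither byte matches.
      my $bytes_match = $bytes =~ /\A\w+\z/ ? 1 : 0;

      print "chars: length=$char_len match=$chars_match\n";
      print "bytes: length=$byte_len match=$bytes_match\n";
      ```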

        Hi Moritz,

        that sounds plausible, but not satisfying. ;-)

        What, then, is the right approach for finding word boundaries with a regex while locale is enabled?

        Best regards
        Andreas

Re^2: Regex Matching Unicode and Regex Classes
by McA (Priest) on Nov 02, 2011 at 15:00 UTC
    Moritz,

    thank you for your answers.
    Have a nice day.

    Best regards
    Andreas
