Re: Trying to determine the output length of a Unicode string

by halley (Prior)
on Sep 25, 2011 at 16:17 UTC ( [id://927758]=note )


in reply to Trying to determine the output length of a Unicode string

I think it's kind of annoying that it took this long, but Perl 5.14 seems to be the answer here.

From 'perldoc perlunicode':

Starting in Perl 5.14, Perl-level operations work with characters rather than bytes within the scope of a use feature 'unicode_strings' (or equivalently use 5.012 or higher). (This is not true if bytes have been explicitly requested by use bytes, nor necessarily true for interactions with the platform's operating system.) For earlier Perls, and when unicode_strings is not in effect, Perl provides a fairly safe environment that can handle both types of semantics in programs. For operations where Perl can unambiguously decide that the input data are characters, Perl switches to character semantics. For operations where this determination cannot be made without additional information from the user, Perl decides in favor of compatibility and chooses to use byte semantics.

{Example cut, because PerlMonks replaces Japanese characters with entities.}

--
[ e d @ h a l l e y . c c ]

Re^2: Trying to determine the output length of a Unicode string
by Jim (Curate) on Sep 26, 2011 at 01:29 UTC

    unicode_strings only ensures that Perl uses character semantics instead of byte semantics for all string operations, which is helpful in the face of ambiguity. (See The "Unicode Bug" in perlunicode.) It doesn't alter the behavior of the length function, which measures the length of a Unicode string in code points, not in grapheme clusters (that is, in real characters).
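
To illustrate the distinction, here is a minimal sketch, assuming Perl 5.14 or later with no `use locale` and no `use v5.12` (or higher) at file scope. The variable names are only for illustration. It shows that the feature changes how `uc` treats a native 8-bit character, while `length` keeps counting code points:

```perl
use strict;
use warnings;

# U+00E9 (e with acute accent) written as a single native byte. Without the
# feature, Perl treats it as a byte with no case mapping (the "Unicode Bug").
my $e_acute = "\xE9";

# Outside the feature's scope: uc() leaves the byte unchanged.
my $uc_without = uc $e_acute;

my ($uc_with, $len_with);
{
    use feature 'unicode_strings';

    # Inside the scope: uc() maps U+00E9 to U+00C9.
    $uc_with = uc $e_acute;

    # length() still counts code points: "e" plus U+0301 COMBINING ACUTE ACCENT.
    $len_with = length "e\x{301}";
}

printf "uc without feature: %02X\n", ord $uc_without;   # E9
printf "uc with feature:    %02X\n", ord $uc_with;      # C9
print  "length:             $len_with\n";               # 2
```

The last line prints 2 even though the string displays as a single accented letter, which is the gap described above.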

    There's no built-in function in Perl to measure the length of a Unicode string in grapheme clusters rather than in code points.
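
One possible workaround, sketched here rather than offered as the definitive answer, is to count matches of the regex escape `\X`, which matches one extended grapheme cluster. The helper name `grapheme_length` is hypothetical, not a core function:

```perl
use strict;
use warnings;

# Hypothetical helper: count extended grapheme clusters by counting \X matches.
# The "= () =" idiom forces list context and then yields the number of matches.
sub grapheme_length {
    my ($string) = @_;
    my $count = () = $string =~ /\X/g;
    return $count;
}

# "cafe" followed by U+0301 COMBINING ACUTE ACCENT:
# five code points, but four user-perceived characters.
my $word = "cafe\x{301}";

print "code points:       ", length($word), "\n";            # 5
print "grapheme clusters: ", grapheme_length($word), "\n";   # 4
```

Note that this counts grapheme clusters only. It says nothing about how many terminal columns a string occupies, which matters for wide characters such as CJK ideographs.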

    Read chromatic's article titled New Features of Perl 5.14: unicode_strings for a helpful overview of unicode_strings.

Re^2: Trying to determine the output length of a Unicode string
by Jim (Curate) on Sep 26, 2011 at 01:58 UTC
