Re: incremental reading of utf8 input handles

by The Code Captain (Initiate)
on Jul 09, 2012 at 18:13 UTC (#980736)

in reply to incremental reading of utf8 input handles

I don't think this is a problem, depending on which version of perl you are using, and provided that you consistently use UTF-8 throughout your code. (The text doesn't all have to be in the same language, but it does all have to be in the same character encoding.)

From perlunicode:

Beginning with version 5.6, Perl uses logically-wide characters to represent strings internally. Starting in Perl 5.14, Perl-level operations work with characters rather than bytes within the scope of a use feature 'unicode_strings' (or equivalently use 5.012 or higher). (This is not true if bytes have been explicitly requested by use bytes, nor necessarily true for interactions with the platform's operating system.)

Whenever I have used UTF-8 I have not had a problem with buffers being split in the middle of a character, because perl itself knows that the buffer holds characters and how many bytes are needed to represent each one. Just make sure that you consistently use the UTF-8 encoding everywhere, including the I/O layer on the handle you are reading from.
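
To illustrate the point, here is a minimal sketch, not the original poster's code. It assumes the handle is opened with an `:encoding(UTF-8)` layer and read with `read`, whose length argument then counts characters rather than bytes. The sample string, the chunk size of 2, and the in-memory handle are made up for the demonstration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Sample text mixing 1-, 2-, 3- and 4-byte UTF-8 sequences (illustrative only).
my $text  = "a\x{E9}\x{20AC}\x{1F600}z" x 3;
my $bytes = encode('UTF-8', $text);    # the raw octets as they would sit on disk

# In-memory handle over the octets, decoded by a UTF-8 layer.
open my $fh, '<:encoding(UTF-8)', \$bytes or die "open: $!";

my @chunks;
my $buf;
# With the encoding layer in place, the length passed to read() is in characters,
# so a chunk boundary cannot fall inside a multi-byte sequence.
while (read($fh, $buf, 2)) {
    push @chunks, $buf;
}
close $fh;

my $joined = join '', @chunks;
print "round trip ok\n" if $joined eq $text;
```

If you read raw bytes yourself instead (for example from a handle without a decoding layer), you would have to handle partial sequences at chunk boundaries on your own.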
