http://www.perlmonks.org?node_id=980736


in reply to incremental reading of utf8 input handles

I don't think this is a problem, depending on which version of perl you are using, and provided that you consistently use UTF-8 in all of your code. (You don't have to use the same language, but you do have to use the same character encoding.)

From perlunicode:

Beginning with version 5.6, Perl uses logically-wide characters to represent strings internally. Starting in Perl 5.14, Perl-level operations work with characters rather than bytes within the scope of a use feature 'unicode_strings' (or equivalently use 5.012 or higher). (This is not true if bytes have been explicitly requested by use bytes, nor necessarily true for interactions with the platform's operating system.)

Whenever I have used UTF-8 I have not had a problem with a buffer boundary splitting a character, because perl itself knows that the buffer holds characters and how many bytes are required to represent each one. This assumes the handle has been told it carries UTF-8 (for example, by opening it with an :encoding(UTF-8) layer). Just make sure that you use UTF-8 consistently throughout.
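To illustrate the point about buffers, here is a minimal sketch. It assumes a reasonably modern perl with PerlIO layers and reads from a temporary file instead of your real input. The sample text, the chunk size of 3, and the variable names are all made up for the example. It writes some multi-byte text, then reads it back in deliberately tiny pieces through a handle opened with :encoding(UTF-8).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sample text with 2-byte and 3-byte UTF-8 characters:
# 19 characters, 29 bytes once encoded.
my $text = "na\x{EF}ve caf\x{E9} \x{65E5}\x{672C}\x{8A9E} \x{20AC}uro";

# Write it out as UTF-8 to a temporary file (removed at exit).
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':encoding(UTF-8)') or die "binmode: $!";
print {$out} $text;
close($out) or die "close: $!";

# Reopen with a decoding layer and read in very small chunks.
open(my $in, '<:encoding(UTF-8)', $path) or die "open $path: $!";
my @chunks;
while (read($in, my $buf, 3)) {    # on this handle, 3 means 3 characters
    push @chunks, $buf;
}
close($in) or die "close: $!";

# The pieces reassemble into the original string.
my $joined = join '', @chunks;
print "chunks read: ", scalar(@chunks), "\n";
print "round trip ",   ($joined eq $text ? "ok" : "FAILED"), "\n";
```

Because the layer decodes before read hands anything back, each chunk is a whole number of characters even though the underlying byte buffer may end in the middle of one. Note that this applies to buffered reads (read, readline); sysread bypasses PerlIO buffering, so I would not rely on the same behaviour there.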