| PerlMonks |
"However, if one of the many buffers involved (remote libC, remote kernel, remote sshd, remote TCP stack, switch, local TCP stack, local kernel, local ssh, local libC, AnyEvent's sysread) manages to split a UTF-8 character, there is the concern that the utf8 layer will not handle this."

Do I read "concern that the utf8 layer will not handle this" correctly as "you are worried, but haven't observed the problem so far"? I for one would not be concerned unless the problem really occurred, and would trust perl's IO layer. In fact I've made a very simple test for this situation:
This splits the å into two bytes, writes the first, sleeps a second, and then writes the second byte plus a newline. The perl process reading from the pipe decodes the input as UTF-8 (that's what the -CS does), and prints it to STDOUT again. Works fine.
The regex doesn't look right to me. If a character is encoded as three or more bytes, the [\xc0-0xff][\x80-\xbf]+ part could match just the first two bytes, so a missing third byte would go undetected.

In reply to Re: incremental reading of utf8 input handles
by moritz