However, if one of the many buffers involved (remote libC, remote kernel, remote sshd, remote TCP stack, switch, local TCP stack, local kernel, local ssh, local libC, AnyEvent's sysread) manages to split a UTF-8 character, there is the concern that the utf8 layer will not handle this

Do I read "there is the concern that the utf8 layer will not handle this" correctly as "you are worried, but haven't observed the problem so far"?

I for one would not be concerned unless the problem really occurred, and would trust perl's IO layer.

In fact I've made a very simple test for this situation:

$ perl -MEncode=encode_utf8 -wE '$| = 1; my $buf = encode_utf8 chr(0xe5); print substr($buf, 0, 1); sleep 1; say substr($buf, 1)' | perl -CS -pe 1

This splits the character å (U+00E5) into its two UTF-8 bytes, writes the first, sleeps for a second, and then writes the second byte plus a newline. The perl process reading from the pipe decodes its input as UTF-8 (that's what the -CS does) and prints it to STDOUT again. It works fine.

$buf =~ s/^((?:[\x00-\x7f]+|[\xc0-0xff][\x80-\xbf]+)*)//;
my $newtext = $1;
utf8::decode($newtext);
$text_so_far .= $newtext;

The regex doesn't look right to me. If a character is encoded as three or more bytes, the [\xc0-0xff][\x80-\xbf]+ part can match just the first two bytes, so you would not detect that the third byte is still missing.
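To make the concern concrete, here is a minimal sketch. It assumes the `0xff` in the quoted character class was meant to be `\xff`, and it uses a hypothetical helper named `decode_chunk` that is not part of the original code. The first half shows the quoted pattern accepting a truncated three-byte character. The second half shows one possible alternative: letting Encode's `FB_QUIET` check mode hold back the incomplete bytes until the rest arrives.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The quoted pattern, with "0xff" read as "\xff" (an assumption).
my $quoted_re = qr/^((?:[\x00-\x7f]+|[\xc0-\xff][\x80-\xbf]+)*)/;

my $euro      = encode('UTF-8', "\x{20ac}");   # three bytes: E2 82 AC
my $truncated = substr($euro, 0, 2);           # third byte not received yet

# The pattern treats the truncated sequence as complete...
my ($accepted) = $truncated =~ $quoted_re;
my $accepted_len = length $accepted;

# ...but it is not valid UTF-8, so utf8::decode reports failure.
my $copy       = $accepted;
my $decoded_ok = utf8::decode($copy) ? 1 : 0;

# Alternative sketch: keep undecoded bytes in a buffer between reads.
my $pending = '';

# Hypothetical helper: takes a chunk of raw bytes, returns decoded text.
sub decode_chunk {
    my ($chunk) = @_;
    $pending .= $chunk;
    # FB_QUIET returns what decoded cleanly and leaves the
    # unprocessed remainder in $pending.
    return decode('UTF-8', $pending, Encode::FB_QUIET);
}

# Worst case: feed the encoded string one byte at a time.
my $bytes = encode('UTF-8', "a\x{20ac}b");
my $text  = '';
$text .= decode_chunk(substr($bytes, $_, 1)) for 0 .. length($bytes) - 1;

binmode STDOUT, ':encoding(UTF-8)';
print "pattern accepted $accepted_len bytes, utf8::decode ok: $decoded_ok\n";
print "incrementally decoded: $text\n";
```

One caveat with this sketch: `FB_QUIET` also stops at bytes that are invalid rather than merely incomplete, so real code would need to notice when `$pending` stops shrinking (a complete UTF-8 character never needs more than four bytes) and handle that case separately.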

In reply to Re: incremental reading of utf8 input handles by moritz
in thread incremental reading of utf8 input handles by Tanktalus
