I find your definition of byte confusing, and I think most people use it differently. According to your definition,
$x = "\xEC"; utf8::upgrade($x);
now $x consists of a single byte, even though it takes 16 bits to encode.

Perhaps the confusion comes from saying that the UTF8 flag doesn't matter for your definition of a byte, while that definition refers to a string element, which is itself defined in terms of substr, for which the UTF8 flag *does* matter.

I'd say that in my example, $x ends up having 2 bytes, but one character. This is also the distinction wc makes (wc -c counts bytes, wc -m counts characters).
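
To make the two counts concrete, here is a minimal sketch. It assumes that "bytes" in my sense means the octets of the internal buffer, which I read with bytes::length from the bytes pragma; the variable names are only for illustration.

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length without turning on byte semantics

my $x = "\xEC";

# Before upgrading: one string element, stored as one octet.
my $octets_before = bytes::length($x);

utf8::upgrade($x);

# After upgrading: still one string element (what length and substr see),
# but the internal buffer now holds the two-octet UTF-8 sequence C3 AC.
my $chars        = length($x);
my $octets_after = bytes::length($x);

printf "chars=%d octets_before=%d octets_after=%d\n",
    $chars, $octets_before, $octets_after;
# prints: chars=1 octets_before=1 octets_after=2
```

So length reports 1 both before and after the upgrade, while the octet count goes from 1 to 2. Which of those numbers deserves the name "byte" is exactly the point of disagreement.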

Of course, you are free to use whatever definition you want -- just keep in mind that not everyone shares your definition. Some people prefer not to use the term byte at all, just character and octet.


In reply to Re: Jargon relating to Perl strings by JavaFan
in thread Jargon relating to Perl strings by ikegami
