repellent
You mentioned (emphasis mine):<br>
<br>
<ul>
No <i>interpretation</i> of the meaning (nor even signedness) is placed (nor could be) upon <b>that number</b> until you do something with it!
</ul>
<br>
I agree with this, but I believe we have different assumptions on what is meant by <i>interpretation</i>. Look, I need a way to refer to <b>that</b> <u><i>number</i></u>, because that is fundamental. I call that <u><i>number</i></u> a "<i>character</i>". The value of that <u><i>number</i></u> is what I call the "<i>codepoint value</i>". Bear with me: forget "Unicode" for now, and grant me the use of those words. At any time, you may <c>s/character|codepoint/_that_number_/gi</c>.<br>
<br>
Before that sentence, you mentioned:<br>
<br>
<ul>
It is a byte! An 8-bit bit pattern stored in a 8-bit unit of memory and nothing else.
</ul>
<br>
Well, that <u><i>number</i></u> is <c>255 == ord(pack 'B8', '11111111')</c>. Saying it's a (single) byte means you've established that the number of bits for it is <c>8</c>. That, to me, is giving the number an <i>interpretation</i>(*). This observation becomes very important when it comes to <b>encoding</b>, especially when we want to print that <i>character</i> (i.e. that <u><i>number</i></u>).<br>
<br>
If you want to <c>print</c> a string, you should avoid any preconceived notion of how many bits the string "has" until you have decided which encoding to use. I find that thinking in terms of <i>characters</i> (i.e. those <u><i>numbers</i></u>) and their <i>codepoint values</i> (i.e. the values of those <u><i>numbers</i></u>) helps tremendously in my handling of strings, right up to the point where they are encoded via <c>print</c>. That is my thought process, and the message I was trying to [id://960923|deliver].<br>
<br>
(*) I am aware of the details of how perl stores that <u><i>number</i></u> in memory, but not as well versed as you. I would like to reiterate that this discussion is about <c>print</c> and encoding, and that the [doc://ord]inal of the character is what matters here.
<br>
<ul>
The important part is that the OS cannot preserve what it has no knowledge of.
</ul>
<br>
Agreed.<br>
<br>
<ul>
There is no concept of encoding attached to the file descriptors.
</ul>
<br>
And that's the thing: the concept of encoding <b>alone</b> does not make sense without the concept of <i>characters</i> (the things we're encoding). And those <i>characters</i> can only exist within a process (e.g. as <u><i>numbers</i></u> in a Perl "string"). Our computer "systems" (web browser, text editor, terminal, program, etc.) perform this decode-incoming-octets, encode-outgoing-octets dance with each other in order to hand off <i>characters</i>.<br>
<br>
When Perl warns you about "Wide character in print", what it's really saying is: please be explicit about the encoding, so that I can accurately describe <i>my</i> characters to the next "system" using only octets.<br>
<br>
<ul>
The bottom line -- for this thread, rather than this subthread -- is that the OP must have omitted some details from his scenario.
</ul>
<br>
Agreed.