<?xml version="1.0" encoding="windows-1252"?>
<node id="645432" title="How scalars work (about numbers, text strings, and binary strings)" created="2007-10-17 08:27:18" updated="2007-10-17 04:27:18">
<type id="120">
perlmeditation</type>
<author id="132236">
Juerd</author>
<data>
<field name="doctext">
&lt;!-- *******************
Note to janitors: I used a pre tag because the diagram gets messed up with line wrapping. Please don't change it to a code tag.
******************* --&gt;


&lt;pre&gt;&lt;tt&gt;
This is a simplified, high-level view of the relations between Perl scalar types.

                             &amp;#9484;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9488;
                      &amp;#9484;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472; &amp;#9474; REFERENCE     &amp;#9474; &amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9488;
                      &amp;#9474;      &amp;#9474; (ROK flag on) &amp;#9474;      &amp;#9474;
                      &amp;#9474;      &amp;#9492;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9496;      &amp;#9474;
      numeric context &amp;#9474;                             &amp;#9474; string context
                      &amp;#9474;                             &amp;#9474;
                      &amp;#9660;                             &amp;#9660;
&amp;#9484;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9488;                     &amp;#9484;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9488;
&amp;#9474; NUMBER                  &amp;#9474;    string context   &amp;#9474; TEXT STRING                  &amp;#9474;
&amp;#9474; encoded internally      &amp;#9474; &amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9654; &amp;#9474; (POK flag on)                &amp;#9474;
&amp;#9474; as any of:              &amp;#9474;                     &amp;#9474; encoded internally           &amp;#9474;
&amp;#9474; * integer (IOK flag on) &amp;#9474;   numeric context   &amp;#9474; as one of:                   &amp;#9474;
&amp;#9474; * double (NOK flag on)  &amp;#9474; &amp;#9664;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472; &amp;#9474; * iso-8859-1 (UTF8 flag off) &amp;#9474;
&amp;#9492;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9496;                     &amp;#9474; * utf8 (UTF8 flag on)        &amp;#9474;
                  &amp;#9474;     &amp;#9650;                       &amp;#9492;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9496;
                  &amp;#9474;     &amp;#9474;                         &amp;#9650;     &amp;#9474;           &amp;#9650;
                  &amp;#9474;     &amp;#9474;                         &amp;#9474;     &amp;#9474;           &amp;#9474;
             pack &amp;#9474;     &amp;#9474; unpack           decode &amp;#9474;     &amp;#9474; encode    &amp;#9474;
                  &amp;#9474;     &amp;#9474;                         &amp;#9474;     &amp;#9474;           &amp;#9474;
                  &amp;#9660;     &amp;#9474;                         &amp;#9474;     &amp;#9660;           &amp;#9474; :encoding
                &amp;#9484;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9488;         &amp;#9474; PerlIO
                &amp;#9474; BINARY STRING                           &amp;#9474;         &amp;#9474; layer
                &amp;#9474; (POK flag on)                           &amp;#9474; &amp;#9664;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9496;
                &amp;#9474; (UTF8 flag off)                         &amp;#9474;
                &amp;#9492;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9496;
                        &amp;#9650;  &amp;#9650;  &amp;#9650;
                        &amp;#9474;  &amp;#9474;  &amp;#9474;
                        &amp;#9474;  &amp;#9474;  &amp;#9474;
                        &amp;#9660;  &amp;#9660;  &amp;#9660;
                  &amp;#9484;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9488;
                  &amp;#9474; OUTSIDE PERL                &amp;#9474;
                  &amp;#9474; files, sockets, filenames,  &amp;#9474;
                  &amp;#9474; environment, system calls   &amp;#9474;
                  &amp;#9492;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9472;&amp;#9496;

(A Perl programmer does not have to know about the internal flags ROK, IOK,
NOK, POK, and UTF8, but if you're interested, read perlguts.)
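
The arrows at the top and left of the diagram can be tried out with a minimal
sketch like the one below. The variable names are only illustrative, and the
"N" pack template (32-bit unsigned, big-endian) is just one format picked for
the example.

```perl
use strict;
use warnings;

# Number to text string: string context (concatenation) formats the number.
my $n    = 42;
my $text = $n . "";

# Text string to number: numeric context (addition) parses the string.
my $sum = "10" + 5;

# Number to binary string and back: pack and unpack.
# "N" is a 32-bit unsigned integer in big-endian byte order.
my $bytes  = pack "N", 258;
my ($back) = unpack "N", $bytes;

# Reference in string context: something like SCALAR(0x55d0c0ffee00).
my $ref      = \$n;
my $ref_text = "$ref";
my $ref_ok   = $ref_text =~ /^SCALAR\(0x[0-9a-f]+\)$/ ? "ref ok" : "ref odd";

# prints: 42 15 4 00000102 258 ref ok
printf "%s %d %d %s %d %s\n",
    $text, $sum, length $bytes, unpack("H*", $bytes), $back, $ref_ok;
```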


Keep text strings and binary strings, and their semantics, separate! (Good
style anyway!)

If you don't keep them separate, and use a binary string as a text string, it
is assumed to be iso-8859-1 encoded.
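
A small sketch of that assumption, using the core Encode module. The two bytes
are an assumed example input (the UTF-8 encoding of U+00E9); decoding them as
ISO-8859-1 only spells out what Perl does implicitly.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two bytes that are U+00E9 in UTF-8, e.g. as read from a raw filehandle.
my $bytes = "\xC3\xA9";

# Used directly as text, each byte is taken as one iso-8859-1 character,
# so Perl sees U+00C3 and U+00A9, not U+00E9. Decoding as ISO-8859-1
# gives a string that compares equal to the undecoded bytes.
my $implicit = decode("ISO-8859-1", $bytes);
my $same     = $implicit eq $bytes ? "same" : "different";

# Decoding with the real encoding gives the intended single character.
my $explicit = decode("UTF-8", $bytes);

# prints: same 2 1 U+00E9
printf "%s %d %d U+%04X\n",
    $same, length $implicit, length $explicit, ord $explicit;
```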

If you don't keep them separate, and use a text string as a binary string, one
of the following things happens, with or without warnings:

  1. the internal iso-8859-1 buffer is used (always the case if the internal
     buffer is not utf8 encoded)
  2. the internal utf8 buffer is used
  3. the iso-8859-1 encoded version is used
     3a. characters above U+00FF are utf8 encoded, while the rest is iso-8859-1
     3b. characters above U+00FF are reduced modulo 256
     3c. characters above U+00FF are dropped
     3d. characters above U+00FF cause an exception to be thrown
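
Some of these outcomes can be observed with a sketch like the following. It
covers only a few of the cases; which one you get depends on the operation.
The helper raw_bytes is hypothetical, and the in-memory filehandle stands in
for a real file or socket that has no :encoding layer.

```perl
use strict;
use warnings;

# Write a text string to an in-memory filehandle that has no :encoding layer
# and return the bytes that arrive, as hex.
sub raw_bytes {
    my ($text) = @_;
    my $buffer = "";
    open my $fh, ">", \$buffer or die "cannot open in-memory handle: $!";
    {
        no warnings "utf8";   # silences "Wide character in print" here
        print {$fh} $text;
    }
    close $fh or die "cannot close in-memory handle: $!";
    return unpack "H*", $buffer;
}

my $latin = "\x{E9}";           # U+00E9, fits in iso-8859-1
my $wide  = "\x{E9}\x{263A}";   # contains U+263A, does not fit

my $latin_hex = raw_bytes($latin);   # the single iso-8859-1 byte
my $wide_hex  = raw_bytes($wide);    # the whole internal utf8 buffer

# utf8::downgrade insists on iso-8859-1 and dies when that is impossible.
my $died = eval { utf8::downgrade(my $copy = $wide); 1 } ? "no" : "yes";

# prints: e9 c3a9e298ba yes
print "$latin_hex $wide_hex $died\n";
```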

If you do keep them separate, and always convert between the two types
explicitly, by decoding and encoding or by using the :encoding layer on a
filehandle, you stay in control of what happens and your program will behave
more predictably.
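
A minimal sketch of both explicit routes, assuming UTF-8 as the external
encoding. The sample string and the in-memory "file" are made up for the
example; a real program would open an actual file or socket.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $text = "caf\x{E9} \x{263A}";   # text string: 6 characters

# Text to binary: encode explicitly, naming the encoding.
my $bytes = encode("UTF-8", $text);   # binary string: 9 bytes

# Binary to text: decode explicitly, again naming the encoding.
my $round_trip = decode("UTF-8", $bytes);

# The same conversion done by a PerlIO layer on output.
my $file = "";
open my $out, ">:encoding(UTF-8)", \$file or die "cannot open: $!";
print {$out} $text;
close $out or die "cannot close: $!";

# prints: 6 9 round-trip ok layer matches encode
printf "%d %d %s %s\n",
    length $text, length $bytes,
    ($round_trip eq $text ? "round-trip ok" : "round-trip failed"),
    ($file eq $bytes ? "layer matches encode" : "layer differs");
```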

&lt;/tt&gt;&lt;/pre&gt;
Update: thin lines used, see discussion below.</field>
</data>
</node>
