PerlMonks  

How scalars work (about numbers, text strings, and binary strings)

by Juerd (Abbot)
on Oct 17, 2007 at 12:27 UTC ( #645432=perlmeditation )


This is a simplified, high-level view of the relations between Perl scalar types.

                             ┌───────────────┐
                      ┌───── │ REFERENCE     │ ─────┐
                      │      │ (ROK flag on) │      │
                      │      └───────────────┘      │
      numeric context │                             │ string context
                      │                             │
                      ▼                             ▼
┌─────────────────────────┐                     ┌──────────────────────────────┐
│ NUMBER                  │    string context   │ TEXT STRING                  │
│ encoded internally      │ ──────────────────▶ │ (POK flag on)                │
│ as any of:              │                     │ encoded internally           │
│ * integer (IOK flag on) │   numeric context   │ as one of:                   │
│ * double (NOK flag on)  │ ◀────────────────── │ * iso-8859-1 (UTF8 flag off) │
└─────────────────────────┘                     │ * utf8 (UTF8 flag on)        │
                  │     ▲                       └──────────────────────────────┘
                  │     │                         ▲     │           ▲
                  │     │                         │     │           │
             pack │     │ unpack           decode │     │ encode    │
                  │     │                         │     │           │
                  ▼     │                         │     ▼           │ :encoding
                ┌─────────────────────────────────────────┐         │ PerlIO
                │ BINARY STRING                           │         │ layer
                │ (POK flag on)                           │ ◀───────┘
                │ (UTF8 flag off)                         │
                └─────────────────────────────────────────┘
                        ▲  ▲  ▲
                        │  │  │
                        │  │  │
                        ▼  ▼  ▼
                  ┌─────────────────────────────┐
                  │ OUTSIDE PERL                │
                  │ files, sockets, filenames,  │
                  │ environment, system calls   │
                  └─────────────────────────────┘

(A Perl programmer does not have to know about the internal flags ROK, IOK,
NOK, POK, and UTF8, but if you're interested, read perlguts.)
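
As a rough illustration of the arrows in the diagram, the sketch below walks one value along each pair of conversions. The variable names and sample values are made up for the example; only core Perl and the Encode module are assumed.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# NUMBER -> TEXT STRING (string context) and back (numeric context)
my $number = 42;
my $text   = "$number";                  # the string "42"
my $sum    = $text + 1;                  # 43

# NUMBER -> BINARY STRING (pack) and back (unpack)
my $packed   = pack 'N', 258;            # four bytes: 00 00 01 02
my $unpacked = unpack 'N', $packed;      # 258

# BINARY STRING -> TEXT STRING (decode) and back (encode)
my $bytes = "caf\xC3\xA9";               # UTF-8 bytes for "café"
my $chars = decode('UTF-8', $bytes);     # 4 characters
my $again = encode('UTF-8', $chars);     # 5 bytes

printf "%s %d %d %d %d\n",
    $text, $sum, $unpacked, length($chars), length($again);
# prints "42 43 258 4 5"
```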


Keep text and binary strings/semantics separated! (Good style anyway!)

If you don't keep them separate, and use a binary string as a text string, it
is assumed to be iso-8859-1 encoded.
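
A small sketch of what that assumption looks like in practice: the two UTF-8 bytes for é, used as text without decoding, are seen as two iso-8859-1 characters. The sample bytes are chosen for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "\xC3\xA9";    # two bytes: the UTF-8 encoding of U+00E9 (é)

# Used directly as text: each byte is read as one iso-8859-1 character.
my @as_latin1 = map { sprintf('U+%04X', ord $_) } split //, $bytes;

# Decoded explicitly: the two bytes become the one character intended.
my $chars   = decode('UTF-8', $bytes);
my @as_utf8 = map { sprintf('U+%04X', ord $_) } split //, $chars;

print "without decode: @as_latin1\n";    # U+00C3 U+00A9
print "with decode:    @as_utf8\n";      # U+00E9
```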

If you don't keep them separate, and use a text string as a binary string, one
of the following things happens, with or without warnings:

  1. the internal iso-8859-1 buffer is used (always the case if the internal
     buffer is not utf8 encoded)
  2. the internal utf8 buffer is used
  3. the iso-8859-1 encoded version is used
     3a. characters above U+00FF are utf8 encoded, while the rest is
         iso-8859-1
     3b. characters above U+00FF are reduced modulo 256
     3c. characters above U+00FF are dropped
     3d. characters above U+00FF cause an exception to be thrown
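
To make a few of these outcomes concrete, here is a minimal sketch that prints text strings to a filehandle without an encoding layer. The helper bytes_printed is hypothetical, written only for this illustration, and it uses an in-memory filehandle so the resulting bytes can be inspected. It appears to show cases 3 and 2 for print, and case 3d for an explicit utf8::downgrade.

```perl
use strict;
use warnings;

# Hypothetical helper: print a string to an in-memory filehandle that has
# no encoding layer, and return the bytes that came out, as hex.
sub bytes_printed {
    my ($string) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
    {
        no warnings 'utf8';    # silence "Wide character in print"
        print {$fh} $string;
    }
    close $fh or die "cannot close: $!";
    return join ' ', map { sprintf('%02x', ord $_) } split //, $buffer;
}

my $latin = "\x{E9}";            # é, fits in iso-8859-1
utf8::upgrade($latin);           # internal buffer is now utf8 encoded
my $wide  = "\x{E9}\x{263A}";    # é followed by a character above U+00FF

print bytes_printed($latin), "\n";    # e9: the iso-8859-1 version is used
print bytes_printed($wide),  "\n";    # c3 a9 e2 98 ba: the utf8 buffer is used

# An explicit downgrade throws an exception instead (case 3d).
my $ok = eval { utf8::downgrade(my $copy = $wide); 1 };
print "downgrade failed: $@" unless $ok;
```

The same character é comes out as one byte or as two, depending only on what else is in the string.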

If you do keep them separate, and always explicitly convert between the two
types by explicitly decoding and encoding or using the :encoding layer on a
filehandle, you stay in control of what happens and your program will behave
more predictably.
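
A minimal sketch of that explicit style, assuming UTF-8 as the external encoding and a temporary file from File::Temp standing in for "outside Perl". The sample text and variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

my $text = "na\x{EF}ve \x{263A}";    # text string: 7 characters

# Writing: the :encoding layer turns text into bytes on the way out.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;
open my $out, '>:encoding(UTF-8)', $path or die "open for write: $!";
print {$out} $text;
close $out or die "close: $!";

# Reading raw: what sits in the file is a binary string.
open my $raw, '<:raw', $path or die "open raw: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Reading through the layer, or decoding by hand, gives the text back.
open my $in, '<:encoding(UTF-8)', $path or die "open for read: $!";
my $via_layer = do { local $/; <$in> };
close $in;
my $via_decode = decode('UTF-8', $bytes);

printf "%d bytes, %d characters\n", length($bytes), length($via_layer);
# prints "10 bytes, 7 characters"
```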

Update: thin lines used, see discussion below.

Re: How scalars work (about numbers, text strings, and binary strings)
by Juerd (Abbot) on Oct 17, 2007 at 15:41 UTC
      In my Firefox (not an obscure browser), this diagram formatted incorrectly, causing the whole column to expand twice as wide as intended to the right. If you include text or diagrams that might screw up formatting, can you please put it in <readmore> tags?

      --
      [ e d @ h a l l e y . c c ]

        Sorry, your browser is broken beyond what I personally care to work around. The post is a mere <pre> tag with text in it, no further formatting applied. If even this breaks, anything might, and we could wrap the entire internet in <readmore>... Please check and repair your setup.

        The 80 characters width is a conscious conservative choice, and the diagram renders correctly on my computer in Firefox, Opera, Konqueror, Internet Explorer, Safari, w3m, Links2, Lynx, Dillo, ... That's all the major browsers and a few obscure ones.

        I had reports of some monospace fonts being not-so-monospaced for unicode linedrawing characters (while especially there, it really matters), so I uploaded the screenshot at http://juerd.nl/files/perlvalues.png. I try my best, but I have to draw a line with regards to how far I want to go, and adding <readmore> tags to hide the most important part of this work is well beyond that line.

        Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

Re: How scalars work (about numbers, text strings, and binary strings)
by ikegami (Pope) on Oct 17, 2007 at 15:54 UTC

    One way to keep binary strings separate from text strings when you have to deal with both is to use Hungarian notation.

    my $bin_msg = ...;
    my $txt_msg = decode($encoding, $bin_msg);

    See Making Wrong Code Look Wrong for more on the subject.

    (I usually use "bytes" and "chars". I used "bin" and "txt" here to be in line with your diagram.)
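
As a slightly longer sketch of the same naming idea, using the "bytes" and "chars" prefixes mentioned above (the sample data and variable names are invented for the illustration, and UTF-8 is assumed as the encoding):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $bytes_in  = "r\xC3\xA9sum\xC3\xA9";         # as received from outside Perl
my $chars_in  = decode('UTF-8', $bytes_in);     # text from here on
my $chars_out = ucfirst $chars_in;              # text operations on text
my $bytes_out = encode('UTF-8', $chars_out);    # binary again before output

# With these names, writing $chars_out straight to a socket or comparing
# $bytes_in with $chars_in looks wrong at a glance.
printf "%d chars, %d bytes\n", length($chars_out), length($bytes_out);
# prints "6 chars, 8 bytes"
```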

      That's a great idea, and I have done it in a few examples. I'd have to fight my old habits in order to do such a thing in actual code, though.

      Maybe it would be easier to do with single-letter prefixes. "u" for Unicode, perhaps. Feels better than "t" or "c" (I read "c" as "count", and would expect a number).

      Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

Re: How scalars work (about numbers, text strings, and binary strings)
by Joost (Canon) on Oct 19, 2007 at 21:35 UTC

      What program did you use to draw it?

      Vim, with the help of The Unicode Sliderule, and copy/pasting. I used the same thing, plus regexes, to change the thick lines to thin lines.

      I first modelled it on paper, then in Inkscape. It started out as a rather complex diagram, but the end result is, as you can see, quite simple and comprehensible.

      Update: your post inspired me to look for tools that can help, and I found this nifty Vim script: http://www.vim.org/scripts/script.php?script_id=173

      Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

Re: How scalars work (about numbers, text strings, and binary strings)
by Juerd (Abbot) on Oct 22, 2007 at 12:21 UTC

    Note that in reality, references go straight to binary strings when stringified. This is a bug and likely to be fixed some version after 5.10.0.
