Re: How is perl able to handle the null byte?

by hobbs (Monk)
on Jun 15, 2006 at 20:33 UTC (#555613)

in reply to How is perl able to handle the null byte?

To say that "C treats a null byte as the end of a string" is a bit unfair. C barely knows what a "string" is. A large amount of C code, including a great many standard library functions, works with null-terminated strings. But that doesn't mean you can't write code that treats your own data however you like. Nothing forces you to stop at a null; there's just a certain class of pre-written functions that does so by convention. If you know better (because, for example, your string is stored together with its length), then that's all well and good. C only cares about bits and bytes.
Re^2: How is perl able to handle the null byte?
by Joost (Canon) on Jun 15, 2006 at 21:07 UTC
    Quite true. Strictly speaking, C doesn't have a string type: what's conventionally used instead is a pointer to a (single) character, which can stand in for an array of characters because an array decays to a pointer to its first element in most expressions [1]. The "string type" in C is literally "char *".

    [1] C arrays do not really have a length either: defining an array with a certain length only reserves that amount of memory. The length isn't stored anywhere at run time.

      Reminding me of a classic exchange from a CS class I took once (OK, the CS class I took...):

      Student (slightly paraphrased): you mentioned that the address after the last member of the array is guaranteed to be a legal address, though you don't technically have it allocated to you. Don't a lot of people use that fact to just pretend their array indexes are 1-based instead of 0-based?

      Professor: Lots of people jaywalk, too! Some of them get killed!

      Ah, those were the days... ;-)

      If God had meant us to fly, he would *never* have given us the railroads.
          --Michael Flanders


