Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re: [OT] Database row width and cargo cult programming

by JavaFan (Canon)
on Apr 06, 2011 at 10:35 UTC ( [id://897724]=note: print w/replies, xml ) Need Help??


in reply to [OT] Database row width and cargo cult programming

After all, when a VARCHAR allows arbitrary Unicode characters, the maximal byte width is four times the character count (for UTF-8, UTF-16 and UTF-32), so a character length of 255 won't lead to a byte length that fits into a byte.
You got it backwards. ;-) When storing UTF-8 characters in a VARCHAR(255), you may not be able to store 255 characters.

The size is the byte length. And 255 takes a single byte to store the length, anything above takes 2 bytes.

Also note that many (var)char columns will not be used to store non-ASCII data. Just because Unicode exists doesn't mean everything requires non-ASCII (or non-LATIN-1) characters.

Replies are listed 'Best First'.
Re^2: [OT] Database row width and cargo cult programming
by moritz (Cardinal) on Apr 06, 2011 at 11:55 UTC
    You got it backwards. ;-) When storing UTF-8 characters in a VARCHAR(255), you may not be able to store 255 characters

    That's curious. In sql server 2000 it's indeed the byte length, in postgres it's character length. And I just tried it in postgres, I can indeed store 64 characters with codepoints above 127 in a varchar(64) column (encoding is set to UTF-8).wikipedia says "characters"

    Is there any normative source that is explicit about what's "correct"?

    The size is the byte length. And 255 takes a single byte to store the length, anything above takes 2 bytes.

    My real question was: do current RDBMs actually exploit that use only one byte to store the length?

      Is there any normative source that is explicit about what's "correct"?
      The various SQL standards, I presume. But they may leave it as "implementation defined". Anyway, from the Sybase 15 manuals:
      Use n to specify the number of bytes of storage for char and varchar datatypes. For unichar, use n to specify the number of Unicode characters (the amount of storage allocated is 2 bytes per character). For nchar and nvarchar, n is the number of characters (the amount of storage allocated is n times the number of bytes per characer for the server’s current default character set).
      ...
      Use n to specify the maximum length in characters for the variable-length datatypes, varchar(n), univarchar(n), and nvarchar(n). Data in variable-length columns is stripped of trailing blanks; storage size is the actual length of the data entered. Data in variable-length variables and parameters retains all trailing blanks, but is not padded to the defined length. Character literals are treated as variable-length datatypes.
      ...
      Fixed-length columns tend to take more storage space than variable-length columns, but are accessed somewhat faster. Table 1-17 summarizes the storage requirements of the different character datatypes:
      Table 1-17: Character datatypes
      
      Datatype        Stores                        Bytes of storage
      
      char(n)         Character                     n
      unichar(n)      Unicode character             n*@@unicharsize (@@unicharsize equals 2)
      nchar(n)        National character            n*@@ncharsize
      varchar(n)      Character varying             Actual number of characters entered
      univarchar(n)   Unicode character varying     Actual number of characters * @@unicharsize
      nvarchar(n)     National character varying    Actual number of characters * @@ncharsize
      
      I'm a bit surprised Sybase doesn't need to store the length of a stored varchar. The chapter about storage sizes of datatypes repeats the information above.

      MySql defines their storages sizes here. For varchar, the storage size is defined as L + 1 bytes if column values require 0 – 255 bytes, L + 2 bytes if values may require more than 255 bytes, where L represents the actual length in bytes of a given string value..

        For Sybase ASE, varchar() will be the size of the number of bytes entered. If the server charset is utf8 then the number of characters will be less or equal to the number of bytes. For nvarchar() you have the same behavrior, modulo @@ncharsize.

        Michael

Re^2: [OT] Database row width and cargo cult programming
by anonymized user 468275 (Curate) on Apr 06, 2011 at 12:05 UTC
    He didn't say anything about storing 255 characters in a varchar(255). Once again: please keep your straw men to yourself.

    One world, one people

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://897724]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (5)
As of 2024-03-29 13:29 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found