Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

[OT] Database row width and cargo cult programming

by moritz (Cardinal)
on Apr 06, 2011 at 10:25 UTC ( [id://897723]=perlquestion: print w/replies, xml ) Need Help??

moritz has asked for the wisdom of the Perl Monks concerning the following question:

This is a bit off-topic here, because it has no Perl contents, but I still hope you find the question interesting.

I've seen many database column definitions using VARCHAR(255), or other powers-of-two-minus-1 values as the column width. I even caught myself writing this. I presume this is because then length fits into one unsigned byte, and thus think it's significantly less wasteful in terms of space than when using VARCHAR(256).

Is that actually still the case in modern RDMBs? Or are there are other good reasons for chosing numbers of the form 2**$int - 1?

The longer I think about it, the more I expect it to be just cargo cult programming, carried over from the early days of relational databases. After all, when a VARCHAR allows arbitrary Unicode characters, the maximal byte width is four times the character count (for UTF-8, UTF-16 and UTF-32), so a character length of 255 won't lead to a byte length that fits into a byte.

Also I'd expect postgresql, sqlite and consorts to just store the widht in a native (or maybe 64bit) unsigned it.

So, is it a cargo cult? Or is there still a deeper reason for those special widths?

Replies are listed 'Best First'.
Re: [OT] Database row width and cargo cult programming
by mje (Curate) on Apr 06, 2011 at 10:49 UTC

    A number of databases (e.g. Oracle) allow you to specify the size in bytes or characters e.g., "a_col varchar(128 char)" so encoding does not come in to it.

    varchar(255) is probably seen a lot simply because in many databases it was the largest varchar you could have. Even now many databases only support varchars of 2000 or 4000.

    In my opinion, it is mostly cargo cult but I'll wait for someone to come up with a reason. I do it myself quite a bit when needing to hold an error message in a varchar. I know it is not going to be bigger than 255 but I'm too lazy to work out if it could be smaller.

Re: [OT] Database row width and cargo cult programming
by cdarke (Prior) on Apr 06, 2011 at 12:01 UTC
    From the mysql 5.0 reference guide:
    Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
Re: [OT] Database row width and cargo cult programming
by JavaFan (Canon) on Apr 06, 2011 at 10:35 UTC
    After all, when a VARCHAR allows arbitrary Unicode characters, the maximal byte width is four times the character count (for UTF-8, UTF-16 and UTF-32), so a character length of 255 won't lead to a byte length that fits into a byte.
    You got it backwards. ;-) When storing UTF-8 characters in a VARCHAR(255), you may not be able to store 255 characters.

    The size is the byte length. And 255 takes a single byte to store the length, anything above takes 2 bytes.

    Also note that many (var)char columns will not be used to store non-ASCII data. Just because Unicode exists doesn't mean everything requires non-ASCII (or non-LATIN-1) characters.

      You got it backwards. ;-) When storing UTF-8 characters in a VARCHAR(255), you may not be able to store 255 characters

      That's curious. In sql server 2000 it's indeed the byte length, in postgres it's character length. And I just tried it in postgres, I can indeed store 64 characters with codepoints above 127 in a varchar(64) column (encoding is set to UTF-8).wikipedia says "characters"

      Is there any normative source that is explicit about what's "correct"?

      The size is the byte length. And 255 takes a single byte to store the length, anything above takes 2 bytes.

      My real question was: do current RDBMs actually exploit that use only one byte to store the length?

        Is there any normative source that is explicit about what's "correct"?
        The various SQL standards, I presume. But they may leave it as "implementation defined". Anyway, from the Sybase 15 manuals:
        Use n to specify the number of bytes of storage for char and varchar datatypes. For unichar, use n to specify the number of Unicode characters (the amount of storage allocated is 2 bytes per character). For nchar and nvarchar, n is the number of characters (the amount of storage allocated is n times the number of bytes per characer for the server’s current default character set).
        ...
        Use n to specify the maximum length in characters for the variable-length datatypes, varchar(n), univarchar(n), and nvarchar(n). Data in variable-length columns is stripped of trailing blanks; storage size is the actual length of the data entered. Data in variable-length variables and parameters retains all trailing blanks, but is not padded to the defined length. Character literals are treated as variable-length datatypes.
        ...
        Fixed-length columns tend to take more storage space than variable-length columns, but are accessed somewhat faster. Table 1-17 summarizes the storage requirements of the different character datatypes:
        Table 1-17: Character datatypes
        
        Datatype        Stores                        Bytes of storage
        
        char(n)         Character                     n
        unichar(n)      Unicode character             n*@@unicharsize (@@unicharsize equals 2)
        nchar(n)        National character            n*@@ncharsize
        varchar(n)      Character varying             Actual number of characters entered
        univarchar(n)   Unicode character varying     Actual number of characters * @@unicharsize
        nvarchar(n)     National character varying    Actual number of characters * @@ncharsize
        
        I'm a bit surprised Sybase doesn't need to store the length of a stored varchar. The chapter about storage sizes of datatypes repeats the information above.

        MySql defines their storages sizes here. For varchar, the storage size is defined as L + 1 bytes if column values require 0 – 255 bytes, L + 2 bytes if values may require more than 255 bytes, where L represents the actual length in bytes of a given string value..

      He didn't say anything about storing 255 characters in a varchar(255). Once again: please keep your straw men to yourself.

      One world, one people

Re: [OT] Database row width and cargo cult programming
by duelafn (Parson) on Apr 07, 2011 at 13:40 UTC

    Probably depends on the DB. For example, the Postgres manual states that there are no performance or space differences between varchar(n) and text except for "a few extra CPU cycles to check the length when storing into a length-constrained column".

    Good Day,
        Dean

Re: [OT] Database row width and cargo cult programming
by Tux (Canon) on Apr 07, 2011 at 14:43 UTC

    For me the reason to do so is a physical limitation of the Unify database, whose absolute limit to CHAR type fields is 255. Unify is more braindead than that: it doesn't even know about VARCHAR type fields (read this, page 125). If you need longer than 255, you'll have to use TEXT (no length) or BINARY (no length). Fortunately, dealing with CLOB/BLOB's in Unify is exceptionally easy.

    All databases suck. One way or another :)


    Enjoy, Have FUN! H.Merijn
Re: [OT] Database row width and cargo cult programming
by sundialsvc4 (Abbot) on Apr 06, 2011 at 17:19 UTC

    Limits such as “255” were hard-and-fast limits for many decades.   They might be somewhat habitual now.   Systems like SQLite, which actually ignore string-length limits even if you specify them (AFAIK...) are extremely new.

Re: [OT] Database row width and cargo cult programming
by Anonymous Monk on Apr 06, 2011 at 10:41 UTC
    Cargo cult, based on historical limitations imitations

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://897723]
Approved by marto
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (3)
As of 2024-04-20 02:10 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found