Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Typeless Relational Database

by tomazos (Deacon)
on Jul 01, 2005 at 03:38 UTC ( #471542=perlquestion: print w/replies, xml ) Need Help??

tomazos has asked for the wisdom of the Perl Monks concerning the following question:

One of the things that I love about Perl is that the value of a scalar can be an integer, a float, a 20 character string or a 2MB string - and as its value changes between these, the conversions are done automatically and transparently. It's "typeless".

Sure, its not as efficient as if the compiler had hints about what to expect - but for many applications its fine, and the extra few milliseconds the interpreter wastes is made up for by the programmer not having to think about type conversions, checking and compatibility.

So why can't I have the same thing from my database? Every database I've seen has been an SQL relational database. You have to take a long time specifying column types and what not. Why can't we have a typeless relational database which stores the columns in the same way as perl scalars?

Or is there already one? Or am I missing something?

Come to think of it. What mature CPAN modules are there that implement a non-SQL database of any kind, relational or not?

-Andrew.


Andrew Tomazos  |  andrew@tomazos.com  |  www.tomazos.com

Replies are listed 'Best First'.
Re: Typeless Relational Database
by dragonchild (Archbishop) on Jul 01, 2005 at 03:48 UTC
    The early versions SQLite didn't make much differentiation between strings and numbers. Due to performance reasons, that kind of intelligence was built into SQLite pretty quick.

    I would posit that you're not quite sure what you're looking for from a database. Those annoying types are actually a very good thing, kinda like those annoying foreign keys. Each additional constraint you have to deal with when changing the data means one more assumption you can make when querying that data. The more assumptions you can make when querying, the faster the queries can go and the more confidence you have that the results are accurate.

    Remember - SELECTs generally make up over 99% of a database's activity. If you can improve the cost of a SELECT statement by 1% taking a 10% hit on each modify statement, that's a net gain of quite a lot. And, you have greater confidence in your data. That's a pretty good tradeoff to me.

    As for type conversions ... if you're doing type conversions, either your schema is wrong or you're not using it correctly. You should never ever have to use a type conversion, not even once. And, yes, that's a hard rule.


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      As for type conversions ... if you're doing type conversions, either your schema is wrong or you're not using it correctly. You should never ever have to use a type conversion, not even once. And, yes, that's a hard rule.

      That's going a bit too far, don't you think? I'll try to think of a really good example... What about something like this, maybe:

      select ... from table where table.day_in_month = to_number(to_char(sys +date,'dd'))
      Should I not be allowed to represent a day-in-month as a number? Should I, instead, be forced to use some sort of crazy "magic-number" reference month and year to store my day-in-month column, so that I can store it in a date datatype, and be able to do purely (but still crazy as all get-out) date-arithmetic to compare dates to the day-in-month? Imagine that the day-in-month is a piece of data used in representing a "recurrence" data-structure (like, "run this report on the 5th day of every month"), so I'm actually NOT talking about something that is a date... I'm talking about something more abstract, which can most easily be represented by type-casting (on top of some other logic).

      Here's another kind of crazy (but real! I swear!) example: Another, really powerful use of type-casting in SQL comes up in a very fast (but admittedly ugly) way to pull data in one field by correspondance to data in another field. By that I mean, say, give me the X that corresponds to the min(Y). The specific case when I've used this with is trying to pull the earliest created value of something (say FOO is a number, for this example, and you want the FOO corresponding to the min(CREATED)):

      select to_number( substr( min(to_char(CREATED, 'yyyymmddhh24miss') || FOO)), 15, 400 ) ) from table where ...
      If that's a little hard to understand, that's not too surprising, as it's something analogous to the GRT, but for seeking a min or a max, rather than for sorting (as the GRT is just a special case of the ST for sorting). That is: pack the carrier data (CREATED) together with the payload data (FOO), in such a way as comparisons of the packed carrier+payload correspond to any comparisons made against the naked carrier (i.e. if rowX.CREATED < rowY.CREATED, then to_char(...rowX...) < to_char(...rowY...), accordingly). Then, use comparisons made against the packed carrier+payload combo, which will yield back the data in which you're interested, but as packed carrier+payload. And then, finally unpack the payload from the carrier (i.e. the to_number(substr(...)) construct).

      Anyway, I know it's esoteric, but it is real, and it's WAY more efficient than the more purist SQL:

      select FOO from table where CREATED = (select min(FOO) from table where ...) and ...
      particularly if there's some interesting stuff in that "..." part of the query (which has to be repeated in the sub-query). (Oh, and yes, I did purposefully gloss over a few details in that example, that weren't vital to getting the point accross.)
      ------------ :Wq Not an editor command: Wq
        select ... from table where table.day_in_month = to_number(to_char(sysdate,'dd'))
        SELECT ... FROM table WHERE DAYOFMONTH( NOW() ) = table.day_in_month

        Your RDBMS should provide you with sufficient functions so as to make this unnecessary. And, though I don't have the reference in front of me, I'm pretty sure that Oracle does, just like MySQL and PostgreSQL both do.

        Your second example is an optimization. Those are deliberate breakings of good practices (that reduce developer time) in order to gain in some other area (such as processor time). It's the classic tradeoff.


        My criteria for good software:
        1. Does it work?
        2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      I agree that typing is a good thing for some applications, however I think the arguments against requiring typing in Perl could be made for a database. Yes, having constraints on the data that goes into the database is good for efficiency, reliability and performance - hometimes I'd like to optimize for development time.

      For example, a small unimportant system that isnt going to hold much data, or a prototype that needs to be rapidly developed with a small investment in developer time.

      A typeless relational database has (would have) applications.

      And as for never having to use type conversions, what about shortening a string, or converting a numeric string to an int. Or making sure that the number is low enough to fit in a byte. etc etc.


      Andrew Tomazos  |  andrew@tomazos.com  |  www.tomazos.com
Re: Typeless Relational Database
by Cody Pendant (Prior) on Jul 01, 2005 at 03:59 UTC
    So why can't I have the same thing from my database? Every database I've seen has been an SQL relational database. You have to take a long time specifying column types and what not.
    One almost-too-obvious answer would be "because the database has to read from and write to a disk sooner or later". Your Perl variables are being managed in RAM, and Perl will do whatever it can for you until it runs out of RAM altogether.

    The DB on the other hand, is reading and writing on a HD somewhere, and its function is to manage that reading and writing for you, so you don't have to worry about it. The more a DB "knows" about the kind and size of the information it's dealing with, the more efficient it can be.



    ($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
    =~y~b-v~a-z~s; print
      I'm not sure I see the difference between reading and writing to RAM and reading and writing to the HD in this regard.

      At the end of the day the information is stored as a sequence of bytes. Whatever technique Perl uses to store a $scalar in RAM could be used in the same fashion to store it on the HD. Could it not?


      Andrew Tomazos  |  andrew@tomazos.com  |  www.tomazos.com
        In theory, yes; in practice, no.

        Reading/writing from/to RAM is many times faster and reorganizing the storage in RAM is much easier.

        In order to get any kind of acceptable speed out of your HD, lots of tables need to be updated and saved to allow fast access to the data.

        In RAM-storage, in a pinch you can simply walk through your RAM to find the data you need, if you do that on a HD, you can go out and get yourself a coffee (and a donut or a toasted bagel) and return before your data is found.

        CountZero

        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Typeless Relational Database
by CountZero (Bishop) on Jul 01, 2005 at 05:36 UTC
    If you access your (SQL/relational/typed) database with DBI/DBD, you really get (sort of) a typeless database.

    Read a record with a "string" field containing only figures, put it in a scalar, multiply it by something, store it again in the scalar and save it in the database: the data start as a string, gets silently converted to an integer or float (as need be) and gets saved as a string again. Perl has taken care of all type-conversions.

    So what are we complaining about? The small effort of setting-up the database? I actually find that a Good Thingtm as it makes you think about your data, but YMMV.

    And if you want to skip even that little effort, declare all your fields to be 'VARCHAR' (max. length 255) or 'TEXT' (max. length 65,535) in MySQL.

    One single type to rule them all and in the database bind them! Looks pretty (and) typeless to me.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      Yes, it'll work... and if you use it for any extended period of time, you'll most likely never use it again. (and prove that strict typing can be a good thing)

      I've had to deal with tables where someone had dne this -- every field was a varchar(255), completely denormalized. (it was basically just logging information from sitescope).

      I ran into the following problems:

      • Any sort of selects were slow.-- Every field was larger than it had to be (strings for dates, strings for numbers, etc.) as it could read fewer records per disk access
      • Because the fields weren't fixed length, anything with a 'where' clause couldn't just know that field 6 started at byte 24 ... it'd have to look through fields 1-5.
      • Inequality matches were abysmal. SQL doesn't have seperate 'lt' and '<' operators, so you had to do the conversions by hand...and you'd have to parse all of the date strings every time.
      • No sanity checking on values -- because you could write anything into any field, you can't catch bad data on insert -- you find the problems when you go to read.

      There were probably other problems, that I either never found, or have managed to block out from that horrible time. Yes, this allows you to start collecting data with little thought, but it makes it a pain to deal with the data in the future -- it's like saving any type of file to a folder, without any markings if it's an image, text file, or some other format -- writing is just fine, but it's much more work when you have to go to actually read them.

      It was so horrible to deal with, and as I couldn't change the application that was writing, I found it easiest to move the data from the bad table to something more compact, with typing.

      So, to anyone who thinks strict typing is a problem -- try this, as an experiment. If it works for you, fine. If it doesn't, well, then get rid of it as quickly as possible.

Re: Typeless Relational Database
by (anonymized user) (Curate) on Jul 01, 2005 at 13:32 UTC
    In an RDBMS, join optimisation relies on having fixed length fields (so it can build a cross-product in memory). A perl scalar for the reasons you point out can't be fixed length and so it would be harder (but not impossible) to write a perl-like RDBMS which delivered reasonable performance.

    One world, one people

Re: Typeless Relational Database
by herveus (Parson) on Jul 01, 2005 at 13:33 UTC
    Howdy!

    SQLite has not discarded the flexibility, although it does now store integers as integers, it still permits you to store arbitrary data in a column. Most (every other?) RDBMS uses container typing (the column is declared with a type and you have to conform), but SQLite uses value typing (where the type is associated with the value, much the way Perl scalars work).

    SQLite is that "typeless" RDBMS you are looking for.

    yours,
    Michael
Re: Typeless Relational Database
by samtregar (Abbot) on Jul 01, 2005 at 21:09 UTC
    MySQL is pretty close, particularly 3.x. Just make all your columns VARCHAR(255) and when you treat them as numbers MySQL will autoconvert. If you want to allow for bigger fields just make them all TEXT, which should work (although I've never tried treating a TEXT field as a number).

    I agree that a typeless relational database would be a very useful thing to have. I might not use it for the most critical projects but it would make one-offs a lot easier to develop.

    -sam

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://471542]
Approved by GrandFather
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (6)
As of 2020-07-11 14:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?