Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number

Well Designed RDBMS

by nothingmuch (Priest)
on Apr 07, 2003 at 14:01 UTC ( #248599=perlquestion: print w/replies, xml ) Need Help??

nothingmuch has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,
I want to reprisent a chunk of data (probably a BLOB), and a set of dynamic attributes (which might not be the same for each record, perhaps with many different attribute possibilities). I was wondering which of the following methods is preferred for such a system:

Attributes in another table - a new table with the following structure is made:
create table attributes ( id int(), # the ID of data chunk record this attribute appl +ies to attr char(10), # the attribute name value char(10), # the attribute value primary key (id, attr) );
The advatage of doing this is mainly that you can look up the attributes more easily, remove and add them in a simple manner, and it's more with the spirit of databases. The downside is that it will probably result in many more reads / seeks / whatever, to load the attributes, than

Attributes as a blob - store some kind of serialized (packed or Storable'd) data structure in another blob, on the same row as the data, which contains the attribute name / data pairs all mushed together. This means that we can't (easily) search by attributes, but loading of the attributes will probably be much quicker. If the attribute set grows large there is more redundance in reading / writing of data, but that probably won't happen.

Which would you guys probably use? Should I consider this based on the database server aswell, or will it not make that much of a difference? Should i not even consider such backwards thinking and simply create a column for each possible attribute? Are there &lsb;dis&rsb;advantages i haven't considered and should? Is it too redundant to hybrid the two? Or is that even more backwards?

Thanks in advance,

zz zZ Z Z #!perl

Replies are listed 'Best First'.
Re: Well Designed RDBMS
by hardburn (Abbot) on Apr 07, 2003 at 14:10 UTC

    Would you ever want to do SELECTs based on those fields? If so, you shouldn't put them in the BLOB. Say you had field 'foo' inside the blob. How do you write a SELECT statement to get it out? You can use LIKE clause, but that makes the database do a lot more work, and you'll probably get a some stuff back that you don't care about (depending on the exact structure of the data).

    Try to make seperate tables with similar data and JOIN them. That way, you don't waste space on fields you rarely use.

    Remember, the 'R' stands for 'Relational'. Don't treat an RDBMS like a flat-file with a weird API.

    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    Note: All code is untested, unless otherwise stated

      Selecting will probably more oriented toward predetermined records, but at times i may want to retreive by attribute. I'm worried that the occasional lookup by an attribute will not justify two selects, one for the data and one for the attributes. The question - i guess - is which is more of a penalty - using LIKE or selecting again if i need the attributes... I think you both answered my question... =)

      zz zZ Z Z #!perl
        You might also find helpful doing a 2-table query if you really want to avoid doing them seperately. Here's a truncated example with postgres, with appropriate table definitions. Hope this helps
        A well-designed relational database should be normalized, and most of these examples are not because they mix different types of information in the same table. You also are forcing all attributes to be coerced to strings. Here is another approach (supposing Color and Size are attributes some blobs have and not others). You can add as many attributes as you like by adding tables.
        Blob blob_id INT (primary key) blob_data BLOB Color blob_id INT (foreign key to Blob) color_value COLOR Size blob_id INT (foreign key to Blob) size_value FLOAT
        Searching on attributes (especially single attribute values) should actually be more efficient this way. There's a single type of index and an easy JOIN operation (which you should let the DBMS do for you instead of making two queries).
Re: Well Designed RDBMS
by Improv (Pilgrim) on Apr 07, 2003 at 14:11 UTC
    We can't really make the decision for you. You've identified the issue -- is it more important to be able to do queries on attributes with reasonable time, or is it more important that the attributes be flexible to the extent that you don't need to change tables? Your question about the database server is important though -- if you go with a database where you can't 'ALTER TABLE foo DROP COLUMN barattrib', like older versions of postgres, you're not going to want attributes to be a table. I would recommend against a hybrid system -- if you go that way, you need to worry about the data stores getting out of sync. Finally, don't worry about the performance too much -- BLOBs and other large formats are likely to be a bigger performance hit than normal table access, which is much more of an optimized, well-studied area of database implementation. In sum, go with an attribute table unless you feel you have a good reason to do otherwise.
Re: Well Designed RDBMS
by dga (Hermit) on Apr 07, 2003 at 14:56 UTC

    If you are only going to query the attributes rarely and want flexibility, personally I woud probably use a seperate table.

    The main table would have something to uniquely identify the blob and the other table would be like:

    create table attribute_table ( blob_uni int, attribute text, value tex +t ); create unique index a_t_uni on attribute_table ( blob_uni, attribute ) +;

    If I were going to have a lot of attributes or they were going to have long names, I would make a 3rd table with attribute_id and attribute_name then use the attribute_id in the attribute_table. Also a varchar field may be handy instead of text on the attribute name field if a combined table is used.

    When a lookup by attribute is wanted would be the only time this table is referenced and also it will be a fast lookup and much cheaper than a LIKE on a sizeable piece of text. The unique index will speed this up and prevent duplicate attributes of the same kind on a single blob.

    select * from blob_table where id IN ( select blob_id from attribute_t +able where attribute='color' and value='red );

    This design doesn't require the attributes to be known at design time and they can be added or removed at any time and attributes which do not apply to a given blob use no space at all.

Re: Well Designed RDBMS
by zakzebrowski (Curate) on Apr 07, 2003 at 14:59 UTC
    Like every other db decision... Maybee. The one main reason why you want to do this is for when you do not know the attributes in advance. (If you do not know the API, then a big placeholder for variables is a good and reasonable approach.) However, I do not recommend this approach when possible since it will signfigantly hinder the ability to access the stored variables later, since parsing is required. (Which, when you are dealing with n>(some number here) greatly place a cpu burden on your machine.)) cheers.

    Pluralitas non est ponenda sine neccesitate - mysql's philosphy
Re: Well Designed RDBMS
by htoug (Deacon) on Apr 08, 2003 at 10:54 UTC
    One of the things that annoys me about many common RDBMSs is that they do not support the concept of domain.
    Dom... what?
    Domain the set of possible values for a given data element (ie column) - remember that RDBMSs are all about set theory - relation are interscetionsbetween sets.
    In your case the domain for the attribute values are all given as "whatever can be represented in 10 characters", which probably is not the case. Your attributes will have different domains (sets of possible values) - even though you would encode them all in a 10 character wide text column.
    Handling domains with care, ie. taking care not to store different domains in the same column, not joining columns that have diffent domains and so on; ie. in general not comparing apples and oranges and not storing cats and apples in the same keg; simplifies the care and maintainance of your database - just like in all programming.
    In your case I would look carefully at the data, consider what changes are likely, possible, unlikely, improbable and impossible - and try to make a clean database design based on those factors. Trying to optimize access to the data beforehand nearly always causes more trouble later - you know "measure first, tune later".
    Try to get hold of a good textbook on vendor neutral database theory (ie not a manual for your RDBMS) and read on normalisation, optimisation and so on. Database design is a fine art, that can only be learned by correcting your own mistakes ;-)
      From a technical point of view, that's a Good Point&tm;.

      MySQL supports these, or if i understood right, part of what these can be in a pascal type record style. But you need to alter the table each time you need to add a new attribute type. bummer. You have a 16 bit limit on the number of constantly set values, and that's about it.

      From a conceptual point of view, it's a marvellous point, and a definate ++.

      I will keep in mind what you said, and finally get me a second book about programming. or something. damn, they cost a fortune. bah.

      Thanks for your time... =)

      zz zZ Z Z #!perl
      I don't see any difference in the way you talk about the domains and the general programming concept of type. You can have more fine grained types - it's just a kind of tradeoff.
Re: Well Designed RDBMS
by crouchingpenguin (Priest) on Apr 08, 2003 at 11:41 UTC

    Consider seperate tables that are joined into an attribute view as needed. This satisfies the 'remove and add them in a simple manner' as well as having your data normalized in a searchable format.

    I like blobs for throw away data like session info.

    "Never be afraid to try something new. Remember, amateurs built the ark. Professionals built the Titanic."

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://248599]
Front-paged by diotalevi
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (2)
As of 2021-07-25 10:50 GMT
Find Nodes?
    Voting Booth?

    No recent polls found