|Perl Monk, Perl Meditation|
OT: Data Warehousing Strategiesby Booger (Pilgrim)
|on Aug 27, 2006 at 10:25 UTC||Need Help??|
Booger has asked for the
wisdom of the Perl Monks concerning the following question:
The company I work for is considering undergoing a fairly aggressive redesign of one of our product lines and I have come upon a design issue that I would appreciate some input on.
The product is something that we host and provide to multiple clients. Most of the functionality for each client is similar if not identical and up until now our databases have been organized such that each client has its own set of tables to store application data in (the columns in one set of tables are identical to columns in another set of tables - we've just duplicated the tables for each client).
To be honest I'm not a DBA but I've been around a number of large-scale databases in my time and I think this design is broken and it makes administering the database a real pain since you have to use an automated script to modify each and every table (there are a lot of them) when you want to extend or make a change to the way the application works. This design also causes some fairly ugly code because we have to take into account a clients "table" rather than just a client_id in a table shared by all clients.
One of the challenges in our redesign is reconciling all of this data into a unified set of tables (such that we no longer have tables like "client123_contacts" or "client123_products" but just "contacts" and "products"). I think this will go a long way to giving us a more compact and consistent database.
Here's the bit that I'm scratching my head about...
One of the stated goals of the redesign is to create a system whereby clients can "extend" the database to store bits and pieces of data that are specific to their situation (we will provide a specific set of generic extensions). For example, we might provide a table called "produce" that stores various bits of data about the produce you offer at your store. If you are the client you might want to extend our produce offering buy storing off "produce attributes" that we don't provide for in our default offering (such as sale prices specific to different kinds of produce - or whatever). You would do this by assigning the sale price to an available column (in this case a double) in the extensions table.
The trouble I'm having is that each client will want to extend their data set differently. Some will want to use the extension columns we provide for one set of data while another will want to use the extension columns for a different set of data. The relationship between the default "produce" table we provide and the extensions is always a 1-to-1 but the actual data in the extensions table will differ between clients.
There is a business layer above the data layer that allows us to define a how the data is presented within the application, untainted, ordered on the screen and so forth (this same business layer is what allows clients to extend their default data set offering - keep in mind that they do not actually change the schema but can select from a default set of columns to store their client-specific data in).
What I'm not sure about and would like some input on is whether I should create a unified set of tables in one database (i.e., all clients use the same set of tables within each database - this means that each client uses the same set of extension tables but would store different data for different purposes within those extension tables and thereby possibly violating 2NF but we try to design the default tables such that this isn't a usually problem) OR if I should create set of tables in client-specific databases (whereby each client will have their own database, the schemas in each database being a duplicate of the others). The latter option might not get me very far because I wouldn't really be gaining anything in terms of ease-of-administration.
I may not be giving you enough information about our problem domain, please let me know if there is something I'm leaving out.
So, what would you do in this sort of situation? (Besides run in circles, scream and shout).
There is one more thing that may further complicate the issue: some of our clients are related in a hierarchal fashion and share responsibility over their data with collective data filtering "upwards" in a predefined hierarchy (e.g., if you look at a chain of stores you can see an overall list of products and quantity of products vs. those products only available at a single store - it depends on your perspective).
Thanks for your input guys!