database design question

nop has asked for the wisdom of the Perl Monks concerning the following question:

Re: database design question by steves (Curate) on Feb 07, 2003 at 12:29 UTC
I manage this exact scenario for 20-30 partners. In my case the source data is essentially all the same -- they want our content sliced and diced in a way they've defined. They define what they want (which of our tables/columns to query), what to call it (their tags), and what format they want it in (XML, delimited, CSV, fixed width, etc.). I built a tool set that does data transforms using an input/filter/output model. The inputs and outputs are primarily wrappers around existing CPAN modules: e.g., database tables using DBI; Excel using Spreadsheet::ParseExcel and Spreadsheet::WriteExcel; XML using a templating system we wrote around several existing Perl XML systems; etc. -- you get the idea. The core code for each partner becomes a fairly small data transform that "plugs in" to this framework through its own specification. Contrary to what some higher-ups would like, it really is not as simple as defining a SQL view per partner. Each partner has their own filtering requirements that would be difficult to implement in the database alone. It would be possible with PL/SQL packages (this is Oracle for the most part), but much of what they want is far simpler in Perl. That filtering becomes part of the transform, written so that it's concise, readable, and easy to see exactly what each partner gets.
This will make no sense out of the context of the system, but here's an example of a fairly simple transform:

```perl
my @transform = (
    DBQuery => {
        DB_NAME => $self->{DB_NAME},
        SQL     => $sql,
    }
    => RowAdd => { VALUES => $fixed_cols }
    => SiteURL => {
        ID_COL     => 'ENTRY',
        URL_COL    => 'URL',
        SOURCE_COL => 'X_ID',
    }
    => XML => {
        XML_TEMPLATE => $html_template,
        STRICT_XML   => 0,
        NEWLINES     => 0,
        DATA_MODE    => 1,
        DATA_INDENT  => 3,
        FILE_NAME    => $output->{PATH},
    }
);
```

Basically we abstract it out enough that we can use Perl itself as a reasonable 4GL-like language -- specifications are pretty much free of all those scary Perl things like regular expressions. 8-)

Once you've canned one of these up, there's a common tool (both a common Perl package hierarchy and a common command-line tool built on those packages) that's used to "run" the transforms. The actual data then ends up in its desired format in the directory structure. Another piece of the tool set (integrated into the packages and the command-line tool) takes care of delivery (ftp, http, ssh, sftp, etc.), again doing the bulk of the work with available CPAN packages that we fit into a well-defined set of interfaces.

What I find coolest about this is that, unlike database-oriented transform tools, I can transform virtually anything if I can fit it into the model. With CPAN, I can almost always find code that does the bulk of what I want; my job then becomes fitting it into the model. As you might expect, the model usually expands and becomes more robust as I accommodate more and more things. I've been building it for over 3 years. 8-) But back to "doing anything": I've plugged in transforms that take data in and then use LWP to poke that data into web site forms, for example. Try doing that with traditional database transform tools.

All that being said, this is a significant undertaking to attempt to build in a short time. It was an evolutionary process.
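To make the "plug in a specification" idea concrete, here is a minimal sketch of how a runner might walk a flat `name => options` transform list, assuming each step name maps to a handler that takes the step's options plus the rows produced so far and returns new rows. The step names `Rows`, `Filter` and `CSV`, the `%handlers` table and `run_transform` are all hypothetical stand-ins (only `RowAdd` mimics a step from the example), not steves' actual packages; `Rows` replaces `DBQuery` so the sketch runs without a database.

```perl
use strict;
use warnings;

# Hypothetical step handlers: each takes (\%opts, \@rows) and returns the new rows.
my %handlers = (
    # Input step standing in for DBQuery: returns copies of canned rows.
    Rows => sub {
        my ($opts) = @_;
        return [ map { +{%$_} } @{ $opts->{DATA} } ];
    },
    # Add fixed column values to every row, like RowAdd => { VALUES => ... }.
    RowAdd => sub {
        my ($opts, $rows) = @_;
        return [ map { +{ %$_, %{ $opts->{VALUES} } } } @$rows ];
    },
    # Partner-specific filtering as a plain Perl predicate on each row.
    Filter => sub {
        my ($opts, $rows) = @_;
        return [ grep { $opts->{KEEP}->($_) } @$rows ];
    },
    # Output step: comma-joined lines (no CSV quoting) written to a scalar ref.
    CSV => sub {
        my ($opts, $rows) = @_;
        my @cols = @{ $opts->{COLUMNS} };
        ${ $opts->{OUT} } = join '', map { join(',', @{$_}{@cols}) . "\n" } @$rows;
        return $rows;
    },
);

# Walk the flat name => options list two elements at a time.
sub run_transform {
    my @spec = @_;
    die "odd-length transform spec\n" if @spec % 2;
    my $rows = [];
    while (my ($name, $opts) = splice @spec, 0, 2) {
        my $handler = $handlers{$name} or die "unknown step '$name'\n";
        $rows = $handler->($opts, $rows);
    }
    return $rows;
}

# Usage: input, add a fixed column, filter, then write output.
my $csv;
run_transform(
    Rows   => { DATA => [ { ID => 1, URL => 'a' }, { ID => 2, URL => '' } ] },
    RowAdd => { VALUES => { SOURCE => 'X' } },
    Filter => { KEEP => sub { length $_[0]{URL} } },
    CSV    => { COLUMNS => [qw(ID URL SOURCE)], OUT => \$csv },
);
print $csv;    # prints "1,a,X"
```

Keeping the specification as plain data means a new partner is mostly a new list; new capabilities (delivery, LWP form posting, and so on) become new handlers rather than changes to the runner.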
I consider what I've described here the third generation of that evolution. Generation #1 was just quick-and-dirty scripting -- get it out. That's where I cut my teeth on things like DBI. So even though I threw it all away, it was very valuable for learning and for understanding the problem domain.

Generation #2 is still being used during the transition. It was a much earlier and much simpler model of what's here, set up to address the needs most or all of the early partners had; what they wanted didn't differ much in a lot of areas. What I did there that led to what I have now is this: I defined a base Perl class with methods for the basic things we did for each partner: create exports, send exports, and get information about partners and their exports. I framed it so that the base class did 80-90% of what each partner needed, calling out to methods it didn't define for the specifics; e.g., "give me the SQL", "filter a row", etc. From the base class, I derived 3-4 main subclasses, one for each type of output we needed at that time: Delimited, CSV, XML, etc. For each partner, I then created a package that chose one of those output-type classes as its parent. That gave it canned output of a given type and the bulk of the processing; the partner package then filled in the details.

That second generation wasn't as flexible. But I ended up building the third-generation tool set because the second one made it so easy to add a new partner. I'd add a partner in an hour or two, and people started noticing. They'd say, "Well, if you can do that so fast, how about plugging this data export/import job in for us?" Those other requests weren't nearly as consistent as the original ones, so I started a new model on the side that became what I have now.

Long-winded answer (and I haven't even had coffee yet), but maybe there's something you can take out of it and use for your case.
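The generation #2 design amounts to a template-method class hierarchy: the base class owns the flow, output-type subclasses own formatting, and partner packages fill in the specifics. A rough sketch follows; the package names (`Export::Base`, `Export::Delimited`, `Partner::Acme`) and hook names (`sql`, `filter_row`, `format_row`, `columns`) are hypothetical. The `sql` hook is declared but not executed, since in a real system the rows would come from DBI.

```perl
use strict;
use warnings;

# Hypothetical base class: owns the common export flow and calls out to
# hooks that subclasses (output types and partners) fill in.
package Export::Base;
sub new        { my ($class, %args) = @_; return bless {%args}, $class }
sub filter_row { return 1 }    # default hook: keep every row
sub export {
    my ($self, @rows) = @_;    # a real version would fetch rows using $self->sql via DBI
    my @kept = grep { $self->filter_row($_) } @rows;
    return join '', map { $self->format_row($_) . "\n" } @kept;
}

# One subclass per output type; this one joins columns with a delimiter.
package Export::Delimited;
our @ISA = ('Export::Base');
sub delimiter  { return '|' }
sub format_row {
    my ($self, $row) = @_;
    return join $self->delimiter, map { $row->{$_} } $self->columns;
}

# A partner package picks its output-type parent and supplies the details.
package Partner::Acme;
our @ISA = ('Export::Delimited');
sub columns    { return qw(id title) }
sub sql        { return 'SELECT id, title FROM entries' }
sub filter_row { my ($self, $row) = @_; return $row->{title} ne '' }

package main;
my $out = Partner::Acme->new->export(
    { id => 1, title => 'Perl' },
    { id => 2, title => '' },
);
print $out;    # prints "1|Perl"
```

Adding a partner here means writing one small package like `Partner::Acme`, which is why this style makes new partners cheap while still limiting you to the output types the hierarchy already knows.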
Re: database design question by Cabrion (Friar) on Feb 07, 2003 at 11:18 UTC
From a pure RDBMS standpoint, you are asking for views plus a table of "meta data" about your clients. A view is just a stored SQL statement, such as "select x.a1, y.a2 from x, y where x.id = y.id", that you can name something like "cust1_feed4". As a client's column needs change, just change that client's view. Assuming you have multiple feeds per client, the meta data on each client would be stored in a "client" table with a structure like this: clientID, feedID, viewName, FormatFlag, ...

You should hire a DBA or really dig into some basic relational principles. Too often, developers with little or no experience in relational database management architect themselves into a corner by not understanding key concepts like normalization. A good DBA would have a structure outlined for you in half a day or less, complete with optimized indexes.
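A small sketch of the views-plus-metadata idea, assuming each row of the client table is represented as a Perl hash and that each feed's view is generated from a per-feed column list. The field names, the `columns` list, and the helpers `view_ddl` and `feed_for` are illustrative assumptions; exact `CREATE VIEW` syntax (e.g. `CREATE OR REPLACE VIEW`) varies by database.

```perl
use strict;
use warnings;

# Hypothetical client/feed metadata mirroring: clientID, feedID, viewName, FormatFlag.
# The columns list is an added assumption used to generate each view.
my @feeds = (
    { client_id => 1, feed_id => 4, view_name => 'cust1_feed4', format => 'CSV',
      columns => [ 'x.a1', 'y.a2' ] },
    { client_id => 2, feed_id => 1, view_name => 'cust2_feed1', format => 'XML',
      columns => [ 'x.a1' ] },
);

# Build the DDL for one feed's view; when a client's column needs change,
# only that feed's column list (and hence its view) changes.
sub view_ddl {
    my ($feed) = @_;
    my $cols = join ', ', @{ $feed->{columns} };
    return "CREATE VIEW $feed->{view_name} AS "
         . "SELECT $cols FROM x, y WHERE x.id = y.id";
}

# Look up which view to query and which output format to use for a feed.
sub feed_for {
    my ($client_id, $feed_id) = @_;
    my ($feed) = grep { $_->{client_id} == $client_id && $_->{feed_id} == $feed_id } @feeds;
    return $feed;
}

print view_ddl($_), ";\n" for @feeds;
```

The export code then only needs `SELECT * FROM` the feed's `view_name` plus its `format`, so per-client differences live in the database rather than in code.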
Re: database design question by abell (Chaplain) on Feb 07, 2003 at 12:03 UTC
Unless you need to perform queries on the different (partner-specific) parts of the submissions, there is no need to use different columns. I would imagine a single table containing the fields submission_ID, partner_ID and submission_data, plus possibly other fields with a global (not partner-related) meaning. The data would be a blob in a structured format: you could optimize for parsing speed by choosing your own format, or go with XML.

As to the modules, I would associate each partner with one or more modules, perhaps classes inheriting from some common superclasses. When retrieving and managing a submission, the partner-specific module would take care of converting the blob to an in-memory representation. Something analogous would happen when creating, serializing and storing the submission. The modules would be loaded on the fly (with some caching mechanism in place), either through Perl's standard file-based "require" system or by reading them from a DB.

As to the choice of putting Perl code in a DB, it is not wrong, and in many cases it's a very advisable choice. Consider that the Everything engine powering PM does it. It lets you modularize your application and make it extensible in a clean way, and it makes online updates doable without the risks involved in messing with the application filesystem.

Antonio

The stupider the astronaut, the easier it is to win the trip to Vega - A. Tucket
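A rough sketch of the single-table-plus-blob approach, assuming JSON (via the core JSON::PP module) as the structured blob format instead of XML or a custom format. The classes `Submission` and `Submission::Acme`, the `%partner_class` map and `class_for` are hypothetical. The loader only `require`s a module file when the class is not already in memory, and caches the resolved class per partner.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical common superclass: partner-specific fields live in one blob.
package Submission;
my $json = JSON::PP->new->canonical;    # canonical: stable key order in the blob
sub new       { my ($class, %f) = @_; return bless {%f}, $class }
sub to_blob   { my ($self) = @_; return $json->encode({ %$self }) }
sub from_blob { my ($class, $blob) = @_; return $class->new(%{ $json->decode($blob) }) }

# Hypothetical partner class; it could add validation or derived fields.
package Submission::Acme;
our @ISA = ('Submission');
sub headline { my ($self) = @_; return uc $self->{title} }

package main;

# Map partner_ID to its class, resolving (and loading) each class once.
my %partner_class = (7 => 'Submission::Acme');
my %loaded;
sub class_for {
    my ($partner_id) = @_;
    return $loaded{$partner_id} //= do {
        my $class = $partner_class{$partner_id}
            or die "no class for partner $partner_id\n";
        # Load the module file only if the class is not already in memory.
        unless ($class->can('from_blob')) {
            (my $file = "$class.pm") =~ s{::}{/}g;
            require $file;
        }
        $class;
    };
}

# One row of the single table: (submission_ID, partner_ID, submission_data).
my $row = { submission_id => 1, partner_id => 7,
            submission_data => Submission::Acme->new(title => 'hello')->to_blob };
my $sub = class_for($row->{partner_id})->from_blob($row->{submission_data});
print $sub->headline, "\n";    # prints "HELLO"
```

Fetching code from a DB instead of `require` would replace the loading branch in `class_for`; the rest of the flow stays the same.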
Re: Re: database design question by waswas-fng (Curate) on Feb 07, 2003 at 16:16 UTC
Except that some of the recent posts on that very subject point out the issues with that style. Putting code in a DB and evaling it can make upkeep and dramatic core changes tough, to say the least. It can also make handing the code off to another group of developers a harder task.

-Waswas
Re: database design question by pg (Canon) on Feb 07, 2003 at 15:54 UTC
I see similar situations quite often. For example, I have a database for our supply chain system. We need to store the cost structure of each item, but each item has a different cost structure: some have this cost type but not that one, and some have that one but not this. In this case, it would be a bad idea to store the cost structures in one table, with the items as rows and each cost component as a column. That's bad because it breaks the rules of normalization.

What I did was use three tables: one stores all the valid items, and one stores all the possible cost components. Then I use a third one to store which cost components are involved for each item. This third table only needs three columns (in my case): one column for item numbers, referencing the valid item table; one column for cost components, referencing the valid cost components table (each item has multiple rows to store its cost structure, but item + cost component is a unique key); and one column for the dollar value of this particular cost component for this particular item.

You have exactly the same pattern of requirements. Your third parties are my items, and your web page components are my cost components. With this design your tables are fully normalized. This would be the best design on the database side.
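The three-table layout can be mimicked in memory to show why it handles sparse cost structures without empty columns. In this sketch, hashes stand in for the tables and explicit checks stand in for the foreign-key and unique constraints a database would enforce; the item and component names and the `add_cost` helper are made up for illustration.

```perl
use strict;
use warnings;

# Table 1: valid items.  Table 2: valid cost components.
my %items      = map { $_ => 1 } qw(WIDGET GADGET);
my %components = map { $_ => 1 } qw(MATERIAL LABOR FREIGHT);

# Table 3: one row per (item, component) holding its dollar value.
# Keying by item then component enforces the unique key item + component.
my %item_cost;

sub add_cost {
    my ($item, $component, $dollars) = @_;
    die "unknown item $item\n"           unless $items{$item};            # FK to items
    die "unknown component $component\n" unless $components{$component};  # FK to components
    die "duplicate $item/$component\n"   if exists $item_cost{$item}{$component};
    $item_cost{$item}{$component} = $dollars;
    return;
}

add_cost(WIDGET => MATERIAL => 2.50);
add_cost(WIDGET => LABOR    => 1.25);
add_cost(GADGET => FREIGHT  => 4.00);    # GADGET simply has no LABOR row

# Total cost per item, summing only the components it actually has.
for my $item (sort keys %item_cost) {
    my $total = 0;
    $total += $_ for values %{ $item_cost{$item} };
    printf "%s %.2f\n", $item, $total;    # prints "GADGET 4.00" then "WIDGET 3.75"
}
```

In the database itself, the third table would typically declare `(item, component)` as its primary or unique key, with foreign keys to the other two tables, so these checks come for free.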

