Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer

database design question

by nop (Hermit)
on Feb 07, 2003 at 10:54 UTC ( #233430=perlquestion: print w/replies, xml ) Need Help??
nop has asked for the wisdom of the Perl Monks concerning the following question:

Hi, gang.
I'm working on a database app that generates web content for third-party sites. The same general content is often pushed to multiple (between 3 and 10) third-party sites. These 3rd party partner sites accept content in various ways: some through flat text feeds, some through XML, etc. For the same general submission, each partner requires (slightly) different data in a slightly different format (some take two fields, some three or four; some take wider lines, some shorter; etc). On my side, I store the generic submission concept in our database. What I also want to store is, for each specific submission, the actual submission sent to each partner. There's no hope I could get the partners to agree on a standard submission format; I'm stuck with catering to each of their particularities.
Here are the options I've considered:
  • Have a table 'submitted', and try to cram the various partners in one table using a variety of dedicated and semi-dedicated columns. This seems bad as the format varies by partner, so the modules using this table will have to hardcode logic like 'For partner1, don't use col7', etc. Yuck.
  • Have a table for each partner. While this allows me to keep the right structure for each partner, it makes the joins a mess. Yuck.
  • Store all the submissions in one table in using chunks of XML, letting the XML handle the partner-specific structure. Possible.
  • Store all the submissions in one table in using chunks of perl code, letting the perl code generate and describe the partner-specific structure. Yuck. Storing perl in a database seems wrong.

Any suggestions from the gang most highly appreciated.
And if XML is the way to go, any comments on smart vs. dumb ways to encode via XML appreciated.



Replies are listed 'Best First'.
Re: database design question
by steves (Curate) on Feb 07, 2003 at 12:29 UTC

    I manage this exact scenario for 20-30 partners. In my case the source data is essentially all the same -- they want our content sliced and diced in a way they've defined. They define what they want (which tables/columns of ours to query), what to call it (their tags), and what format they want it in (XML, delimited, CSV, fixed width, etc.). I built a tool set that does data transforms using an input/filter/output model. The inputs and outputs are primarily wrappers around existing CPAN modules: e.g., database tables using DBI; Excel using Spreadsheet::ParseExcel and Spreadsheet::WriteExcel; XML using a templating system we wrote around several existing Perl XML systems; etc. -- you get the idea.

    The core code for each partner becomes a rather small data transform that "plugs in" to this using its own specification. Contrary to what some higher ups would like it to look like, it really is not as simple as defining a SQL view per partner. Each partner has their own filtering requirements that would be difficult to accomplish using just the database. It would be possible by writing PL/SQL packages (this is Oracle for the most part) but so much of what they want is much much simpler in Perl. That filtering becomes part of the transform. It's written in such a way that it's concise and readable and very easy to see what each partner gets. This will make no sense out of the context of the system, but here's an example of a fairly simple transform:
    my @transform = ( DBQuery => { DB_NAME => $self->{DB_NAME}, SQL => $sql, }, => RowAdd => { VALUES => $fixed_cols } => SiteURL => { ID_COL => 'ENTRY', URL_COL => 'URL', SOURCE_COL => 'X_ID', } => XML => { XML_TEMPLATE => $html_template, STRICT_XML => 0, NEWLINES => 0, DATA_MODE => 1, DATA_INDENT => 3, FILE_NAME => $output->{PATH}, } );
    Basically we abstract it out enough so we can use Perl itself as a fairly reasonable 4GL-ish like thing -- specifications are pretty much free of all those scary Perl things like regular expressions. 8-)

    Once you've canned one of these up there's a common tool (both a common Perl package hierarchy and a common command line tool using those packages) that's used to "run" the tranforms. The actual data then ends up in its desired format in the directory structure. Another piece of the tool set (integrated into the package and command line tools) takes care of the desired delivery (ftp, http, ssh, sftp, etc.) again doing the bulk of the work with available Perl CPAN packages that we fit into a well-defined set of interfaces.

    What I find coolest about this is that, unlike database oriented transform tools, I can transform virtually anything if I can fit it into the model. With CPAN, I can almost always find code that does the bulk of what I want. My job then becomes fitting it into the model. As you might expect, the model usually expands and becomes more robust as I accomodate more and more things. I've been building it for over 3 years. 8-) But back to the "doing anything": I've plugged in transforms that take data in then use LWP to poke that data into web site forms for example. Try doing that with traditional database transform tools.

    Now all that being said, it's pretty significatnt to attempt to build in a short time. This was an evolutionary process. I consider what I've described here as the third generation of that evolution. Generation #1 was just quick and dirty scripting -- get it out. That's where I cut my teeth on things like DBI. So even though I threw that all away it was a very valuable tool for learning and helping me understand the problem domain. Generation #2 is still being used in transition. That was a much earlier model of what's here and much much simpler. Basically it was set up to address needs most or all of the early parnters had. What they wanted didn't differ much at all in a lot of areas. What I did there that led to what I have now is this:
    • I defined a base Perl class with methods for the basic things we did for each partner: create exports, send exports, get information about partners and their exports.
    • I framed that out so that the base class did 80-90% of what each partner needed, calling out to the methods it didn't have for the specifics; e.g., "give me the SQL", "filter a row", etc.
    • From the base class, I derived 3-4 main sub-classes for each type of output we needed at that time: Delimited, CSV, XML, etc.
    • For each partner, I then created a package that chose the output type base class to use as its parent. That gave it canned output of a given type and the bulk of the processing. That partner package then filled in the details.
    That second generation wasn't as flexible. But I ended up building the third generation tool set because the second one made it so easy to add a new partner. I'd add a partner in an hour or two and people started noticing. They'd say, "Well if you do that so fast, how about plugging this data export/import job in for us." Those other requests weren't nearly as consistent as the original ones. So I started a new model on the side that became what I have now.

    Long winded answer (and I haven't even had coffee yet) but maybe there's something you can take out of it and use for your case.

Re: database design question
by Cabrion (Friar) on Feb 07, 2003 at 11:18 UTC
    From a pure RDBMS standpoint you are asking for views and a table for some "meta data" on your clients. Views are just a stored SQL statement such as "select x.a1, y.a2 from x, y where =" that you can name something like "cust1_feed4". As column needs change, just change the view for the client.

    Assuming you have multiple feeds per client, the meta data on each client would be stored in a "client" table that has a structure like this.

    clientID, feedID, viewName,FormatFlag, ...  
    You should hire a DBA or really dig into some basic relational principles. Too often, developers with little or no experience in relational data base management architect themselves into a corner by not understanding key concepts like rationalization.

    A good DBA would have a structure outlined for you in half a day or less complete with optimized indexes.

Re: database design question
by abell (Chaplain) on Feb 07, 2003 at 12:03 UTC

    Unless you neeed to perform queries on the different (partner-specific) parts of the submissions, there is no need to use different columns. I would imagine a single table containing fields submission_ID, partner_ID and submission_data, plus possibly other fields with a global (not partner-related) meaning. The data would be a blob in a structured format. You could optimize for parsing speed by choosing your own format or go with XML.

    As to the modules, I would associate to each partner one or more modules, perhaps classes inheriting from some common superclasses. When retrieving and managing the submission, the partner-specific module would take care of converting the blob to an in-memory representation. Something analogous would happen when creating, serializing and storing the submission. The modules would be retrieved on-the-fly (with some caching mechanism in place) by either perl's standard file-based "require" system or by reading them from a DB.

    As to the choice of putting perl code in a DB, it is not wrong and in many cases it's a very advisable choice. Consider that the everything engine powering PM does it. It lets you modularize your application and make it extensible in a clean way and makes online updates doable without the risks involved in messing with the application filesystem.


    The stupider the astronaut, the easier it is to win the trip to Vega - A. Tucket
      Except for some of the recent posts about that very subject that point out some of the issues with that style. Putting code in DB and evaling it can make upkeep and dramatic core changes tough to say the least. It also can make passing off the code to another group of developers a harder task.

Re: database design question
by pg (Canon) on Feb 07, 2003 at 15:54 UTC
    I see similar situations quite often. For example I have a database for our supply chain system. We need to store the cost structure of each item, but each item would have a different cost structure, some might have this cost type, but not that, and some would have that but not this.

    In this case, it would be a bad idea to store the cost structure in one table, having the items as rows, and each cost component as one column. That's bad, and it is bad because it breaks the rules of normalization.

    What I did is to have three tables, one store all the validate items, and one store all the possible cost components. Now I use a third one to store what cost components are involved for each item. This third table only needs three columns (in my case):
    1. One column for item numbers, with reference to the valid item table;
    2. One column for cost components, with reference to the valid cost components table; (each item would have multiple rows to store its cost structure, but item + cost components is a unique key.)
    3. One column for the dollar value of this particular cost component for this particular item.
    You have exactly the same pattern of requirement. Your third-parties are my items, and your web page components are my cost components. With this design your tables are fully normalized. This would be the best design on the database side.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://233430]
Approved by dorko
Front-paged by diotalevi
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (2)
As of 2018-02-21 06:14 GMT
Find Nodes?
    Voting Booth?
    When it is dark outside I am happiest to see ...

    Results (275 votes). Check out past polls.