Most distinguished Monks,
A recent node (thanks, jeffa) offered a very interesting theoretical (and practical) view into database normalization.
While I couldn't agree more on the benefits of normalization, I was wondering if there was a different way of tackling the problem from the practical side.
Of course, normalization is never going to be "easy." But it could be easier than building a specialized solution for each table.
Due to previous experience with these tasks (which I used to solve in C - don't shoot!) I felt that a more general solution is possible, and fiddling around with Perl I came up with a module that can reduce our normalization efforts to something like the following:
The more adventurous could also try a one-liner (Normalization Golf?):
What happens with this script?
Having this initial data (database: music, table: MP3),
Here are the instructions produced by the above lines of perl:
Don't rush to the CPAN ;-). The Normalizer module is not there (yet) also because I don't know if I should ask for a standalone namespace or under DBIx:: (any piece of advice here will be more than welcome). It is not a short piece of code either. It is 968 lines (65% of which are documentation) and I didn't feel like posting all of it in this node.
But you are welcome to have a look at it from the following addresses:
Normalizer.pm (highlighted syntax - 128 KB)
Normalizer.pm (plain script - 38 KB)
Normalizer.pod (documentation - 30 KB)
Normalizer-0.05.tgz (complete package - 33 KB)
Now in the CPAN as DBSchema-Normalizer
The black magic behind this code is more SQL than Perl. However, Perl makes it easier to collect the necessary pieces of information from the database engine and create the SQL statements. More important, Perl makes a generalized solution feasible.
A complete explanation of the algorithm is in the module documentation. The basic concept is to let the database engine work the heavy load, while Perl is directing the operations without wasting any valuable resources.
This module deals only with MySQL databases, but the principle should be valid for any RDBMS. If you want to try it risk-free, it is possible to run the script in "simulation mode," producing the SQL without executing it.
I hope this is going to be helpful, and I will be glad to receive your comments.
_ _ _ _ (_|| | |(_|>< _|