<?xml version="1.0" encoding="windows-1252"?>
<node id="616725" title="Fishing for constraint =&gt; SQL designs" created="2007-05-22 06:44:06" updated="2007-05-22 02:44:06">
<type id="115">
perlquestion</type>
<author id="29600">
bsb</author>
<data>
<field name="doctext">
&lt;p&gt;
I'll outline the situation, what I'd like to achieve, how I think
it could be attacked.  I'm writing in the hope that this will help
me find code out there that I can use, or at least to learn from people
who have experience with related problems.
&lt;/p&gt;

&lt;h3&gt;Existing Framework:&lt;/h3&gt;

&lt;p&gt;
We provide a validation tool for a few simple fixed width data formats,
it checks basic structure, domains and inter-record relationships.
There's also a HTTP::Server::Simple utility to view the validation
results, from a high-level or line by line.  These are bundled
together with PAR and deployed on Windows (although I develop on Linux).
The database used is SQLite with Class::DBI::Loader generated classes
and CDBI::Sweet for joining (yes, these were cutting edge at the time
and no I don't want to change unless it's going to buy me a lot of time).
&lt;/p&gt;&lt;p&gt;

The formats are defined in custom YAML data dictionaries that specify
the names, labels, types, lengths, domains, record layouts and
relationships.  The rest of the utilities configure themselves from
this metadata.
&lt;/p&gt;&lt;p&gt;

&lt;/p&gt;&lt;p&gt;
&lt;a href="http://bereft.net/MHE.png"&gt;See 
for an example ERD-thingamibob&lt;/a&gt;
&lt;/p&gt;

&lt;h3&gt;Extension Goals:&lt;/h3&gt;

&lt;p&gt;
Another stage of checking is to be added, which will be run after
the existing checks.  It will check logical consistency within and
between records (not correcting just marking errors for display),
checking against pre-supplied historical information, checking
aggregates, counts and calculations against thresholds that are
either predetermined or are themselves based on the file's data.
&lt;/p&gt;&lt;p&gt;

These rules will have be of varying levels of complexity and will
probably have reflect the ad-hoc knowledge of the current distributed
grey matter implementations.  I envision the automatic validation
nibbling its way up the complexity chain as more rules can be
coded within it.
&lt;/p&gt;&lt;p&gt;

Some example rules:
&lt;/p&gt;&lt;p&gt;

  Iff there Full-time Equivalent data has been provided for Xs
  there should also be Salary information for Xs
&lt;/p&gt;&lt;p&gt;

  The average Salary (derived from other data) should be within
  a certain range.
&lt;/p&gt;&lt;p&gt;

There's plenty more.
&lt;/p&gt;&lt;p&gt;


&lt;h3&gt;Ideal:&lt;/h3&gt;

&lt;p&gt;
I'd like a set of declarative rules, that have a machine understandable
format which can be translated to SQL to checks that may find and log
invalid records, but may also simply log the violation (eg. if the
aggregate information is unusual).  The same declarative format
(maybe something like an AST) could be used to document the rules
as text, images or useful error message (templated, filling in line
numbers, field names, values and information links).
&lt;/p&gt; &lt;p&gt;

The rules should be able to calculate and name derived values that can
be shared (and hopefully not recalculated, note that the processing
stages are one-way and one at a time).
&lt;/p&gt; &lt;p&gt;

Better yet, the mini-language declaring the constraints would be
intelligible by the current domain experts (non-programmers) and
writable by them.
&lt;/p&gt; &lt;p&gt;

In my dreams...
&lt;/p&gt; &lt;p&gt;


&lt;h3&gt;Semi-Realistic Goal:&lt;/h3&gt;

&lt;p&gt;
A framework that with rules that comprise of a name, documentation,
a priority, maybe an error template and the constraint.  The rules
will probably be written in the highest level of abstraction that
can express them and that exists.  Until the high levels are written
I'll be condemned to SQL, perhaps some rules will remain as SQL
snippets due to there complexity, or to avoid performance issues.
&lt;/p&gt; &lt;p&gt;

The abstraction hierarchy may look like:&lt;br&gt;
&lt;pre&gt;
 SQL =&gt; SQL snippets =&gt; AST =&gt; Domain Specific Language
&lt;/pre&gt;

&lt;p&gt;
For the moment, this is all wishful thinking.
&lt;/p&gt; &lt;p&gt;

Over the next month I'll work on the lowest levels but would like
to have a sketch of the higher-levels to guide me.
&lt;/p&gt;


&lt;h3&gt;Ideas:&lt;/h3&gt;

&lt;p&gt;
I'm looking at Tangram::Expr, QDRDBMS, SQL::Routine and Rosetta for hints
There seems to be some alignment between what I'm after and the SQL AST field,
but the alignment is probably not perfect.
&lt;/p&gt; &lt;p&gt;
I see a SQL AST as a good target for my to-be-invented-or-stolen DSL.
The SQL AST would effectively define formulas for derived values, criteria
for flagging violations, and constraints which need to be checked (different
from db constraints).
&lt;/p&gt; &lt;p&gt;

The rules -&gt; ast -&gt; sql conversion could potentially be done at build time,
therefore even Pugs and other languages can be considered.
&lt;/p&gt; &lt;p&gt;

Other ideas still floating around are Prolog, SQL::Abstract, S-expressions,
DSL-&gt;SQL done using Perl or non-Perl tool chains, an Object::Declare-like
Perl 5 DSL hack, Lua as a SQLite procedural language, Meta-Lua + Lua as
the DSL, human-kind lives in peace and harmony forever, etc, etc, etc.
&lt;/p&gt; &lt;p&gt;


My current plans is write a few examples of varying complexity in SQL,
which I'll probably have to do any way, and see what can be refactored.
I'll post that when done to solicit further guidance.
&lt;/p&gt; &lt;p&gt;


&lt;h3&gt;Other Considerations:&lt;/h3&gt;

&lt;p&gt;
The Windows deployment with PAR can be tricky to wrangle so I'm wary of big
frameworks, platform dependency and C/XS, especially since I use
pseudo-forking.  I still have flashbacks...
&lt;/p&gt;

&lt;p&gt;
Also, I should make clear that the database isn't being updated or
more data inserted, so caching of intermediates is ok.
&lt;/p&gt;

&lt;p&gt;
I hope that
someone will be able to point me to useful code or warn me when
there be dragons.  Advice welcome, 
&lt;/p&gt;

Brad</field>
</data>
</node>
