perlmeditation
gmax
<p>A few months ago, [dws], against the popular opinion promoting SQL
statements builders, [id://288534|stated] that hardcoded SQL is better and somethimes quicker to obtain.
I heartily agree on his position, since I usually do the same thing,
never trusting a tool to create SQL code for production purposes.
However, there is an exception to this rule, and I rely on a special
tool to produce complex queries for cross-tabulation. Since I am the
author of such a tool, I am less suspicious towards its evil
side effects on the rest of my applications.</p>
<p>Until a few months ago,
the tool we are talking about was just a ugly perl script with a few
dozen variables, which I used to create these exceedingly long queries
that I then pasted to the production code. That was not a satisfactory
solution, though, and when the requirements for truly dynamic queries
became unavoidable, I created [CPAN://DBIx::SQLCrosstab], allowing the
execution of dynamically created crosstab queries.</p>
<p>For those of you not familiar with the concept, let me remind that
crosstab queries are quite peculiar, often involving hundreds of SQL
lines, unlikely to be handled by simple DBI wrappers. A recent <a
href="http://www.onlamp.com/pub/a/onlamp/2003/12/04/crosstabs.html">
article</a> explains in full detail the database theory behind the
creation of crosstab queries and gives some tips on how to use
[http://search.cpan.org/perldoc?DBIx::SQLCrosstab|DBIx::SQLCrosstab]
to its best.</p>
<p>What the article doesn't cover is most of the "under-the-hood" stuff,
the juicy details of how the engine works and how to expand it. Since
OnLamp editors wanted to keep the article focused on
database matters, I stashed away all the Perl relevant issues for
PerlMonks, and here it comes ...</p>
<readmore>
<h2>DBIx::SQLCrosstab interiors</h2>
<p>One of the most intriguing aspects of DBIx::SQLCrosstab is the SQL
builder engine, which, in addition to producing valid SQL code, must be
also able to cope with a long list of requirements.</p>
<h3>Requirements complexity</h3>
<p>To create a realistic multi-level crosstab query, you must:</p>
<ul>
<li>get a list of values for every level of headers; </li>
<li>combine the values of each level to create a condition</li>
<li>create a sensible name for the resulting calculated column</li>
<li>add columns (with appropriate calculations) for column sub-totals</li>
<li>ensure that combinations of values from various levels are not impossible
(e.g.: allow UK/London, Germany/Berlin, and Italy/Rome, but avoid
Italy/London, UK/Berlin, Germany/Rome)</li>
<li>For each level, allow either instructions to get the values from the database or a
pre-compiled list of values</li>
<li>assemble the query lines together, taking into account mandatory clauses
(FROM and GROUP BY) and optional ones (WHERE, HAVING, ORDER BY)</li>
<li>allow handlers for row sub-totals and multi-operator queries.</li>
</ul>
<h3>a simple, brute-force engine</h3>
<p>To appreciate the complexity of the requirements, let's build a simpler
engine that can only deal with a few rules, creating a query from simple
requirements. The following example will create a query from a list of
values and the corresponding field names, and some information on the
operation to perform.</p>
<p>(You may want to look at the
[http://search.cpan.org/src/GMAX/DBIx-SQLCrosstab-1.15/examples/crosstab_sample.html|database schema]
. The source data is available in
[http://www.onlamp.com/pub/a/onlamp/2003/12/04/crosstabs.html|the article])</p>
<code>
#!/usr/bin/perl -w
use strict;
my @values = (
[ qw( pers dev sales ) ], # dept
[ qw( consultant contractor employee ) ], # category
[ qw( m f ) ] # gender
);
my @condition_names = ('dept', 'category', 'gender');
my $operation = 'SUM';
my $field_to_operate = 'salary';
# permute function written by Randal L. Schwartz,
# aka merlyn
# http://www.perlmonks.org/index.pl?node_id=24270
sub permute {
my $last = pop @_;
unless (@_) {
return map [$_], @$last;
}
return map { my $left = $_; map [@$left, $_], @$last } permute(@_);
}
my @permuted = permute(@values);
#now @permuted = (
# ['pers', 'consultant', 'm'],
# ['pers', 'consultant', 'f'],
# ['pers', 'contractor', 'm'],
# ['pers', 'contractor', 'f'],
# ['pers', 'employee', 'm'],
# ['pers', 'employee', 'f'],
# ['dev', 'consultant', 'm'],
# ['dev', 'consultant', 'f'],
# ['dev', 'contractor', 'm'],
# ['dev', 'contractor', 'f'],
# ['dev', 'employee', 'm'],
# ['dev', 'employee', 'f'],
# ['sales', 'consultant', 'm'],
# ['sales', 'consultant', 'f'],
# ['sales', 'contractor', 'm'],
# ['sales', 'contractor', 'f'],
# ['sales', 'employee', 'm'],
# ['sales', 'employee', 'f'],
#);
my $query = qq{SELECT country, loc\n};
for my $line(@permuted) {
#my @flds = @condition_names;
my $ndx=0;
$query .=
","
. $operation
. "(CASE WHEN "
. join(" AND ", map { $condition_names[$ndx++] . " = '" . $_ . "'" } @$line)
. " THEN "
. $field_to_operate
. " ELSE NULL END ) AS "
. "'"
. join ("#", @$line) . "'\n"
}
$query .= ",$operation($field_to_operate) AS total\n"
. qq{FROM
person
INNER JOIN depts ON (person.dept_id=depts.dept_id)
INNER JOIN categories ON (person.cat_id=categories.cat_id)
INNER JOIN locs on(locs.loc_id=person.loc_id)
INNER JOIN countries ON(locs.country_id=countries.country_id)\n}
. "GROUP BY country, loc\n";
print $query;
</code>
<p>This script shows only the SQL builder part. The preparatory part would require
a few additional queries, one for each level of header, to get the values from
the appropriate database tables. Something along the lines of:</p>
<code>
my $depts = $dbh->selectcol_arrayref(qq{SELECT DISTINCT dept FROM depts})
or die "can't get dept values\n";
my $categories = $dbh->selectcol_arrayref(qq{SELECT DISTINCT category FROM categories})
or die "can't get category values\n";
my $genders = $dbh->selectcol_arrayref(qq{SELECT DISTINCT gender FROM person})
or die "can't get gender values\n";
my @values = ( $depts, $categories, $genders );
</code>
<p>The heart of the engine is in those two [join]s, the first of which creates a
composite condition. Each item is a combination "field name + value". Several
items are joined by a "AND" operator. The resulting operation is included inside
a "CASE" function (SQL ANSI 92), which passes to the "SUM" function either the
relevant field or a NULL.</p>
<p>The second [join] creates the column name for the result
set. What is missing in this crude version is a check on the applicability
of the character used here as a field separator. Also, some database engines have
limitations on the column name, which has a maximum length and some restriction
on the characters to use in it. The real module has a default mechanism to use
fake names (fld001, fld002, and so on) and a support hash to keep track of the real
names. This trick makes the query more acceptable to fragile DBMS parsers.</p>
<p>The resulting query follows. Don't be afraid. It isn't the shortest you can get,
but it's far from being the longest one. Crosstab queries with the tiny sample
database that ships with the module (9 records) can be up to 600 lines long.</p>
<code>
$query = qq{
SELECT country, loc
,SUM(CASE WHEN dept = 'pers' AND category = 'consultant' AND gender = 'm'
THEN salary ELSE NULL END ) AS 'pers#consultant#m'
,SUM(CASE WHEN dept = 'pers' AND category = 'consultant' AND gender = 'f'
THEN salary ELSE NULL END ) AS 'pers#consultant#f'
,SUM(CASE WHEN dept = 'pers' AND category = 'contractor' AND gender = 'm'
THEN salary ELSE NULL END ) AS 'pers#contractor#m'
,SUM(CASE WHEN dept = 'pers' AND category = 'contractor' AND gender = 'f'
THEN salary ELSE NULL END ) AS 'pers#contractor#f'
,SUM(CASE WHEN dept = 'pers' AND category = 'employee' AND gender = 'm'
THEN salary ELSE NULL END ) AS 'pers#employee#m'
,SUM(CASE WHEN dept = 'pers' AND category = 'employee' AND gender = 'f'
THEN salary ELSE NULL END ) AS 'pers#employee#f'
,SUM(CASE WHEN dept = 'dev' AND category = 'consultant' AND gender = 'm'
THEN salary ELSE NULL END ) AS 'dev#consultant#m'
,SUM(CASE WHEN dept = 'dev' AND category = 'consultant' AND gender = 'f'
THEN salary ELSE NULL END ) AS 'dev#consultant#f'
,SUM(CASE WHEN dept = 'dev' AND category = 'contractor' AND gender = 'm'
THEN salary ELSE NULL END ) AS 'dev#contractor#m'
,SUM(CASE WHEN dept = 'dev' AND category = 'contractor' AND gender = 'f'
THEN salary ELSE NULL END ) AS 'dev#contractor#f'
,SUM(CASE WHEN dept = 'dev' AND category = 'employee' AND gender = 'm'
THEN salary ELSE NULL END ) AS 'dev#employee#m'
,SUM(CASE WHEN dept = 'dev' AND category = 'employee' AND gender = 'f'
THEN salary ELSE NULL END ) AS 'dev#employee#f'
,SUM(CASE WHEN dept = 'sales' AND category = 'consultant' AND gender = 'm'
THEN salary ELSE NULL END ) AS 'sales#consultant#m'
,SUM(CASE WHEN dept = 'sales' AND category = 'consultant' AND gender = 'f'
THEN salary ELSE NULL END ) AS 'sales#consultant#f'
,SUM(CASE WHEN dept = 'sales' AND category = 'contractor' AND gender = 'm'
THEN salary ELSE NULL END ) AS 'sales#contractor#m'
,SUM(CASE WHEN dept = 'sales' AND category = 'contractor' AND gender = 'f'
THEN salary ELSE NULL END ) AS 'sales#contractor#f'
,SUM(CASE WHEN dept = 'sales' AND category = 'employee' AND gender = 'm'
THEN salary ELSE NULL END ) AS 'sales#employee#m'
,SUM(CASE WHEN dept = 'sales' AND category = 'employee' AND gender = 'f'
THEN salary ELSE NULL END ) AS 'sales#employee#f'
,SUM(salary) AS total
FROM
person
INNER JOIN depts ON (person.dept_id=depts.dept_id)
INNER JOIN categories ON (person.cat_id=categories.cat_id)
INNER JOIN locs on(locs.loc_id=person.loc_id)
INNER JOIN countries ON(locs.country_id=countries.country_id)
GROUP BY country, loc
}
</code>
<h3>Trees and combinations</h3>
<p>The simple example in the previous section can solve many cases. But if
we expand the requirements as to include column sub-totals, that simple
paradigm can't carry out the task successfully. We need to either modify
the engine to take sub totals into account or modify the permutation
algorithm to produce the sub totals combinations in the same data structure
used for the normal processing.</p>
<p>I chose the second solution, and I implemented a second permutation
function that produces a modified <code>@permuted</code> array.</p>
<code>
@permuted = (
['pers', 'consultant', 'm'],
['pers', 'consultant', 'f'],
['pers', 'consultant'],
['pers', 'contractor', 'm'],
['pers', 'contractor', 'f'],
['pers', 'contractor'],
['pers', 'employee', 'm'],
['pers', 'employee', 'f'],
['pers', 'employee'],
['pers'],
['dev', 'consultant', 'm'],
['dev', 'consultant', 'f'],
['dev', 'consultant'],
['dev', 'contractor', 'm'],
['dev', 'contractor', 'f'],
['dev', 'contractor'],
['dev', 'employee', 'm'],
['dev', 'employee', 'f'],
['dev', 'employee'],
['dev'],
['sales', 'consultant', 'm'],
['sales', 'consultant', 'f'],
['sales', 'consultant'],
['sales', 'contractor', 'm'],
['sales', 'contractor', 'f'],
['sales', 'contractor'],
['sales', 'employee', 'm'],
['sales', 'employee', 'f'],
['sales', 'employee'],
['sales'],
);
</code>
<p>There were several choices to create this modified array. Although I got
some [id://287900|clever tips] on how to make this function, in the end I
decided to implement it with a [CPAN://Tree::DAG_Node], mostly because I
would need this module to process the cross tabulation output.</p>
<p>When passed to the same engine, this data structure produces a query with a sub total
for each level of headers.</p>
<h3>A (brief) digression about database algorithms</h3>
<p>A more difficult case is when a level depends on the previous one, such as
"countries / towns" or "University faculties / subjects". If I were processing
such columns with the same algorithm used so far, I would end up with things like
"Mathematics / Greek Philosophy", "Languages / calculus", or "History / operating
systems."</p>
<p>The difficult thing about database programming is that an algorithm can involve
steps in the client interspersed with steps in the server. </p>
<p>If you are used to designing algorithms on the client side, dealing with
this mixed nature can be frustrating, since you could try to solve on the
client side what you should do on the server side. On the other hand, if you are
familiar with database solutions, you could be tempted to do on the server side
simple operations that should be treated in the client.</p>
<p>When you reach the right balance, you can create sensible algorithms. Trouble is,
sometimes you become too much confident in what you've done, especially if you
have just created a good, clever-looking, and efficient paradigm. When this happens,
it is harder for you to find a fault in your work, and when you find it, your
mind is clouded by your previous success, thus making it more difficult to find
a solution for your latest problem.</p>
<p>In my case, when I found out that my algorithm did not cover this particular
aspect, I spent some time fantasizing about very complicated variations of what I
had already done, with intermediate data structures and acrobatic manipulations
of the latter. Then I decided that I was doing something wrong, and I left the
matter rest for a while. Finally, the clear solution came to me, i.e. whenever
there is a case of dependency among levels, then I let the database engine find
the legal permutations. Just querying for the DISTINCT values of all the involved
columns will return all the valid permutations. So I added this variation as an
option and now the module seems to cover every angle of the matter.</p>
<h2>Expanding the engine</h2>
<p>Creating the query was only half of the problem. Returning the data in a
format suitable for human consumption is a strictly related issue, and I designed
the module to facilitate this important part.</p>
<p>The format issue is not addressed directly by DBIx::SQLCrosstab. To keep the
contents separated from their appearance, I created a descendant of the first module,
DBIx::SQLCrosstab::Format, which can do everything its parent can do, and can
also create some good-looking reports, like the one shown here.</p>
<table border="1" cellspacing="0" cellpadding="2">
<tr>
<th rowspan="3"><b>Area</b></th>
<th rowspan="3"><b>country</b></th>
<th rowspan="3"><b>location</b></th>
<th colspan="6"><b>pers</b></th>
<th colspan="7"><b>sales</b></th>
<th colspan="5"><b>dev</b></th>
<th rowspan="3"><b>T</b></th>
</tr>
<tr>
<th colspan="3"><b>empl</b></th>
<th colspan="2"><b>contr</b></th>
<th rowspan="2"><b>T</b></th>
<th colspan="2"><b>empl</b></th>
<th colspan="2"><b>contr</b></th>
<th colspan="2"><b>cons</b></th>
<th rowspan="2"><b>T</b></th>
<th colspan="2"><b>empl</b></th>
<th colspan="2"><b>cons</b></th>
<th rowspan="2"><b>T</b></th>
</tr>
<tr>
<th ><b>f</b></th>
<th ><b>m</b></th>
<th ><b>T</b></th>
<th ><b>m</b></th>
<th ><b>T</b></th>
<th ><b>m</b></th>
<th ><b>T</b></th>
<th ><b>m</b></th>
<th ><b>T</b></th>
<th ><b>f</b></th>
<th ><b>T</b></th>
<th ><b>m</b></th>
<th ><b>T</b></th>
<th ><b>f</b></th>
<th ><b>T</b></th>
</tr>
<tr>
<td rowspan="7">N</td> <td rowspan="4">Germany</td> <td rowspan="1">Berlin</td>
<td align="right">1</td> <td align="right">0</td> <td align="right">1</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">1</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">0</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">1</td>
<td align="right">2</td>
</tr>
<tr>
<td rowspan="1">Bonn</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">1</td> <td align="right">1</td> <td align="right">0</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">1</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">1</td>
</tr>
<tr>
<td rowspan="1">Munich</td>
<td align="right">0</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">1</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">0</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">1</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">2</td>
</tr>
<tr>
<td rowspan="1">total</td>
<td align="right">1</td> <td align="right">1</td> <td align="right">2</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">2</td>
<td align="right">1</td> <td align="right">1</td> <td align="right">0</td>
<td align="right">0</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">2</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">1</td>
<td align="right">5</td>
</tr>
<tr>
<td rowspan="2">UK</td> <td rowspan="1">London</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">1</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">1</td>
<td align="right">1</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">1</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">2</td>
</tr>
<tr>
<td rowspan="1">total</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">1</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">1</td>
<td align="right">1</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">1</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">2</td>
</tr>
<tr>
<td rowspan="1">total</td> <td rowspan="1">total</td>
<td align="right">1</td> <td align="right">1</td> <td align="right">2</td>
<td align="right">1</td> <td align="right">1</td> <td align="right">3</td>
<td align="right">1</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">1</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">3</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">1</td>
<td align="right">7</td>
</tr>
<tr>
<td rowspan="3">S</td> <td rowspan="2">Italy</td> <td rowspan="1">Rome</td>
<td align="right">0</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">1</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">1</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">2</td>
</tr>
<tr>
<td rowspan="1">total</td>
<td align="right">0</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">1</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">1</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">2</td>
</tr>
<tr>
<td rowspan="1">total</td> <td rowspan="1">total</td>
<td align="right">0</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">1</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">0</td> <td align="right">0</td> <td align="right">0</td>
<td align="right">1</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">2</td>
</tr>
<tr>
<td rowspan="1">total</td> <td rowspan="1">total</td> <td rowspan="1">total</td>
<td align="right">1</td> <td align="right">2</td> <td align="right">3</td>
<td align="right">1</td> <td align="right">1</td> <td align="right">4</td>
<td align="right">1</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">1</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">3</td> <td align="right">1</td> <td align="right">1</td>
<td align="right">1</td> <td align="right">1</td> <td align="right">2</td>
<td align="right">9</td>
</tr>
</table>
<p>This table was created by the <i>as_html</i> method, with the help of a few
data structures that I designed to deal with general purpose tree-like reports,
like this peculiar HTML table or a rich XML document.</p>
<h3>Data structures for format depicting</h3>
<p>The above table was created using these two data structures, each one describing
a tree-like header, one at the top of the table and one at its left.</p>
<p>The first one, $header_formats, is a description of the headers at the top. Each
element in this array reference describes one line of headers as an array of hashes.
For example, the first line says that 'Area' will occupy one column, but it will span
down three rows. Armed with this information, it is very easy to build a HTML table
header.</p>
<code>
$header_formats = [
[
{ name => 'Area', colspan => 1, rowspan => 3 },
{ name => 'country', colspan => 1, rowspan => 3 },
{ name => 'location', colspan => 1, rowspan => 3 },
{ name => 'pers', colspan => 6, rowspan => 0 },
{ name => 'sales', colspan => 7, rowspan => 0 },
{ name => 'dev', colspan => 5, rowspan => 0 },
{ name => 'total', colspan => 1, rowspan => 3 }
],
[
{ name => 'employee', colspan => 3, rowspan => 0 },
{ name => 'contractor', colspan => 2, rowspan => 0 },
{ name => 'total', colspan => 1, rowspan => 2 },
{ name => 'employee', colspan => 2, rowspan => 0 },
{ name => 'contractor', colspan => 2, rowspan => 0 },
{ name => 'consultant', colspan => 2, rowspan => 0 },
{ name => 'total', colspan => 1, rowspan => 2 },
{ name => 'employee', colspan => 2, rowspan => 0 },
{ name => 'consultant', colspan => 2, rowspan => 0 },
{ name => 'total', colspan => 1, rowspan => 2 }
],
[
{ name => 'f', colspan => 1, rowspan => 1 },
{ name => 'm', colspan => 1, rowspan => 1 },
{ name => 'total', colspan => 1, rowspan => 1 },
{ name => 'm', colspan => 1, rowspan => 1 },
{ name => 'total', colspan => 1, rowspan => 1 },
{ name => 'm', colspan => 1, rowspan => 1 },
{ name => 'total', colspan => 1, rowspan => 1 },
{ name => 'm', colspan => 1, rowspan => 1 },
{ name => 'total', colspan => 1, rowspan => 1 },
{ name => 'f', colspan => 1, rowspan => 1 },
{ name => 'total', colspan => 1, rowspan => 1 },
{ name => 'm', colspan => 1, rowspan => 1 },
{ name => 'total', colspan => 1, rowspan => 1 },
{ name => 'f', colspan => 1, rowspan => 1 },
{ name => 'total', colspan => 1, rowspan => 1 }
]
];
</code>
<p>The second data structure, $recs_formats, describes the row headers in a different way.
It is a hash, whose values are hashes of arrays. The keys for the hash are the level, i.e. the
column number. The inner hash keys are the same values that will show up in the data.</p>
<p>This structure is supposed to be used in a destructive way: </p>
<p>Reading the records, at column <code>[0]</code> I get the value "N". I check if there is
a "N" key in the hashref under "0". If I find it and the value is defined, then I get the first
available item from the arrayref. Something along the lines of:</p>
<code>
# $level =0; $value = 'N';
print '<td rowspan= "',
shift @{$recs_formats->{$level}->{$value}},
'">', $value,'</td>';
</code>
<p>The real code is a tad more complex than this, but just to give you the idea.</p>
<code>
$recs_formats = {
0 => {
'N' => [ 7 ],
'S' => [ 3 ],
'zzzz' => [ 1 ],
},
1 => {
'Germany' => [ 4 ]
'UK' => [ 2 ],
'Italy' => [ 2 ],
'zzzz' => [ 1, 1, 1 ],
},
2 => {
'Berlin' => [ 1 ],
'Bonn' => [ 1 ],
'Munich' => [ 1 ],
'London' => [ 1 ]
'Rome' => [ 1 ],
'zzzz' => [ 1, 1, 1, 1, 1, 1 ],
}
};
</code>
<h3>Descending from Trees</h3>
<p>These structures didn't come from the sky, but were created from two trees
that can both produce simpler structures or be the basis for some demanding reports.
For example, the XML output was generated directly from the co-operation between these two
trees.</p>
<p>You can see how these trees are related with the simpler data structures shown before.
Generating a array of hashes is a straightforward exploit of tree traversal functions.
Using a post-order traversal you get the column span values, and with a pre-order traversal
you push the data into the array of hashes. A similar treatment is due for the record
descriptor.</p>
<code>
-- xtab { colspan => 22 }
|-- Area { colspan => 1 rowspan => 3 }
|-- country { colspan => 1 rowspan => 3 }
|-- location { colspan => 1 rowspan => 3 }
|-- pers { colspan => 6 }
| |- employee { colspan => 3 }
| | |- f { colspan => 1 rowspan => 1 }
| | |- m { colspan => 1 rowspan => 1 }
| | +- total { colspan => 1 rowspan => 1 }
| |- contractor { colspan => 2 }
| | |- m { colspan => 1 rowspan => 1 }
| | +- total { colspan => 1 rowspan => 1 }
| +- total { colspan => 1 rowspan => 2 }
|-- sales { colspan => 7 }
| | employee { colspan => 2 }
| | |- m { colspan => 1 rowspan => 1 }
| | +- total { colspan => 1 rowspan => 1 }
| |- contractor { colspan => 2 }
| | |- m { colspan => 1 rowspan => 1 }
| | +- total { colspan => 1 rowspan => 1 }
| |- consultant { colspan => 2 }
| | |- f { colspan => 1 rowspan => 1 }
| | +- total { colspan => 1 rowspan => 1 }
| +- total { colspan => 1 rowspan => 2 }
|-- dev { colspan => 5 }
| |- employee { colspan => 2 }
| | |- m { colspan => 1 rowspan => 1 }
| | +- total { colspan => 1 rowspan => 1 }
| |- consultant { colspan => 2 }
| | |- f { colspan => 1 rowspan => 1 }
| | +- total { colspan => 1 rowspan => 1 }
| +- total { colspan => 1 rowspan => 2 }
+-- total { colspan => 1 rowspan => 3 }
-- recs { rowspan => 11 }
|-- N { rowspan => 7 }
| |- Germany { rowspan => 4 }
| | |- Berlin { rowspan => 1 }
| | |- Bonn { rowspan => 1 }
| | |- Munich { rowspan => 1 }
| | +- zzzz { rowspan => 1 }
| |- UK { rowspan => 2 }
| | |- London { rowspan => 1 }
| | +- zzzz { rowspan => 1 }
| +- zzzz { rowspan => 1 }
| +- zzzz { rowspan => 1 }
|-- S { rowspan => 3 }
| |- Italy { rowspan => 2 }
| | |- Rome { rowspan => 1 }
| | +- zzzz { rowspan => 1 }
| +- zzzz { rowspan => 1 }
| +- zzzz { rowspan => 1 }
+-- zzzz { rowspan => 1 }
+- zzzz { rowspan => 1 }
+- zzzz { rowspan => 1 }
</code>
<p>The same features are useful to create a XML document. A pre-order traversal
will create the opening tags, while a post-order traversal will create the closing
tags. This approach, rather than using [CPAN://XML::Simple], was preferred because
this special structure can also add information about the data before the DBMS
processing. The data set reports 'pers' and 'f', but the module internal memory
knows that 'pers' is a 'department' and 'f' is a 'gender'. The resulting XML is
enriched by this knowledge which would be lost in a simple conversion.</p>
<h3>Try it!</h3>
<p>Where does all this talk lead?</p>
<p>The module is designed for expansion, and you could expand it to suit your needs
if it doesn't do it now. You could create a new format or adapt existing formats
to accept cross tabulations. You could integrate crosstabs into graphical interfaces,
statistical packages, decision making applications, or whichever fancies you
today.</p>
<p>It is not that hard. The above structures would be easily available to a descendant
of DBIx::SQLCrosstab::Format. Your descending object should be used as the ones
in the official examples until you get the records. After that, the private
method <b>_find_headers()</b> will create the data structures described in this
node. From that point on, it's all yours.</p>
<p>Enjoy.</p>
<p><i>Further reading</i></p>
<ul>
<li><a
href="http://www.onlamp.com/pub/a/onlamp/2003/12/04/crosstabs.html">Generating Database Server-Side Cross Tabulations</a>, the original article;</li>
<li>[http://search.cpan.org/perldoc?DBIx::SQLCrosstab|DBIx::SQLCrosstab online documentation], where everything is duly explained;</li>
<li>[id://153259], where your curiosity about trees can be satisfied;</li>
<li>[DBI Recipes], where less demanding DBI idioms are explained.</li>
</ul>
<p><small>Thanks to [tye], who adjusted PM allowed tags and made the formatting of this node much easier.</small></p>
</readmore>
<div class="pmsig"><div class="pmsig-127116">
<pre>
_ _ _ _
(_|| | |(_|><
_|
</pre>
</div></div>