Re: Re: SQL Crosstab, a hell of a DBI idiom
by gmax (Abbot) on Dec 11, 2003 at 17:05 UTC
SQL crosstab complexity depends on the number of distinct values in the columns involved in the crosstab. The 600 lines I mentioned were due to a query asking for COUNT, SUM, AVG, MIN, and MAX with row and column subtotals, thus requiring a UNION for each level of row header.
In a database with the same structure but with one million records, the query would not have been much longer, provided that the data is properly checked on input.
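To illustrate why the query length follows the number of distinct values rather than the number of records, here is a minimal sketch. It uses Python with an in-memory SQLite database instead of Perl/DBI, purely so that it is self-contained. The `staff` table, its columns, and the `crosstab_sql` helper are hypothetical names invented for this example, not the original code.

```python
import sqlite3

def crosstab_sql(table, row_col, col_col, col_values):
    """Build a COUNT crosstab query (hypothetical helper).

    One CASE expression is emitted per distinct value of the column
    dimension, so the SQL grows with len(col_values), not with the
    number of records in the table.
    """
    cases = ", ".join(
        'SUM(CASE WHEN {c} = ? THEN 1 ELSE 0 END) AS "{v}"'.format(c=col_col, v=v)
        for v in col_values
    )
    return (
        "SELECT {r}, {cases}, COUNT(*) AS total "
        "FROM {t} GROUP BY {r} ORDER BY {r}"
    ).format(r=row_col, cases=cases, t=table)

# Tiny made-up data set, only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (dept TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("dev", "f"), ("dev", "m"), ("dev", "m"), ("sales", "f")],
)

# The distinct values of the column dimension drive the shape of the query.
values = [r[0] for r in conn.execute(
    "SELECT DISTINCT gender FROM staff ORDER BY gender")]
sql = crosstab_sql("staff", "dept", "gender", values)

# The same values are bound to the placeholders, in the same order.
rows = conn.execute(sql, values).fetchall()
print(sql)
print(rows)
```

Inserting a million more rows with the same two `gender` values would leave `sql` unchanged. Adding a third distinct value would add one more CASE expression.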
Of course, if you try to do a crosstab by person's name in a table of one million records, you are likely to run out of space, but OTOH crossing data by name wouldn't leave you in much better shape with any statistical tool.
About having 50-100 values in each of 4 dimensions: yes, you would get an unbearable number of combinations. But you'd get that complexity with any tool, and even if you managed to produce such a result, it would not be readable. Theoretical limits and practical limits both need to be considered here. The main purpose of a crosstab is to give an overview of a situation, something meant for human consumption. Nobody in their right mind would consider reading a document with 50,000 columns and 100,000 rows (provided that I could find the paper to print it!).
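As a rough back-of-the-envelope check on how quickly those combinations grow, the sketch below multiplies the distinct-value counts per dimension. It assumes, purely for illustration, that two of the four dimensions go in the row header and two in the column header. The `crosstab_shape` helper is a hypothetical name.

```python
from math import prod  # Python 3.8+

def crosstab_shape(row_cards, col_cards):
    """Upper bound on (rows, columns) of a crosstab, given the number of
    distinct values of each dimension placed in the row and column headers."""
    return prod(row_cards), prod(col_cards)

# 50 distinct values per dimension, two dimensions on each side (assumed split).
low = crosstab_shape([50, 50], [50, 50])
# 100 distinct values per dimension, same split.
high = crosstab_shape([100, 100], [100, 100])

for label, (n_rows, n_cols) in (("50 each", low), ("100 each", high)):
    print(label, n_rows, "rows x", n_cols, "columns =", n_rows * n_cols, "cells")
```

These are upper bounds: combinations that never occur in the data produce no row, but every column still has to be spelled out in the query.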
Databases with statistical needs and data warehouses are designed in such a way that data can be grouped by some meaningful element. If the designers allow such an element to reach thousands of values, then it becomes useless for this kind of overview.
Anyway, consider that one side of the crosstab (the rows) can grow up to the limits of the system, so if one of your dimensions has a large set of distinct values, you can always move it from the column header to the row header. Keep in mind that if you generate too many rows, the result may not be valuable as a statistical report.
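One way to apply that advice is to count the distinct values of each dimension first and put the smaller one in the column header. The sketch below does this in Python with an in-memory SQLite database. The `games` table, its `opening` and `result` columns, and the synthetic data are all assumptions made up for the example, not the actual chess database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (opening TEXT, result TEXT)")
# Synthetic data: 200 made-up opening codes, 3 possible results.
conn.executemany(
    "INSERT INTO games VALUES (?, ?)",
    [("ECO-%03d" % (i % 200), ("white", "black", "draw")[i % 3])
     for i in range(1200)],
)

def distinct_count(col):
    # col comes from a fixed list below, so interpolating it is safe here.
    return conn.execute(
        "SELECT COUNT(DISTINCT %s) FROM games" % col).fetchone()[0]

cards = {col: distinct_count(col) for col in ("opening", "result")}

# The dimension with fewer distinct values becomes the column header;
# the larger one goes to the rows, which can grow freely.
col_dim = min(cards, key=cards.get)
row_dim = max(cards, key=cards.get)

values = [r[0] for r in conn.execute(
    "SELECT DISTINCT %s FROM games ORDER BY 1" % col_dim)]
cases = ", ".join(
    'SUM(CASE WHEN %s = ? THEN 1 ELSE 0 END) AS "%s"' % (col_dim, v)
    for v in values
)
sql = "SELECT %s, %s FROM games GROUP BY %s ORDER BY %s" % (
    row_dim, cases, row_dim, row_dim)
report = conn.execute(sql, values).fetchall()

print(col_dim, "->", len(values), "columns;", row_dim, "->", len(report), "rows")
```

Swapping the two dimensions would require 200 CASE expressions instead of 3 for the same information.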
I ran some tests on my database of chess games (2.3 million records) and I got meaningful results in decent times. I generated a few thousand columns, just for fun, but I would never want to be the one in charge of analyzing such a report!