|The stupid question is the question not asked|
SQL database architectureby punchcard_don (Beadle)
|on Nov 03, 2003 at 18:57 UTC||Need Help??|
punchcard_don has asked for the
wisdom of the Perl Monks concerning the following question:
I realize this is more of an SQL question, but all the middleware will be Perl querying a mySQL database using DBI, I promise.
Not looking for someone to tell me what to do, or do my job. I'm seeking methodological or, even better, quantitative, means to choose/justify one SQL database architecture over another where query performance is the priority.
Imagine this: A database holds students' school exam results. By the end of their scholastic career, each person has taken 500 exams. Every exam had 50 multiple-choice questions with answers numbered 1 to 9. Our starting point is a flat-file of records for 20,000 students of the form:
student1, exam1, answer1, answer2,..., answer50
student2, exam1, answer1, answer2,..., answer50
Now we want to compile statistics on who answered what. What percentage of total students who got question x in exam y right, also got question b of exam c right? In other words
SELECT COUNT(*) WHERE response-to-queston-x-on-exam-z=y AND response-to-queston-b-on-exam-d=c
and we want to do this over the web letting researchers specify the questions and answers they query on.
A few possible architectures:
1. Design the database exactly as the flat-file and take the intersection of
SELECT COUNT(*) WHERE (exam=z AND answeri=y)
2. Append all the records for a given student to make one 2,500 column record per student and search
SELECT COUNT(*) WHERE exami_answerj=y AND examb_answerc=d
3. Give each of the possible 25,000 answers to each multiple-choice question a unique 5-digit identifier and its own column, then just signify if that answer was given by a "1" in its column, and search
SELECT COUNT(*) WHERE ansxxxxx AND ansyyyyy
4. Make a separate table for each possible answer with studentID as the key, then take interstection of searches.
5. Make a separate table for each question, leaving the multiple-choice answer in the 2,500 tables and searching for the desired answer
But as I said, I'm hoping for some method or analysis by which to make/justify the choice. "I think XXX would be best" won't impress my boss much. Is there a way to choose - preferably quantitatively? Or does one just have to build, benchmark, and repeat?