Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

If you think about it just a little differently, they do respect set theory. Consider this.

Tables and views are sets. The rows of the table are isomorphic to the elements of the set. A SELECT query is nothing more than a operation that takes the cross product of zero or more sets (FROM) and then filters this big set (WHERE). But what are the elements of these sets?

The first thought is to consider that the elements are m dimensional vectors, where m is the number of columns in the table. Of course you can't have twice the same vector in a set (= you can't have two identical rows in a result) because that goes against what a set is! But if we use an obvious definition for vector equality, which is the vectors have the same dimension and the components are each equal to each other, we run into trouble from the start: the example tables you give cannot even exist in the first place! (since tables and views, not just results, are supposed to be sets)

So how do we solve this? You have to think of each row in each table as unique among all rows of the same table. Pretend that each row (element of the set) is a unique objec with m attributes. It might have the same values for all attributes as some other row of the table, but it's still unique. Another way to think about it is that there is a hidden "unique ID" column in every table (you need only add such a column if the table has no primary key). Now everything works. The row that is the cross of the first row from parts and the second row from supplier_parts is not the same as the row that is the cross of the second row from parts and the second row from supplier_parts even though they look the same because they have all the same values in the column. Well, for that matter the first and second rows of the parts table aren't the same row even though they share all the same values.

Suppose you reject the preceding paragraph as a terribly inelegant way to define uniqueness (and it is that, though it's the only way I know that works!). In doing so you restrict all your tables to contain only DISTINCT rows, or else they can not be interpreted as sets. You can force this by having a primary key in every table.

What happens? Now SELECT queries never produce duplicate results, just as set theory demands. Note that you have to SELECT * to see that this is true. Selecting a proper subset of the columns is only an output filter which can give the illusion that there are identical rows. It doesn't run splice (hey! I brought it back on topic by mentioning a Perl operator!) on the underlying vectors. But DISTINCT will help you filter those illusory duplicates from the output.

I'm not saying this is great, I'm just saying that's how you have to apply the set theory if you want it to work.

By the way, your last example, the one that results in the most outrageous number of rows, is wrong.

SELECT parts.pno FROM parts, supplier_parts WHERE ( supplier_parts.sno = 'S1' AND supplier_parts.pno = parts.pno ) OR parts.pname = 'Screw'

should be

SELECT parts.pno FROM parts, supplier_parts WHERE supplier_parts.pno = parts.pno AND ( supplier_parts.sno = 'S1' OR parts.pname = 'Screw' )

which gives a somewhat less outrageous 6 P1s and 1 P2.


In reply to They do respect set theory! by Celada
in thread (OT) Why SQL Sucks (with a little Perl to fix it) by Ovid

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others romping around the Monastery: (3)
As of 2024-04-20 02:58 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found