in reply to SQL JOINs vs WHERE statements

You're not crazy, you can indeed get an equivalent join using a WHERE clause.

But I don't recommend it.

It mixes up your different types of clauses (JOIN vs WHERE), which makes your SQL harder to understand, and worse, you can get unintuitive results.

For example, what would you expect the following to return?

create table people (person_id integer not null primary key, surname v +archar(40) not null) insert into people values (1,'Smith') insert into people values (2,'Jones') insert into people values (3,'Harry') insert into people values (4,'Peanut') create table town (person_id integer not null references people(person +_id), town varchar(40) null) insert into town values (1,'Bristol') insert into town values (2,'Washington') insert into town values (3,NULL) insert into town values (4,'Cairns') SELECT surname, town FROM people, town WHERE people.person_id *= town.person_id AND (town <> 'Washington' AND town IS NOT NULL)
Would you be surprised to get the following results?
surname town + ------------------------------ --------------------------------------- +- Smith Bristol Jones NULL Harry NULL Peanut Cairns

(Hint: You should be surprised, the SELECT was trying to exclude NULLS)

The source of this ambiguity is that the IS NOT NULL part of the WHERE clause got evaluated before the join part, and the join part then re-introduced the NULLs.

In contrast, the LEFT OUTER JOIN makes the intention clearer (albeit somewhat self-defeating in this example)

SELECT surname, town FROM people LEFT OUTER JOIN town ON people.person_id = town.person_id WHERE (town <> 'Washington' AND town IS NOT NULL)

Results:

surname town + ---------------------------------------- ----------------------------- +----------- Smith Bristol Peanut Cairns
Update: It seems that Oracle gives different results to MS SQL Server 2000, where this code was tested.

Replies are listed 'Best First'.
Re: Re: SQL JOINs vs WHERE statements
by bean (Monk) on Aug 19, 2003 at 19:00 UTC
    Actually, in Oracle I get the second set of results (no matter what order I put the where clauses). What database did you use, EdwardG?
    Update
    BTW, it's bad form not to name the columns in an insert. Instead of
    insert into people values (1,'Smith')
    you should do
    insert into people ( person_id, surname ) values ( 1,'Smith' )
      I agree about the named columns, normally insist on them when reviewing, but that piece of code wasn't the pertinent part of the demonstration, just the setup. Thanks though.

      Oh, and I used MS SQL Server to test that code. Looks like there's also a portability issue :)

        I figured you knew about naming columns when inserting, I just wanted it in the record for any SQL newbies out there.

        I guess to be completely clear and sure of your results, you'd have to do the following to be equivalent to the left outer join version (in MS SQL Server at least):
        SELECT pt.surname, pt.town FROM ( select people.surname, town.town from people, town where people.person_id = town.person_id (+) ) pt WHERE pt.town <> 'Washington' AND pt.town IS NOT NULL
        I just realized that although you've requested a left outer join, what you get is the same as an inner join because all of the columns in your example match up. You need to add a person who isn't in the town table at all:
        insert into people ( person_id, surname ) values ( 5, 'Drifter' )
        What really surprised me was that you got Jones in the results - it seems like MS SQL Server did this:
        SELECT people.surname, t1.town FROM ( select * from town where town <> 'Washington' AND town IS NOT NULL ) t1, people WHERE people.person_id = t1.person_id (+)
        Essentially, it pared down the town table before joining - probably good for performance, although it gives strange results. Interesting - you learn something new every day here at sqlMonks. What? This is perlMonks?
        Update
        That part about it not being an inner join instead of a left outer join? Forget about it - it is a left outer join because of MS SQL Server's apparent order of operations (see my last SQL snippet above).