http://www.perlmonks.org?node_id=799116

Jazz has asked for the wisdom of the Perl Monks concerning the following question:

I'm sorry, but this is more of a MySQL question than a perl one. I'd appreciate any pointers or suggestions, though.

First, the query:

my $sth = $dbh->prepare( "SELECT * FROM Images LEFT JOIN Regio ON Images.RegioID = Regio.RegioID LEFT JOIN Insula ON Images.InsulaID = Insula.InsulaID LEFT JOIN ImgThemes ON Images.ImageID = ImgThemes.ImageID LEFT JOIN Sources ON Images.SourceID = Sources.SourceID ORDER BY ImageName LIMIT $start,$numrows" );

Now, a simplified version of the table structure:

------- --------- ---------- Regio Images ImgThemes ------- --------- ---------- RegioID >--| ImageID >----< ImageID RegioName |--> RegioID ThemeID <----| ImageName | ------- |--> InsulaID ---------- | Insula | RoomID (etc) Themes | ------- | SourcesID >-| ---------- | InsulaID >-| | ThemeID >----| InsulaName --------- | ThemeName Sources | ThemeCategoryID <-| --------- | | SourceID <--| ---------- | SourceName ThemeCategories | Publishers ---------- | PublishYear ThemeCategoryID >-| CategoryName

The problem is that this yields Cartesian results, creating duplicate values for Image records when there's no match in the Images table for values in the joined ones.

For most of the join tables, the correct value always yields a single result (the name); for "ImgThemes", multiple values are most common. Not every Image is required ImgThemes, but not Regio or Insula.

What is the correct syntax to ensure all images are returned, but not duplicated? This is my first excursion into non-"natural" joins, and the goal is to retrieve each respective name in each of the tables without having to perform separate queries.

Thank you for any help you may be able to offer.

Replies are listed 'Best First'.
Re: DBI MySQL Join Question
by graff (Chancellor) on Oct 04, 2009 at 22:19 UTC
    You said:

    ...this yields Cartesian results, creating duplicate values for Image records when there's no match in the Images table for values in the joined ones.

    I'm not sure I understand what you mean there, but it sounds like you don't understand what "Cartesian" means in this context. You are querying for all columns from a set of joined tables, and when table A has two or more rows related to a single row of table B, the values from that row of B will have to be repeated for each matching row in A. That's the Cartesian product you get from doing that sort of join.

    If you only want one row of output for each row in the Images table, you either don't want to join with tables that contain multiple matches to a given Images row, or else you want to select some sort of grouped or aggregate value from those other tables, so that there is only one value from each table to be returned for each row of Images -- e.g.:

    SELECT ImageID,ImageName, Regio.RegioID,RegioName, Insula.InsulaID,InsulaName, Sources.SourceID,SourceName, (etc...), COUNT(ThemeID) FROM Images LEFT JOIN Regio ON Images.RegioID = Regio.RegioID LEFT JOIN Insula ON Images.InsulaID = Insula.InsulaID LEFT JOIN ImgThemes ON Images.ImageID = ImgThemes.ImageID LEFT JOIN Sources ON Images.SourceID = Sources.SourceID GROUP BY ImgThemes.ImageID ORDER BY ImageName LIMIT $start,$numrows"
    Note the addition of the "GROUP BY" clause, and the use of the "COUNT()" function for returning a single value for all rows in the ImgThemes table that match a given Image row.

    (update: fixed typo in field list of select statement)

      When I add the GROUP BY and COUNT() clauses as you noted, I get only 2 out of 10 Image records. When I change the GROUP BY to Images.ImageID instead of imgThemes.ImageID, I get all Image returned (unduplicated), but only one result from imgThemes. I need all of the records from imgThemes which match the ImageID.

      If you have time, please see the data structure I'm looking for in my reply to zwon above.

Re: DBI MySQL Join Question
by zwon (Abbot) on Oct 04, 2009 at 20:52 UTC

    If I got your question right, you want to get only one row for every image in result, but every Image may have multiple ImgThemes and Sources. If there are several ImgThemes for Image which one should be returned?

      Yes, for the non "ImgTheme" fields, I need only one result. For ImgThemes, I need all results. Ideally, I'd like to do it all of this at once, but think that ImgThemes will probably have to be a separate query.

      The big need now is to get RegioName from RegioID on Regio, SourceName from SourceID on Sources, etc., without duplicate Images being given.

      Maybe a Dumper output of the structure I need would help:

      $VAR1 = { 'SAMPLE' => { 'Regio' => { 'RegioName' => 'VI', 'RegioID' => 22 }, 'Themes' => [ { 'ThemeID' => 512, 'ThemeName' => 'Greek Battles', 'ThemeType' => { 'ThemeCategory' => { 'ThemeCategoryName' => 'Narrative Scenes', 'ThemeCategoryID' => 2 } } }, { 'ThemeID' => 196, 'ThemeName' => 'Alexander the Great', 'ThemeType' => { 'ThemeCategory' => { 'ThemeCategoryName' => 'Characters', 'ThemeCategoryID' => 4 } } }, { 'ThemeID' => 9, 'ThemeName' => 'Darius', 'ThemeType' => { 'ThemeCategory' => { 'ThemeCategoryName' => 'Characters', 'ThemeCategoryID' => 4 } } }, { 'ThemeID' => 17, 'ThemeName' => 'Central Floor Mosaic', 'ThemeType' => { 'ThemeCategory' => { 'ThemeCategoryName' => 'Zones', 'ThemeCategoryID' => 9 } } } ], 'ImageID' => 4, 'ImageName' => 'Battle of Issus Mosaic', 'Insula' => { 'InsulaID' => 96, 'InsulaName' => 'House of the Faun' } } }
Re: DBI MySQL Join Question
by EvanCarroll (Chaplain) on Oct 04, 2009 at 22:47 UTC
Re: DBI MySQL Join Question
by ELISHEVA (Prior) on Oct 05, 2009 at 04:49 UTC
    Hmmm. I think you may have the left join backwards. As written it is taking the cartesian product of Insula and Regio and then joining it to Image whenever there happens to be an image record that has that particular combination of Insula and Regio.

    The table after the words "LEFT JOIN" is the table from which you want to select all records even if they have a NULL value in one or more foreign key fields. If you want to take each Image record and join it to Regio and Insula wherever possible, then you need to swap the first two LEFT JOINs, like this:

    my $sth = $dbh->prepare( "SELECT * FROM Images LEFT JOIN Images ON Images.RegioID = Regio.RegioID LEFT JOIN Images ON Images.InsulaID = Insula.InsulaID LEFT JOIN ImgThemes ON Images.ImageID = ImgThemes.ImageID LEFT JOIN Sources ON Images.SourceID = Sources.SourceID ORDER BY ImageName LIMIT $start,$numrows" );

    Best, beth