http://www.perlmonks.org?node_id=967725


in reply to perl & SQL best practices

You're overly optimistic about the "higher level": the kinds of decisions you're talking about are typically made almost entirely for social reasons, and since you're (at the moment) the only one who needs to feel comfortable with the code, there isn't a lot of reason to do things one way or the other.

You might try to imagine what kind of person you're likely to hire to assist you with the work. Currently, there's a good deal of ignorance about the RDBMS world among programmers, and most of them seem to be looking for strategies to avoid learning anything about it (e.g. object-relational mappers are very popular, as are "non-relational" databases). Keeping the SQL simple and moving the complexity to the Perl side would seem to be a good bet.
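
For illustration, a minimal sketch of that style with DBI -- the DSN, credentials, and "orders" table are all made up, so adjust to taste:

    use strict;
    use warnings;
    use DBI;

    # Hypothetical connection; substitute your own DSN and credentials.
    my $dbh = DBI->connect( 'dbi:Pg:dbname=app', 'user', 'secret',
        { RaiseError => 1, AutoCommit => 1 } );

    # Keep the SQL trivial: just fetch the rows...
    my $orders = $dbh->selectall_arrayref(
        'SELECT id, status, total FROM orders',
        { Slice => {} },
    );

    # ...and do the interesting filtering and sorting on the Perl side.
    my @big_open = sort { $b->{total} <=> $a->{total} }
                   grep { $_->{status} eq 'open' and $_->{total} > 100 }
                   @$orders;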

But if you're fortunate enough (I would call it fortunate) to be hand-crafting your SQL, there's no reason to go overboard on that strategy... by all means, do table joins on the database side, for example. Make use of uniqueness constraints, normalize the schema as well as you can. I would have to think twice about using stored procedures, unless they were a clear win for some reason.
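
As a sketch of that advice (same made-up schema as above), a server-side join plus a uniqueness constraint might look like:

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'dbi:Pg:dbname=app', 'user', 'secret',
        { RaiseError => 1 } );

    # Let the database enforce uniqueness instead of policing it in Perl.
    $dbh->do(q{ALTER TABLE users ADD CONSTRAINT users_email_uniq UNIQUE (email)});

    # ...and let it do the join, too.
    my $sth = $dbh->prepare(q{
        SELECT u.name, o.total
        FROM   users  u
        JOIN   orders o ON o.user_id = u.id
        WHERE  o.status = ?
    });
    $sth->execute('open');
    while ( my $row = $sth->fetchrow_hashref ) {
        print "$row->{name}: $row->{total}\n";
    }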

Re^2: perl & SQL best practices
by JavaFan (Canon) on Apr 27, 2012 at 22:38 UTC
    by all means, do table joins on the database side, for example
    By all means? Joins are costly, and considering there are typically more clients than servers, there are a lot of situations where it makes sense to do the join on the client -- there's a lot more client CPU available than server CPU. In other cases, it does make sense to do it on the database side.
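
    For concreteness, a client-side join is just two cheap queries and a hash lookup (schema as made up as in the sketches above):

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect( 'dbi:Pg:dbname=app', 'user', 'secret',
            { RaiseError => 1 } );

        # Two simple queries instead of one join...
        my %user_name = map { $_->[0] => $_->[1] }
            @{ $dbh->selectall_arrayref('SELECT id, name FROM users') };
        my $orders = $dbh->selectall_arrayref(
            'SELECT user_id, total FROM orders WHERE status = ?',
            { Slice => {} }, 'open',
        );

        # ...joined in client memory, spending client CPU rather than server CPU.
        for my $o (@$orders) {
            printf "%s: %s\n", $user_name{ $o->{user_id} }, $o->{total};
        }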

    Answers like "by all means" are fundamentally wrong.

    Make use of uniqueness constraints, normalize the schema as well as you can.
    Again, the answer is: it depends. One of the steps to rise above grasshopper level is to know what normalization is, when it's useful, and, just as importantly, when it makes sense not to normalize. If you're a bank, you may care more about avoiding data redundancy than if you're serving many millions of customers on a daily basis, where page delivery time is essential.
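
    A tiny example of deliberate denormalization (hypothetical schema again): copy a product's price onto the order row at write time, trading redundancy for cheaper reads later.

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect( 'dbi:Pg:dbname=app', 'user', 'secret',
            { RaiseError => 1 } );
        my ( $order_id, $product_id ) = @ARGV;

        # Redundant by design: unit_price is copied from products so that
        # rendering an order never needs the join (and it records the price
        # actually charged, which the bank-ish application wants anyway).
        $dbh->do(
            q{INSERT INTO order_items (order_id, product_id, unit_price)
              SELECT ?, id, price FROM products WHERE id = ?},
            undef, $order_id, $product_id,
        );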

      By all means, start out doing table joins inside the DB. Relatively rarely, I have seen a situation develop where the schema versus the data you need (and how often and how quickly you need it) leaves you painted into a bit of a corner, and you can work around the problem by doing some joins in the client.

      Much of the time, this is also about limitations or quirks of the query optimizer and how much time / expertise you can throw at recrafting the query to convince the optimizer to do the efficient thing.
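
      When the optimizer is the suspect, the first stop is its plan output -- something like this, using Postgres-flavoured EXPLAIN (syntax varies by RDBMS; $dbh is a handle as in the earlier sketches):

          # Ask the server what it actually does with the query.
          # Note: EXPLAIN ANALYZE runs the query as well as planning it.
          my $plan = $dbh->selectall_arrayref(q{
              EXPLAIN ANALYZE
              SELECT u.name, o.total
              FROM   users u
              JOIN   orders o ON o.user_id = u.id
              WHERE  o.status = 'open'
          });
          print "$_->[0]\n" for @$plan;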

      Frankly, I've more often seen client-side joins done because the people working on the project rather sucked at database performance issues. Or even because the team just believed that "joins are expensive". But when I've actually worked on the performance issue that led to client-side joins, there was a much more efficient solution that didn't involve client-side joins.

      So, I have done client-side joins for performance reasons. But only very rarely and always due to things that had better resolutions in the long run. It can be a powerful work-around.

      I've never seen client-side joins done at the outset for good reasons. So, yes, start out with server-side joins, by all means.

      - tye        

      Joins are costly, and considering there are typically more clients than servers, there are a lot of situations where it makes sense to do the join on the client -- there's a lot more client CPU available than server CPU.
      Try another rule of thumb: the network is slow. Moving more data from server to client than you need to is rarely a good idea. First guess -- albeit not the last word -- is to use the features built into the database to crunch the data.

      And if a database join seems "expensive" to you, I suggest you need to look at your indexes.
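
      Concretely (made-up schema as before, $dbh as in the earlier sketches): index the join key and let the server crunch the aggregate, so only the summary crosses the wire.

          # An index on the join/filter column is usually the first fix.
          $dbh->do('CREATE INDEX orders_user_id_idx ON orders (user_id)');

          # Crunch the data where it lives; ship only the totals to the client.
          my $totals = $dbh->selectall_arrayref(q{
              SELECT u.name, SUM(o.total) AS spent
              FROM   users u
              JOIN   orders o ON o.user_id = u.id
              GROUP  BY u.name
          });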

        Try another rule of thumb: the network is slow. Moving more data from server to client than you need to is rarely a good idea. First guess -- albeit not the last word -- is to use the features built into the database to crunch the data.
        I will repeat myself, but I fear I might as well talk to the walls: it all depends. Measure, and you will know. Don't assume the network is slow, and that you should hence load your database server as heavily as possible. Slowing everyone else down so that your query gets done "at the server" may not be the best solution.

        Measure. Don't assume.

        Measure. Don't assume.

        And once you have implemented a solution because it's the best today, measure the next day again and the day after, and keep repeating it, because circumstances change.
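
        In that spirit, measuring the two strategies against each other is only a few lines of Benchmark (with the same imaginary schema as in the sketches above):

            use strict;
            use warnings;
            use DBI;
            use Benchmark qw(cmpthese);

            my $dbh = DBI->connect( 'dbi:Pg:dbname=app', 'user', 'secret',
                { RaiseError => 1 } );

            # Run each approach for at least ~2 CPU seconds and compare rates.
            cmpthese( -2, {
                server_join => sub {
                    $dbh->selectall_arrayref(q{
                        SELECT u.name, o.total
                        FROM users u JOIN orders o ON o.user_id = u.id
                    });
                },
                client_join => sub {
                    my %name = map { $_->[0] => $_->[1] }
                        @{ $dbh->selectall_arrayref('SELECT id, name FROM users') };
                    my $orders = $dbh->selectall_arrayref(
                        'SELECT user_id, total FROM orders');
                    my @joined = map { [ $name{ $_->[0] }, $_->[1] ] } @$orders;
                },
            } );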

Re^2: perl & SQL best practices
by ksublondie (Friar) on Apr 27, 2012 at 23:15 UTC
    You're overly optimistic...

    Yea, you got me there.

    But if you're fortunate enough (I would call it fortunate) to be hand-crafting your SQL...

    You're probably right, but SQL Server is complicated enough that sometimes I wish someone else had the control, aka responsibility, of it. The more I know, the more I know I DON'T know. I've toyed with the idea of getting some sort of SQL certification (so I can be a better Perl programmer, of course), but then that's a little sick and wrong.

      The more I know, the more I know I DON'T know. I've toyed with the idea of getting some sort of SQL certification (so I can be a better Perl programmer, of course), but then that's a little sick and wrong.
      Well, you have my sympathies... there's effectively an infinite amount of stuff we all need to learn in order to do the job right, so coming up with strategies to provisionally minimize what you need to learn is essential.

      Myself, I would say that knowing your way around relational databases is a fairly fundamental, useful skill, and whether or not you bother with a "certification", it's not a bad direction to go in.