PerlMonks  

perl & SQL best practices

by ksublondie (Pilgrim)
on Apr 27, 2012 at 19:51 UTC ( #967711=perlquestion )
ksublondie has asked for the wisdom of the Perl Monks concerning the following question:

Hopefully, I'm not stirring up a hornet's nest or getting myself crucified with this question:

As the only programmer in the office, with full rein of the SQL servers (albeit not as knowledgeable as I would like to be), I have always developed my SQL queries directly in my Perl scripts...of course doing my due diligence in testing any input to prevent injection and controlling user security. Now that I'm discovering more and more of what can be done directly inside SQL itself, I find myself in a dilemma over what I should do in my Perl and what should be done on the SQL server (SQL scripting, stored procedures, dynamic SQL, etc.).

Of course security is priority #1, followed closely by performance. So testing my input through Perl is a no-brainer. Maybe I'm missing something, but does dynamic SQL on the server provide THAT much extra protection from injection? Is Perl faster for building ever-changing complex queries, or is the server faster with cached dynamic SQL in stored procedures? What are the best practices? What is the preferred course if you don't have a dba controlling what access you have?

I'm going to assume that the default answer by a dba is going to be "do everything on the db server" and likewise, the programmer will say "do everything on the client", but I like to think that the perlmonks rise to a higher level and will honestly recommend the best course of action independent of their own prejudices. :)

Re: perl & SQL best practices
by JavaFan (Canon) on Apr 27, 2012 at 20:56 UTC
    There are no best practices, and you aren't giving us much useful information to work with.

    First, your security/SQL-injection concern: use placeholders. That should cover most cases (although there are still cases where you cannot use them).
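    The placeholder idiom in DBI looks like this. A minimal sketch: it uses an in-memory SQLite database (via DBD::SQLite) purely for illustration, since the thread's SQL Server boxes aren't available here; the `?` placeholder works the same way through DBD::ODBC. Table and column names are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Illustrative in-memory database; the same idiom applies to SQL Server.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');

# The ? placeholder keeps the value out of the SQL text entirely,
# so quote characters in the input cannot alter the statement.
$dbh->do('INSERT INTO users (name) VALUES (?)', undef, "O'Brien");

my $sth = $dbh->prepare('SELECT id, name FROM users WHERE name = ?');
$sth->execute("O'Brien");
my ($id, $name) = $sth->fetchrow_array;
print "$id: $name\n";    # 1: O'Brien
```

    Note that the apostrophe in the bound value never needed escaping; with string-interpolated SQL the same input would have broken (or hijacked) the statement.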

    As for performance, it all depends. What are you doing? Where are your bottlenecks? Inserts? Queries? Do you have an online transaction system, or are you data warehousing? How many clients/servers do you have? With many clients, it makes sense to offload tasks from the server -- but if the clients are already busy with other tasks, it may make less sense. Focusing on the database server itself, where is your current bottleneck? How's your disk I/O? Your storage? Can you hold the entire database in RAM? How's the network doing? How's the fibre doing? How's your replication lag?

    Unless you have (detailed) knowledge of how your systems work, what the current bottlenecks are, and where you would like to be, anything you do in the name of performance is just stabbing in the dark, with half the people in the room being your friends. You should measure, measure and measure again, and be prepared for the situation to be different in 3 months, when you will have to redo your work.

      ...you aren't giving us much useful information to work with.

      Well, I guess I was asking a more theoretical feedback-on-what-others-do-so-I-can-develop-a-guideline-for-my-own-current-and-future-coding question as opposed to a specific "Here's what I got, what do I do now???".

      But if you're dying to know more details: I have 2 SQL servers (one sql2005 and the other sql2008...all new db's are set up on the sql2008, and the sql2005 db's will eventually get upgraded as time allows...HA! Sorry, I can't keep a straight face typing that) with less than 20 db's each that are accessed by multiple programs. I have simple scheduled tasks that do nothing more than create reports which are then emailed to a few lucky individuals, a few full-blown CGI web apps that do transactional functions, and everything in between...all in Perl, of course. I currently don't have anything on the SQL servers besides the data itself and some administration functions. All my SQL interaction is through SQL within my Perl code.

      Relatively speaking, we're rather small. The total number of individuals accessing any single db will be AT MOST 200...and that's if everyone is in at the same time. My db's range in size from 17M to 84G. Bottlenecks? I don't necessarily have any (that I know of, anyway). Network, disk, RAM, etc. are fine as far as I can gather. As time goes on, we are adding more canned applications, along with my custom coding, that depend on the SQL servers. I'm just trying to be a little proactive in my development and prevent any "Darn, I shoulda..." problems in the first place if I can. Remember, I'm the only programmer, and I hate nothing more than going back to fix code, let alone figuring out what I did in the first place!

        So, you have no bottlenecks, no goals (performance-wise), you're rather small, and all you can tell about your future is that you're adding "canned applications, along with my custom coding". Yet you ask us for suggestions that will prevent you from saying "Darn, I shoulda" in the future.

        What do you expect to get? I mean, even if I said "don't smoke", that doesn't mean it's going to prevent a "darn, I shoulda" in the future. Some people smoke all their lives and never get cancer, and some people who never smoke get cancer.

        Seriously, set goals, and measure your performance. Act if necessary. Magic bullets do not exist.

Re: perl & SQL best practices
by Jenda (Abbot) on Apr 27, 2012 at 21:59 UTC

    Dynamic SQL is dynamic SQL no matter where you construct it. There are times when it is the way to go, but you should try to keep them to a minimum. A (possibly generated) structure of

        IF (@Whatever IS NOT NULL)
            IF (@Other IS NOT NULL)
                SELECT ... WHERE @Whatever = Whatever AND @Other = Other
            ELSE
                SELECT ... WHERE @Whatever = Whatever
        ...
    tends to perform better if used often. Just make sure it doesn't get way too big. And keep in mind that if there is no index on a column, (@Whatever IS NULL OR @Whatever = Whatever) is enough; no need to test that variable in an IF and use separate SELECTs.
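    That catch-all predicate shortcut, for unindexed columns, can be sketched like this (a T-SQL fragment with made-up table, column, and parameter names, meant to live inside a stored procedure):

```sql
-- Hypothetical sketch: one SELECT covers "filter given" and
-- "filter omitted" for each optional parameter, instead of a
-- tree of IF/ELSE branches with near-duplicate SELECTs.
SELECT OrderID, CustomerID, Total
FROM   Orders
WHERE  (@Customer IS NULL OR CustomerID = @Customer)
  AND  (@MinTotal IS NULL OR Total >= @MinTotal);
```

    On indexed columns the IF/ELSE tree Jenda shows can still win, because each branch gets its own cached plan that actually uses the index.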

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

      Just make sure it doesn't get way too big.

      "Big" as in results, lines of code, too many if/else/then, or were you referring to something else?

        Lines of code. The code has to be compiled the first time you use the stored procedure that encloses the structure and if it gets way too long, the compilation will take too much time.

        Jenda
        Enoch was right!
        Enjoy the last years of Rome.

Re: perl & SQL best practices
by doom (Deacon) on Apr 27, 2012 at 22:05 UTC

    You're overly optimistic about the "higher level": the kind of decisions you're talking about are typically made almost entirely for social reasons, and since you're the only guy (at the moment) who needs to feel comfortable with it, there isn't a lot of reason to do things one way or the other.

    You might try to imagine what kind of person you're likely to hire to assist you with the work. Currently, there's a good deal of ignorance about the RDBMS-world among programmers, and most of them seem to be looking for strategies to avoid learning anything about it (e.g. object-relational mappers are very popular, as are "non-relational" databases). Keeping the SQL simple and moving the complexity to the perl side would seem to be a good bet.

    But if you're fortunate enough (I would call it fortunate) to be hand-crafting your SQL, there's no reason to go overboard on that strategy... by all means, do table joins on the database side, for example. Make use of uniqueness constraints, normalize the schema as well as you can. I would have to think twice about using stored procedures, unless they were a clear win for some reason.

      by all means, do table joins on the database side, for example
      By all means? Joins are costly, and considering there typically are more clients than servers, there are a lot of situations where it makes sense to do the join on the client -- there's a lot more client CPU available than server CPU. In other cases, it does make sense to do it on the database side.

      Answers like "by all means" are fundamentally wrong.

      Make use of uniqueness constraints, normalize the schema as well as you can.
      Again, the answer is: it depends. One of the steps to rise above grasshopper level is to know what normalization is, when it's useful, and, as importantly, when it makes sense not to normalize. If you're a bank, you may care more about avoiding data redundancy than if you're serving many million customers on a daily basis and page delivery time is essential.

        By all means, start out doing table joins inside the DB. Only relatively rarely have I seen a situation develop where the schema vs. the needed data (and how often and how quickly) shows you've painted yourself into a bit of a corner, and doing some joins in the client works around the problem.

        Much of the time, this is also about limitations or quirks of the query optimizer and how much time / expertise you can throw at recrafting the query to convince the optimizer to do the efficient thing.

        Frankly, I've more often seen client-side joins done because the people working on the project rather sucked at database performance issues. Or even because the team just believed that "joins are expensive". But when I've actually worked on the performance issues that led to client-side joins, there was a much more efficient solution that didn't involve them.

        So, I have done client-side joins for performance reasons. But only very rarely and always due to things that had better resolutions in the long run. It can be a powerful work-around.

        I've never seen client-side joins done at the outset for good reasons. So, yes, start out with server-side joins, by all means.

        - tye        

        Joins are costly, and considering there typically are more clients than servers, there are a lot of situations where it makes sense to do the join on the client -- there's a lot more client CPU available than server CPU.
        Try another rule of thumb: the network is slow. Moving more data from server to client than you need to is rarely a good idea. First guess-- albeit not the last word-- is to use the features built into the database to crunch the data.

        And if a database join seems "expensive" to you, I suggest you need to look at your indexes.
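        The "let the server join, and index the join column" advice can be sketched in a few lines of DBI. This uses an in-memory SQLite database purely for illustration (schema and data are made up); the shape is the same against SQL Server via DBD::ODBC.

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });

$dbh->do('CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)');
$dbh->do('CREATE TABLE orders (id INTEGER PRIMARY KEY,
                               customer_id INTEGER, total REAL)');
# An index on the join column is what keeps the server-side join cheap.
$dbh->do('CREATE INDEX idx_orders_customer ON orders (customer_id)');

$dbh->do(q{INSERT INTO customers VALUES (1, 'alice'), (2, 'bob')});
$dbh->do(q{INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5)});

# Server-side join and aggregation: only two summary rows cross the
# wire, instead of both tables being fetched and matched in Perl.
my $rows = $dbh->selectall_arrayref(q{
    SELECT c.name, SUM(o.total)
    FROM   customers c
    JOIN   orders o ON o.customer_id = c.id
    GROUP  BY c.name
    ORDER  BY c.name
});
printf "%s: %.2f\n", @$_ for @$rows;
```

        The network-cost argument above is visible here: the client-side alternative would pull every order row over the wire just to throw most of the detail away.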

      You're overly optimistic...

      Yea, you got me there.

      But if you're fortunate enough (I would call it fortunate) to be hand-crafting your SQL...

      You're probably right, but SQL Server is complicated enough that sometimes I wish someone else had the control, aka responsibility, of it. The more I know, the more I know I DON'T know. I've toyed with the idea of getting some sort of SQL certification (so I can be a better Perl programmer, of course), but then that's a little sick and wrong.

        The more I know, the more I know I DON'T know. I've toyed with the idea of getting some sort of sql certification (so I can be a better perl programmer, of course), but then that's a little sick and wrong.
        Well, you have my sympathies... essentially there's an infinite amount of stuff we all need to learn in order to do the job right, so coming up with strategies to provisionally minimize what you need to learn is essential.

        Myself, I would say that knowing your way around relational databases is a fairly fundamental, useful skill, and whether or not you bother with a "certification", it's not a bad direction to go in.

Re: perl & SQL best practices
by BrowserUk (Pope) on Apr 27, 2012 at 23:07 UTC

    I think a lot of data gets imported into RDBMSs for the wrong reasons.

    But, once you have gone to the bother and expense of putting it in there, it doesn't make a whole lot of sense to export it en masse, only to sub-select it using Perl (or another programming language). So, as far as is practical, have the DB do the selection and only export the data you need to the procedural programming language.

    Conversely, SQL has its weaknesses. String manipulations, complex data conversions, and similar data derivations are possible in (some dialects of) SQL, but they are often clumsy and awkward to program and horribly inefficient to process.

    A particular case: an SQL routine I saw a few years ago took a date in the internal format and returned it formatted for human consumption in any of the (from memory, back then, 22) official EU languages. The SQL ran on for 5 or 6 50-line screens, a totally incomprehensible morass of nested CASE expressions and repetitious SUBSTR() calls. It was actually quite an engineering achievement; but it ran like a dog and consumed prodigious amounts of server resources. Moving that date formatting out of the SQL and into the C code that was producing the reports reduced both the time and the server costs immensely.
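    The client-side half of that fix might look something like this in Perl (the original was C; the month-name table, language codes, and function name here are all made up for illustration):

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical lookup table: per-language month names that the giant
# nested-CASE SQL expression was effectively encoding inline.
my %months = (
    en => [qw(January February March April May June July
              August September October November December)],
    de => [qw(Januar Februar März April Mai Juni Juli
              August September Oktober November Dezember)],
);

# Format an ISO date (YYYY-MM-DD, as fetched from the DB) for humans.
sub format_date {
    my ($iso, $lang) = @_;
    my ($y, $m, $d) = $iso =~ /^(\d{4})-(\d{2})-(\d{2})$/
        or die "bad date: $iso";
    return sprintf '%d %s %d', $d, $months{$lang}[$m - 1], $y;
}

print format_date('2012-04-27', 'en'), "\n";    # 27 April 2012
```

    A hash lookup plus one sprintf per language beats re-deriving the same strings with SUBSTR() and CASE on every row the server returns.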

    The lesson is to use the strengths of each language for the appropriate parts of the processing. SQL is really good at picking out the bits of the total dataset that match certain criteria; Perl is really good at massaging data into other forms.

    Of course, if you have pl/perl available within the DB it blurs the lines again. But if the results of manipulations are only required client-side, then let the client do it. If they will be used or reused by multiple clients, then it can make sense to do them server-side, where it doesn't impose undue load or require extraordinary feats of SQL to do so.

    Full-text searching is another area where it often makes sense to avoid the SQL primitives (like LIKE), by pre-processing the data externally and building index tables as the text is uploaded to the DB.
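    A toy sketch of that pre-processing idea (the schema, the tokenizer, and the sample text are all invented; again SQLite stands in for the real server): tokenize in Perl at upload time, so later searches are index lookups rather than LIKE '%...%' table scans.

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE docs  (id INTEGER PRIMARY KEY, body TEXT)');
$dbh->do('CREATE TABLE words (word TEXT, doc_id INTEGER)');
$dbh->do('CREATE INDEX idx_words ON words (word)');

my $ins_doc  = $dbh->prepare('INSERT INTO docs (body) VALUES (?)');
my $ins_word = $dbh->prepare('INSERT INTO words VALUES (?, ?)');

# Build the word -> document index table as each document is uploaded.
for my $body ('the quick brown fox', 'the lazy dog') {
    $ins_doc->execute($body);
    my $id = $dbh->last_insert_id(undef, undef, 'docs', 'id');
    my %seen;
    $ins_word->execute($_, $id)
        for grep { !$seen{$_}++ } split /\W+/, lc $body;
}

# An exact-match indexed lookup replaces a full-table LIKE scan.
my $ids = $dbh->selectcol_arrayref(
    'SELECT doc_id FROM words WHERE word = ?', undef, 'fox');
print "@$ids\n";    # 1
```

    Real systems would add stemming and stop-word removal in the same Perl pass, which is exactly the sort of string massaging SQL is bad at.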


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

Re: perl & SQL best practices
by erix (Vicar) on Apr 28, 2012 at 08:50 UTC

    Do you have backups? If hardware fails, do you know how much downtime there will be? Do you know how much data loss is acceptable? Do you know how much downtime is acceptable? You do not mention requirements with regard to availability, but perhaps replication is a useful option? (I don't know the replication possibilities of MS SQL Server; I assume they are there.)

    If you do not have backups and answers to the above questions, I would say: disregard everything else and get that organised first.

Re: perl & SQL best practices
by nikosv (Hermit) on Apr 28, 2012 at 14:40 UTC

    Think of a stored procedure as a module that encapsulates code into one piece and exposes a public interface. That way, when the underlying implementation changes, you don't have to change the interface too. With ad-hoc SQL you have no such advantage.

    Furthermore, a stored proc is compiled server-side and can be executed faster because of caching. Another advantage is that you can write triggers that call the procedure when a DB event happens.

    The security-wise advantage is that you grant access for executing the procedure, but you don't have to grant access to the underlying tables; you still have to sanitize input, though. One limitation is that not all DBI/DBD drivers can bind parameters to procedures.
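    With a driver that does support it, such as DBD::ODBC against SQL Server, calling a procedure with bound parameters typically uses the ODBC call escape. A non-runnable sketch; the DSN, credentials, procedure name, and column names are all hypothetical:

```perl
use strict;
use warnings;
use DBI;

# Hypothetical connection details.
my ($user, $pass) = ('app_user', 'secret');
my $dbh = DBI->connect('dbi:ODBC:DSN=MyServer', $user, $pass,
                       { RaiseError => 1 });

# The application login only needs EXECUTE on the procedure;
# no SELECT grant on the underlying tables is required.
my $customer_id = 42;
my $sth = $dbh->prepare('{call dbo.usp_OrdersForCustomer(?)}');
$sth->execute($customer_id);
while (my $row = $sth->fetchrow_hashref) {
    print "$row->{OrderID}\t$row->{Total}\n";
}
```

    If your driver can't bind procedure parameters, you're back to interpolating arguments into an EXEC string, which reintroduces the quoting problems placeholders were meant to solve.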

Re: perl & SQL best practices
by roboticus (Canon) on Apr 30, 2012 at 22:50 UTC

    ksublondie:

    When designing databases, I tend to:

    • Use stored procedures for all programmatic data changes (inserts, updates and deletes).
    • Provide predefined views for data.
    • Restrict access to the tables (including select).
    • Constrain the hell out of it.

    It's pretty restrictive, but I think that's a good thing early on in the life of the database. If I find a better way to organize the data, I want the flexibility to do so. Early in the life of a database, I find myself restructuring tables, renaming columns to clean things up, etc. But as the database ages/stabilizes, you won't want to change the structure nearly so often. At that point, I feel it's perfectly reasonable to open up the select privileges on tables. However, I still restrict data changes to the stored procedures.

    After all, you won't code all of a large system perfectly the first time through. I fully expect that a database design will also warrant improvement once the rubber hits the road. You'll find that a table over here should be split, some tables over there may be merged, etc. But the more publicly accessible structure you have, the "wider" the interface to the database. As you have more programs accessing the database, anything publicly accessible becomes "locked down" and unchangeable. Then you'll constantly be stubbing your toes against ugly parts of the database.

    You can often make major changes to a program to fix architectural problems, but databases seem to have a habit of living longer than expected, and having more and more programs connecting to them. If they're unconstrained, they're *much* more difficult to change without risking breakage to systems you don't know about or have forgotten about.

    By locking things down, at least in the early stages, you have a chance to alter the database structure as you learn more about the problem. I find the minor headaches involved doing it this way to be less irritating than the major headaches involved when having to change things and not know what may be affected. I'm sure I'm in the minority, as I've been involved in plenty of database projects, and I've not encountered anyone who I thought took it too far. (So I suppose I'm the guy that takes it too far.) But I'm in the position of not only getting to make those choices, but also the one most affected by them. (Since I'm the guy everyone in my department comes to for assistance in getting database work done and installed.)

    If you create well defined views and stored procedures for data updates, you can make the database easy enough to work with that changing some back-end workings needn't be traumatic.

    Ultimately, I'd suggest three things: (1) lock down everything as tightly as possible as you learn your way through the system and the business problems you need to solve, (2) read plenty, especially things with data to back them up (unlike this node!), and then (3) loosen things up as you find appropriate.

    When writing a node like this, I often find myself forgetting a myriad of details, and wishing that I could write it more clearly. But at some point, you've gotta hit either "create" or cancel the page....

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Node Type: perlquestion [id://967711]
Front-paged by Corion