Notice: my intention in this post is not to start a PHP versus Perl flame war. This post is about good coding practices being applicable to all languages, not about any particular language's weaknesses. Further, for those who wish to flame me about the "Perl problems" that I mention, this thread isn't the place. Heck, those "problems" could simply be personal prejudices. However, the point of this node is not whether or not Perl is perfect; it's about good coding practices.
I've recently been learning PHP for a Web site that I need to maintain, and I discovered something curious about the language: you can't predeclare variables. In fact, anyone can create a global variable (with any data they want in it) in your code simply by inserting an appropriately named form element in the HTML document that has the data they want. There does not appear to be a PHP equivalent of 'use strict'.
My initial thought was to write a Perl script that validates my PHP code and warns me when I have misspelled a variable, used it only once, etc. It was so irritating to me that an interesting tool like PHP has such a glaring violation of good coding practice that I started thinking about this a bit differently. I've programmed in quite a few languages and noticed that all of them have problems. Here's a brief sample:
Many have seen that newer programmers often fail to use strict, warnings, taint checking, or many other good programming practices that are suggested to them. I'm here to say you're not only hurting yourself; you're hurting anyone who has to maintain your code. The interesting thing about these programming practices is that they are not Perl-specific. In fact, there are few, if any, languages where these programming practices don't apply.
Why good programming practices are good
Let's start off with 'use strict'. Use strict affects variables, references, and subroutines. For the sake of brevity, I'll just cover variables.
Let's face it, when you have a 2000 line program and buried in that program, somewhere, is a variable mis-named %quarterly_reciepts, it's not an easy issue to figure out. Finding a misspelled variable name is a snap when you predeclare variables, but if you don't, you may have no idea that your code is spitting out bad output because of a misspelling. You might spend time figuring out if you're reading from your database correctly or wondering if you have a file buffering problem. Why wonder whether or not you've misspelled a variable when you can trap that issue in a couple of seconds and potentially save many, many hours of debugging? I guarantee that programmers coming behind you may not thank you for using strict, but they will curse you if you don't.
Perl has 'use strict' to protect against undeclared variables. VBScript has 'Option Explicit'. Even venerable COBOL has 'Working-Storage' to deal with these issues. If this feature is optional in your language of choice, turn that option on!
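Here's a minimal sketch of the difference, assuming nothing beyond core Perl. The same misspelled hash is compiled twice with a string eval, once with strict turned off and once with it on. The variable names ($lax_total, $strict_error) are made up for this illustration.

```perl
use strict;
use warnings;

# Without strict: the misspelled hash is quietly created as a new global,
# so the lookup just returns undef and nothing complains.
my $lax_total = eval q{
    no strict;
    no warnings;
    my %quarterly_receipts = (q1 => 1200);
    $quarterly_reciepts{q1};    # misspelled on purpose
};
print "Without strict: ",
    (defined $lax_total ? $lax_total : "undef, and no complaint"), "\n";

# With strict: the same typo is a compile-time error, trapped here in $@.
my $ok = eval q{
    use strict;
    my %quarterly_receipts = (q1 => 1200);
    my $total = $quarterly_reciepts{q1};    # misspelled on purpose
    1;
};
my $strict_error = $@;
print "With strict: $strict_error" unless $ok;
```

The string eval is only there so that one script can show both outcomes. In real code the error simply appears the first time you try to run the program.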
So, you've written your first module. In fact, you've written an entire suite of modules that share data amongst themselves and the programs that use them. Knowing that laziness is a virtue (a false virtue, in this case), you decide to use global variables for some data that everything uses. Here are potential problems with this (some of these are general issues, others are Perl-specific):
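As a hedged illustration of one general problem: a package global can be overwritten from anywhere, with anything, and nobody finds out until much later. The sketch below contrasts that with a lexical hidden behind accessor subs. The package name Report::Config and the tax-rate data are invented for this example.

```perl
use strict;
use warnings;

package Report::Config;    # hypothetical module name

our $Tax_rate = 0.08;      # package global: any code anywhere can change it

{
    my $tax_rate = 0.08;   # lexical: reachable only through the subs below

    sub tax_rate { return $tax_rate }

    sub set_tax_rate {
        my ($rate) = @_;
        die "Tax rate must be a number between 0 and 1\n"
            unless defined $rate && $rate =~ /^\d*\.?\d+\z/ && $rate <= 1;
        $tax_rate = $rate;
    }
}

package main;

# Some distant piece of code clobbers the global. Perl accepts it silently.
$Report::Config::Tax_rate = 'eight percent';

# The same mistake through the accessor is rejected on the spot.
my $ok = eval { Report::Config::set_tax_rate('eight percent'); 1 };
print "Rejected: $@" unless $ok;

print "Global is now: $Report::Config::Tax_rate\n";
print "Accessor still returns: ", Report::Config::tax_rate(), "\n";
```

The accessor costs a few extra lines, but it gives you exactly one place to validate the data and one place to look when the value goes wrong.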
Each piece of code should do one thing and do it well. I think one of the most famous Perl examples of violating this principle is the following misguided attempt to parse form variables.
See the line that tries to eliminate server side includes ($value =~s/<!--(.|\n)*-->//g;)? Aside from the fact that it's a terribly written regular expression, it also will cut out a lot of HTML comments (in fact, it will pretty much destroy an HTML document if it has more than one comment in it). What happens when you want to include HTML? You have to rewrite this routine, which could cause problems if other code relies on it. A form-parsing routine should parse the form data, that's all. If you want to strip anything out of that data, do it elsewhere.
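A rough sketch of that separation, under stated assumptions: parse_pairs and strip_comments are hypothetical names, and the hand-rolled decoding is only here to keep the example self-contained. In real code you would hand the parsing to a maintained module such as CGI rather than write your own.

```perl
use strict;
use warnings;

# Hypothetical helper: decode a query string into a hash, and nothing else.
# (Simplified: a repeated field name keeps only its last value.)
sub parse_pairs {
    my ($query) = @_;
    my %form;
    for my $pair (split /[&;]/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                               # plus signs mean spaces
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;     # decode %XX escapes
        }
        $form{$name} = $value;
    }
    return \%form;
}

# Hypothetical helper: a separate filter that callers opt into.
# The non-greedy .*? stops at the first "-->", so text that sits
# between two comments survives.
sub strip_comments {
    my ($text) = @_;
    $text =~ s/<!--.*?-->//gs;
    return $text;
}

my $form = parse_pairs('name=Ovid&note=a%3C!--x--%3Eb%3C!--y--%3Ec');
print $form->{note}, "\n";                    # a<!--x-->b<!--y-->c
print strip_comments($form->{note}), "\n";    # abc
```

Code that wants the raw HTML calls parse_pairs alone. Code that wants comments removed adds the second call, and neither has to be rewritten for the other's sake.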
Code that doesn't have side effects is known as 'orthogonal' code. For example, if you step on the brakes in your car, you don't want it to veer to the left. If you turn on your headlights, you don't want that to automatically trigger your windshield wipers. If you are validating a username and password, don't go out and grab the CNN headlines in the same routine.
Check your system calls
We've all seen it:
If you failed to open the file, your code continues to silently run. If this is embedded in a large system, this could take a long time to track down. Sure, adding the "or die $!" is more work, but the extra cost of fire insurance is a blessing when your house burns down.
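A minimal sketch of the checked and unchecked versions side by side. The file path is made up and is assumed not to exist, and the eval is only there so the demonstration can report the failure and keep going.

```perl
use strict;
use warnings;

my $file = '/no/such/directory/data.txt';    # hypothetical path

# Unchecked: open fails and returns false, and the program carries on anyway.
open my $unchecked, '<', $file;

# Checked: the failure stops us right here, with the reason from $! attached.
my $ok = eval {
    open my $fh, '<', $file
        or die "Cannot open $file for reading: $!";
    1;
};
my $open_error = $@;
print "Caught: $open_error" unless $ok;
```

Putting the file name and $! in the message means the error log tells you what failed and why, instead of leaving you to guess.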
Many newer programmers fail to realize that something is going to go wrong with their code. Maybe the user types a letter instead of the numbers you have on your menu choices. Maybe a function returns an array instead of a reference to one. Maybe, gasp, someone with malicious intent is trying to break your code (hopefully, they're your testing department).
Sometimes, you may think that validating your data is a waste of time. I remember one time that I was writing a program that would summarize commission data and the programmer who wrote the system that I was working on asked me why it was taking so long. I showed her my code and it had gobs of input validation. As it turns out, she had written a wrapper for this system which validated all data long before it got to me. In theory, I could have dispensed with my validation. However, what happens if the input data for the system changes and someone needs to rewrite that wrapper? We all know how easy it is to write buggy code and there's no guarantee that nice, clean data that enters my program today will be clean tomorrow. Remember, you're sleeping with every program that your program ever slept with (okay, that was a rotten analogy).
One of the beautiful things about strong data validation is that you control the error messages. Rather than having a program die a horrible death when it tries to divide by zero, you've already trapped that undeclared variable and have a nice, useful message in the error log.
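Here's a small sketch of what that control looks like, assuming a made-up commission_rate routine that returns either a result or an error message of my own choosing. It is an illustration only, not the code from the system described above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (rate, undef) on success, (undef, message) on failure.
sub commission_rate {
    my ($commission, $sales) = @_;
    for my $n ($commission, $sales) {
        return (undef, 'missing value') unless defined $n;
        return (undef, "not a number: '$n'") unless $n =~ /^\d+(?:\.\d+)?\z/;
    }
    return (undef, 'sales total is zero, cannot compute a rate') if $sales == 0;
    return ($commission / $sales, undef);
}

# Good data, a zero divisor, and a letter where a number belongs.
for my $input ([500, 1000], [500, 0], ['abc', 10]) {
    my ($rate, $error) = commission_rate(@$input);
    print defined $error ? "Rejected: $error\n" : "Rate: $rate\n";
}
```

The zero divisor never reaches the division, so the log gets a message that names the actual problem rather than Perl's generic "Illegal division by zero".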
Factor out common elements
Ooh, that's miserable. After factoring out the appropriate form value:
Much better. Now, if we need to tweak the page value at all, we only do it in one place.
For Perl, here's an example from a module I wrote recently (simplified for clarity):
After rewriting this routine for the third time, I realized that the only thing changing was my ID and the table name. Needless to say, that quickly changed. Now, my "update" methods only validate the ID and supply the correct table name. They are then passed to a generic update method. If I ever need to update that, I only have one place to do it instead of three.
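The shape of that refactoring might look something like the sketch below. Every name in it (the tables, the ID columns, the helper subs) is hypothetical, and instead of talking to a database it just returns the SQL and bind values so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical generic routine: the one place that knows how an update is built.
# Column names are interpolated into the SQL, so real code should check them
# against a known list first.
sub _generic_update {
    my ($table, $id_column, $id, $fields) = @_;
    my @columns = sort keys %$fields;
    my $set     = join ', ', map { "$_ = ?" } @columns;
    my $sql     = "UPDATE $table SET $set WHERE $id_column = ?";
    return ($sql, @{$fields}{@columns}, $id);
}

sub _valid_id { return defined $_[0] && $_[0] =~ /^\d+\z/ }

# The table-specific methods only validate the ID and name the table.
sub update_customer {
    my ($id, $fields) = @_;
    die "Bad customer id\n" unless _valid_id($id);
    return _generic_update('customers', 'customer_id', $id, $fields);
}

sub update_order {
    my ($id, $fields) = @_;
    die "Bad order id\n" unless _valid_id($id);
    return _generic_update('orders', 'order_id', $id, $fields);
}

my ($sql, @bind) = update_customer(42, { name => 'Ovid', city => 'Portland' });
print "$sql\n";     # UPDATE customers SET city = ?, name = ? WHERE customer_id = ?
print "@bind\n";    # Portland Ovid 42
```

Adding a third table now means writing one short wrapper, and a fix to the update logic happens in _generic_update alone.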
The examples that I gave above were mostly focused on Perl. I did that because this is a Perl-related site and some of the monks who read this may only know Perl. However, the principles are not restricted to Perl. Hence the title 'use strict' is not Perl.
Whether you are a brand-new programmer or a seasoned veteran, these principles will apply to virtually any programming language you use. Sure, you can't predeclare variables in PHP and COBOL only uses global variables, but that doesn't invalidate the other principles. If you get in the habit of spending a little time up front learning these things, you will be well-rewarded by writing better, tighter code that is much easier to maintain and has fewer bugs.