|Perl: the Markov chain saw|
Re: RFC: beginner level script improvementby roboticus (Chancellor)
|on Sep 20, 2013 at 00:40 UTC||Need Help??|
Before I go into one of my long, meandering list of things, I'd like to say:
On a more serious note: By putting your code up here for review, I think you already have the right attitude. Always wanting to improve how you do things is a great skill for a programmer to have.
OK, then, on to the bitching! I've put it in <readmore> tags so people used to my occasional long and meandering nodes can skip it. A brief summary for them: Naming, whitespace, short subroutines! Possibly a bit more, I don't remember what I wrote, and I hate this keyboard.
Random notes as I read your code.
There's too much "decoration". A common beginner habit is to spend a lot of time on formatting large comment blocks and such. It's not a terrible thing, it's just something I see far too often. Some decoration can be nice and help organization. But it's easy to go overboard.
End blocks are pointless. When you have a large heading on each section, it seems pointless to have an equally large end marker--especially since the next item will either be another large heading or the end of the file. Both will be good visual cues to the end of the previous section.
Names should be clear and meaningful. printversion, printtmpl: Neither of them actually print anything. Each simply returns a text string. A name should be suggestive: that's the most efficient method of documenting your code. You don't need to write many comments when your code clearly describes itself.
Digression 1: Many people will tell you to spell out everything in full. I think they're nuts. A proper set of common abbreviations go a long way to making the code simpler to read. Examples: FName means file name, FH means file handle, rpt means report, db means database, etc. If you're consistent--and you should be--the code can be easier to read.
Neither of your printversion, printtmpl functions actually print anything. Each simply returns a string. Rather than having a function for each string you need to retrieve, I'd suggest putting your text strings into a hash at the beginning, and provide a function to retrieve the one you want:
In your help function, breaking your text up into multiple print statements complicates things more than necessary. Try using a single "here document":
Speaking of your help function, it doesn't look like your help function returns a useful value, so I simply removed it.
Then peepul wil haff to contsanly tyep teh saem mispelings evere tim yoo yoose it. And they will curse your name. And make fun of you.
The other problem with the san
Looking at the function, it simply performs a set of replacements with a funky hardcoded bit. I can't think of a good name for the function, which makes me think that you may be approaching the problem sideways.
Except for the hardcoded bit, I'd suggest translate, replace or similar for the function. But you can't use those as the basis for a name with that hardcoded bit unless you start adding adjectives to your function name: translate_with_funky_bit(). I'd suggest splitting out the general-purpose multiple_string_replace function from the do_funky_escape_code() bit. Then you could reuse the replacer if/when necessary. You could write a function that takes your list of things to replace and transform them into the list of things used for the replacer function.
Imagine my surprise when I started looking at the taint function: It's basically the multiple_string_replace function.
Next, gettime. For some reason, you don't mind a simple operation to convert the year to the correct value, but you use an array to fix up the month. Also, you're creating new variables instead of using the ones you have. If I were going to write it (instead of using a module or strftime), I'd do it more like:
I abbreviated the variable names because they're used in a very small area, and the abbreviations are obvious in context. I ignored the unused return values from localtime, since they aren't used. I also reused $yr and $mon to hold the "adjusted" values.
Your parsetempdb function does too much in one place, reducing any potential reuse. (I already mentioned naming, so I won't beat that horse any further.) The function has several interesting operations: Parsing the DB file may be useful by itself, so I'd split that out into a subroutine. Generating a report from the data structure could also be useful, so I'd split that out, too.
So you could wind up with something like:
The same can be said for sendmail: It does more than the name implies. Splitting it into smaller components would be useful. One of those components should be a bit of code to send mail--you could call it sendmail! (Sorry, I couldn't resist.)The Main Code
I'm running out of steam, so I'll just note a couple of things:
You don't use enough whitespace. Your indentation style is fine (it's clean and consistent), you just don't use it often enough. Line breaks are good too. It can be difficult to read a "wall of code", so use whitespace to break things up into single "thoughts". As an example, the 'perform various sanity checks' block is too tightly packed, so it's harder to read than it must be. Put in a few line breaks and indents.
Your main code block is all coded straight through. You can break it into subroutines (even if they're used only once!) to "outline the code", making it easier for the reader to understand. It also helps limit the amount of code you need to read to fix an error.
Ah ... you *did* do this a bit later in the code. Good.
OK, I've run out of gas. So I'll leave with one last bit: Your functions in this chunk are just *too big*. I expect that it's more of the same: You're trying to do too much in a single subroutine.
You'll find things easier to maintain and test if you keep your subroutines short and focused. Each individual component can be tested easily. But if you make a large do-all function, it's a *lot* harder to test, and if you make a change, you have to update more tests to ensure you hit all the possibilities.
Thanks for sharing. And please don't be offended at my random writings. After all, I'm just a nitpicky bastard anyway! ;^)
P.S. I don't remember what prompted me to write this, and it's far away from where it was. I didn't want to throw it away, so here ya go, an added bonus:
Roboticus' rule #1: If you always have to do something, you should *never* have to do it.
Many languages foist this stuff on us. Perl is one of the better ones. One example is the switch statement in C:
Here, you *always* have to put in the break statement, or C will fall through to the next case. You should only have to do something explicit for the *NON*default case.