in reply to Re-Factoring: Philosophy and Approach
In the inevitable compromises involved in reducing several different activities into a uniform format, what do you trade off for what?
In general, I trade off generality for simplicity and maintainability. I try to make my code easy to read, easy to maintain, and orthogonal rather than easy to write (the first time...), though I don't manage that as often as I'd like.
Where do you start in deciding what goes in which subroutine?
Subroutines should be orthogonal: as self-contained as possible. My rule of thumb is that someone who wants just this one subroutine should be able to use just this one subroutine, without first calling a bunch of other subs it depends on for setup, data extraction, and so on. (That doesn't mean subroutines should be entirely independent of each other. Obviously you'll want to call sub foo from sub bar rather than duplicating foo's code. But from the caller's perspective, only one call should be needed.) This way, changes to the guts of a subroutine stay as local to that sub as possible, and you don't have to grovel all over your code looking for places you might have affected.
Do you use one module or several?
Often none. If I find myself doing the same thing for a couple of different scripts, I'll pull the stuff out and make a module, but I tend to be lazy about code reuse: instead of packaging something up with the idea that I'll use it later, I wait until I need something that I've already written before breaking it out into a module.
One thing I've found about modules is that they make it easy to write tightly coupled code: after all, you have your own namespace, or someone's passing around a nice self-contained object, so it's not such a big deal to share some function between two or three subs, is it? Then, six months later, you try to fix something, and you fix it in one place instead of two or three... you know the drill.
What's the optimum size for a subroutine?
Five tons of flax.
Seriously, this doesn't have a single meaningful answer. I like to keep breaking up a sub until all of the nontrivial bits are in their own (properly named) functions. Anything longer than, say, forty or fifty lines is probably in desperate need of refactoring.
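As a rough illustration of "keep breaking it up until every nontrivial bit has its own properly named function", here is a small Python sketch. The thread itself is about Perl. The record format and every function name here are hypothetical. After the split, the top-level routine reads as a short list of named steps.

```python
def parse_record(line):
    """Split a hypothetical 'name, score' line into (name, int score)."""
    name, score = line.split(",")
    return name.strip(), int(score)


def is_valid(record):
    """Treat a record as valid if its score is between 0 and 100."""
    _, score = record
    return 0 <= score <= 100


def format_summary(records):
    """Render a one-line summary of the given records."""
    if not records:
        return "no valid records"
    avg = sum(score for _, score in records) / len(records)
    return f"{len(records)} records, average {avg:.1f}"


def summarize(lines):
    """Top-level routine: each nontrivial step lives in its own function."""
    records = [parse_record(line) for line in lines]
    valid = [r for r in records if is_valid(r)]
    return format_summary(valid)


print(summarize(["alice, 90", "bob, 70", "carol, 150"]))
```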
If you've got one thing you do over and over again in two slightly different ways, do you want one subroutine with an if or two subroutines?
If I'm doing something similar over and over again, in two slightly different ways, I'll probably want to do it in a third way later on. I try to break out the "same"-ness into an algorithm, and modify its behaviour with a parameter or two. This isn't easy, of course.
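Here is one way to picture "break out the sameness into an algorithm and modify its behaviour with a parameter or two". It is a hedged Python sketch rather than the author's Perl code, and `format_rows` is a hypothetical name. CSV and tab-separated output do the same thing with different separators. So instead of two near-duplicate subs, or one sub with an `if`, a single routine takes the separator and an optional header as parameters.

```python
def format_rows(rows, sep=",", header=None):
    """Shared algorithm: optional header line, then each row joined by sep.

    The variations (CSV vs. tab-separated, with or without a header) are
    parameters rather than separate subroutines.
    """
    lines = []
    if header is not None:
        lines.append(sep.join(header))
    for row in rows:
        lines.append(sep.join(str(field) for field in row))
    return "\n".join(lines)


rows = [("alice", 90), ("bob", 70)]
csv_text = format_rows(rows, sep=",", header=("name", "score"))
tsv_text = format_rows(rows, sep="\t")
print(csv_text)
print(tsv_text)
```

A third variant, such as pipe-separated output, then needs no new code at all: `format_rows(rows, sep="|")`.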
And anything else?
Read, read code, code.