Beefy Boxes and Bandwidth Generously Provided by pair Networks Ovid
Keep It Simple, Stupid

Re-Factoring: Philosophy and Approach

by George_Sherston (Vicar)
on Nov 02, 2001 at 22:27 UTC ( #122879=perlquestion: print w/ replies, xml ) Need Help??
George_Sherston has asked for the wisdom of the Perl Monks concerning the following question:

Alright, it had to come to this in the end, but it was dragonchild's challenging note that stimulated me to raise this question.

I'm about a third of the way through what seems to me to be a huge script, and I've come to a natural point at which I can stop and take stock. Basically how it works is, users get emails when they have to do something. The email contains a link to the script with a query string of CGI parameters. When the users click on their links they get their own personal query page in which they enter the information they're being asked for. The result is that I've got two thousand lines of code which serves up and processes a series of mini-forms, each unique, but all having lots of things in common, both as to form and content.

I knew from the outset that I was going to want to re-factor this, but I thought I'd write a big gobbet of it so that I had something to work with... and that's where I am now. Before I write the remaining two thirds I want to take what I have and split it up into intelligent re-usable subroutines and tidy them away into modules. Then the remaining 2/3 should be a bit quicker to write.

At the moment what I did was I paged down through the script and kept an eye out for repeating patterns (and indeed remembered where I had put some things that looked the same as other things) and I found one thing I was doing over and over and I put it in a module. But that doesn't seem like a very scientific way to go about it.

Perhaps the scientific way to go about it would have been to conceive the whole thing in modular form at the outset - but that's beyond me at my current level of experience. I hope when I've finished this project I'll have enough of a feel for how this stuff works that I could do that for the next one - but at present I decided the only way I cd do it was by writing a bit of it out and then condensing that.

So I'm appealing to wise monks for your views on how to approach this task, in the most general terms.
  • In the inevitable compromises involved in reducing several different activities into a uniform format, what do you trade off for what?
  • Where do you start in deciding what goes in which subroutine?
  • Do you use one module or several?
  • What's the optimum size for a subroutine?
  • If you've got one thing you do over and over again in two slightly different ways, do you want one subroutine with an if or two subroutines?
  • And anything else?
I've put the WHOLE of my existing script here so you can see what I'm up against. Of course, I'd love to have comments on individual bits of the code - but my main interest is in getting an over all approach to boiling it down so that (A) it works efficiently and (B) it's easy to expand / edit.

George Sherston

Comment on Re-Factoring: Philosophy and Approach
Download Code
by George_Sherston (Vicar) on Nov 02, 2001 at 22:46 UTC
Read Code Complete, young Grasshopper
by dragonchild (Archbishop) on Nov 02, 2001 at 23:05 UTC
    1. Try to do it on your own.
    2. Fail miserably.
    3. Read Code Complete.
    4. Try again, utilizing newfound wisdom.
    5. Fail to achieve perfection, but come really darn close.
    Maybe I should put that into a haiku. Possibly, something along the lines of:

    Re-factoring hurts
    Code Complete contains wisdom
    I am Grasshopper

    The basic idea is that you want to design interfaces to activities, not subroutines. You want to design a way to write using a bunch of "fire-and-forget" actions that map into your problemspace.

    For example, I'm assuming you're not going to run afoul of tilly/tye/merlyn and are using CGI. That is a way of mapping actions onto your problemspace, not onto code. That's what I'm talking about.

    Identify the actions you want to take, then write a subroutine for each action the problemspace you want to take.

    Now, to finish that top-level subroutine, you now have a new problemspace. Keep doing the same thing, over and over. This is factoring.

    Where do you stop?

    • When you can completely understand each unit.
    • When a sub is 25 lines or less.
    • When it starts to become silly to write another sub.
    • When you feel like it.
    You also ask if you should write one sub with two actions or two subs with one action each. It depends completely on your style and how much the differences between the actions matters. I tend to shy away from passing flags to a function. If I have foo() and I pass it either 'bar' or 'baz', then I should have foobar() and foobaz(). However, if the function is making decisions based on the data I'm passing it, then that's the role of a function. *shrugs* I hope that made some sort of sense.

    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Re: Re-Factoring: Philosophy and Approach
by fokat (Deacon) on Nov 02, 2001 at 23:13 UTC
    I haven't read all of your code in detail, but instead tried to look at it as a whole. I think the first bit of insight I can offer you is here.

    The questions you ask have many correct answers, as they relate to coding style, software design and architecture. I'm confident you'll find a large amount of suggestions in the Monastery.

    What follows are some of my thoughts about what I've seen so far in your task and solution:

    • You're using a dispatch table to direct people to different handlers depending on what they need to do. I think this decision is right. However, remember than there are many other ways to do the same. In particular, having one script for each task might significantly ease working with your code.

    • You seem to be encapsulating an interaction with a backend database. Try to think about an internal representation for the data and for the operations that can translate for all your tasks. These two things are natural candidates for becoming modules, invoked from different scripts.

    • If you manage to encapsulate the access to the data in a set of modules, chances are you will be able to also encapsulate the generation of the interaction forms. Depending on how your scenario looks like, this might be an extension to the modules from the previous bullet, or a new module.

    • At the very least, you can try to keep all of your query statements in a single file. This will clean up the rest of the code a bit.

    Now, regarding your specific style questions:

    • I normally try to improve maintainability and portability over speed, as long as the results are produced within reasonable times. Note that this choice is heavily influenced on what are you being asked to do, and where are you doing it (ie, who do you work for :)).

    • In a similar note, try to maximize the speed at which you can provide a prototype for your application. My experience is that programmer's time is more important (and more expensive!) than machine time. Do not over-optimize, it takes a lot of time and the effort might not be worth it. Also, do not optimize until you have a running application that passes all your tests.

    • I try to keep my code as general as posible. This increases the chances of it being re-used for other projects.

    • I don't like code blocks that are longer than a page of text. When I see such a block, I tend to split it into subroutines.

    • I use as many modules as I need to. Sometimes, things fit together nicely in a single module. Some other times, you need many modules to encapsulate different or unrelated functions. Normally, you will be using more than a few modules, but not all are going to be yours... CPAN is your friend (hint, hint, hint!)

    • There are many ways to customize the actual behavior of a piece of code. The choice here is heavily influenced by other criteria (size of the resulting code, maintainability, generality, protability, etc.). I've used many schemes, including passing a closure to a function.

Re: Re-Factoring: Philosophy and Approach
by dws (Chancellor) on Nov 02, 2001 at 23:32 UTC
    Of course, I'd love to have comments on individual bits of the code - but ...

    You're in luck! :)

    Error handling, error handling, error handling. In particular, use {RaiseError = 1} when you're connecting to the database to tell DBI to check and report errors for you. That alone can save lots of grief.

    You can refactor some of the database calls, by preparing them once, then binding/executing when you need to invoke them. See (Ovid) Re: DBI placeholders for hints.

Re: Re-Factoring: Philosophy and Approach
by mpeppler (Vicar) on Nov 02, 2001 at 23:43 UTC
    Just a few thoughts regarding the DBI code...

    Try to create logical database requests, wrap those in perl subroutines, and place them in a separate module. This will greatly improve the readability of the main script, and will also improve maintainability.

    Avoid "select * ..." SQL statements. Be explicit in what you are querying, even if that makes the code more verbose - it'll again improve maintainability.

    One thing that is repeating a lot in the code is prepare/execute/fetch one row. Maybe you could use a utility subroutine that does that, something like:

    sub fetch_one_row { my $sql = shift; my $sth = $dbh->prepare($sql); $sth->execute; $sth->fetchrow_hashref; }
    Just using that could already simplify the code to a great extent.


Re: Re-Factoring: Philosophy and Approach
by poqui (Deacon) on Nov 03, 2001 at 00:29 UTC
    After reading the Link from Albannach and musing over it all some more, I have to concur with others that Functions and Subroutines are good; but I had to think back awhile to remember why.
    Others have covered most reasons, and another that comes to me is: modularity makes spreading out the work on a large project much easier.

    I started out programming in Basic and fortran, and my code was all spaghetti or straight top down. But I never modified the code, or even spent very much time debugging it. Later, the structured approach was the thing, and that brought along with it modularity. The next step was Object Orientation, which takes modules and makes them entities on their own.

    But I guess my point is, that making subroutines is really only useful if you find it so. For me, it became useful when my code got too big to grok in 1 chunk, then I broke it down into more grokkable chunks. Once it is starting to break into logical chunks, I would go all the way and have a lean control or main module that runs all the little subroutinesm, with pithy names. Then, if I find myself using a subroutine in several places with a only a slight variation, I look at making a function out of it. Its only one more step from function to its own program or module.

    "That that is, is... for what is that but that? and is but is?" Shakespeare, Twelfth Night, Act IV, Scene 2

    "Yet THAT which is not neither is nor is not That which is!" Frater Perdurabo (pseud. Aleister Crowley), Liber CCCXXXIII, The Book of Lies
Re: Re-Factoring: Philosophy and Approach
by FoxtrotUniform (Prior) on Nov 03, 2001 at 01:03 UTC

    In the inevitable compromises involved in reducing several different activities into a uniform format, what do you trade off for what?

    In general, I trade off generality for simplicity and maintainability. I try to make my code easy to read, easy to maintain, and orthogonal rather than easy to write (the first time...), but that doesn't happen as often as I'd like.

    Where do you start in deciding what goes in which subroutine?

    Subroutines should be orthogonal: as self-contained as possible. I try to take the approach that if someone just wants this one subroutine, they should be able to use just this one subroutine, rather than a bunch of other subs that it depends on for setup, data extraction, etc. (Not to say that subroutines should be entirely independent of each other -- obviously you'll want to use sub foo in sub bar rather than duplicating foo's code -- but from the user's perspective, only one call should have to be made.) This way, changes to the guts of a subroutine are as local as possible to that sub (and you don't have to grovel all over your code looking for places you might have affected).

    Do you use one module or several?

    Often none. If I find myself doing the same thing for a couple of different scripts, I'll pull the stuff out and make a module, but I tend to be lazy about code reuse: instead of packaging something up with the idea that I'll use it later, I wait until I need something that I've already written before breaking it out into a module.

    One thing that I've found about modules, at least in my experience, is that it's easy to write tightly coupled code: after all, you have your own namespace, or someone's passing around a nice self-contained object, so it's not such a big deal to share some function between two or three subs, is it? Then, six months later, you try to fix something, and you fix it in one place instead of two or three... you know the drill.

    What's the optimum size for a subroutine?

    Five tons of flax.

    Seriously, this doesn't have a single meaningful answer. I like to keep breaking up a sub until all of the nontrivial bits are in their own (properly named) functions. Anything longer than, say, forty or fifty lines is probably in desperate need of refactoring.

    If you've got one thing you do over and over again in two slightly different ways, do you want one subroutine with an if or two subroutines?

    If I'm doing something similar over and over again, in two slightly different ways, I'll probably want to do it in a third way later on. I try to break out the "same"-ness into an algorithm, and modify its behaviour with a parameter or two. This isn't easy, of course.

    And anything else?

    Read, read code, code.

Re: Re-Factoring: Philosophy and Approach
by Biker (Priest) on Nov 03, 2001 at 22:06 UTC

    I think you're showing another example of a problem we're all fighting with now and then.

    It is all to easy to start coding without having a design.

    Especially when there is some 'project manager' running behind your back, constantly asking for something 'he can show'. I.e. some kind of prototype.

    Most(?) people tend to share my experience from learning OOP (Object Oriented Programming). What I learnt was that OOP has little importance compared to OOD (Object Oriented Design).

    Once OOD is more or less mastered and (most importantly) used, the developer often comes to the conclusion that "when using OOD, OOP becomes nothing more than a question of syntax".

    f--k the world!!!!
    /dev/world has reached maximal mount count, check forced.

      This really chimes in with what I was reading today in Code Complete - that the difference between OO and functional programming is not in what you actually produce in the end, but in how you conceptualise it to start with. I might write a programme with fifty functions and you might write a programme with seven objects and they'd basically be the same thing because your seven objects have seven functions each, and my fifty functions logically group themselves in seven groups. They're just different ways of looking at the thing. But of course there's no "just" about it. One might as well say a radio telescope and an optical telescope were "just" different ways of looking at a star, or taste and smell were "just" different ways of experiencing food. One or the other is good; but both together give one a more rounded picture.

      Hmm, this has been a most philosophically stimulating weekend so far :)

      George Sherston

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://122879]
Approved by root
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (11)
As of 2014-04-16 14:10 GMT
Find Nodes?
    Voting Booth?

    April first is:

    Results (427 votes), past polls