Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw

Writing general code: real world example - and doubts!

by polettix (Vicar)
on Apr 04, 2005 at 16:06 UTC ( #444727=perlmeditation: print w/replies, xml ) Need Help??

Generalisation. Refactoring. Reuse. Craig Larman suggests to "pick your battles", but I often have difficulties, also because Perl gives me tools to make powerful things in a few keystrokes, so I ask myself: "Why not"? So I begin...

Suppose you need a simple filter function, e.g. to detaint variables that you've already checked to be good. I've come up with something similar:

sub detaint_1 { $[0] =~ /^(.*)$/; return $1; }
Writing such filter functions, I often wonder whether I should provide an in-place modification version; for this particular example, it would be useful to have deparsing happening in-place, because the variable is actually trustable (assumed that due checks have been done), so I come up with the following:
sub detaint_1_inp { $[0] =~ /^(.*)$/; $[0] = $1; }
This has the added advantage to avoid variable copy, which could be expensive if they're big chunks of texts. In-place modification gives the user the ability to choose: if she wants a copy, just copy and call the function on the copy; otherwise, just apply the function to the original variable.

But what if I'll need to detaint more variables at once, for example after having performed cross-checks on them (think of two variables, one holding a directory name and another a file name in the directory)? I can easily extend the approaches:

sub detaint_n { map { /^(.*)$/; $1 } @_; } sub detaint_n_inp { foreach (@_) { /^(.*)$/; $_ = $1} }
Obviously, these results could be obtained with the one-argument version as well (I swear that I read the reply from BrowserUK to my snipped here):
my @a; # ... set @a ... my @b = map { detaint_1 } @a; # Copy detaint_1_inp($_) foreach (@a); # In-place
but I wonder: wouldn't it be useful to provide all these behaviours at once with a unique interface? So, I finally came up with the following monster, taking advantage of the three different context I can call a function, i.e. list/array, scalar and void:
sub detaint { my $subref = sub { $_[0] =~ /^(.*)$/; $1 }; return map { &$subref($_) } @_ if (wantarray); # Copy for arrays return &$subref($_[0]) if (defined(wantarray)); # Copy for scalar $_ = &$subref($_) foreach (@_); # In-place return; }
which can be easily refactored to get a general filter applier for more functions:
sub apply_filter { my $subref = shift; return map { &$subref($_) } @_ if (wantarray); # Copy for arrays return &$subref($_[0]) if (defined(wantarray)); # Copy for scalar $_ = &$subref($_) foreach (@_); # In-place return; } sub detaint { return apply_filter(sub { $_[0] =~ /^(.*)$/; $1 }, @_); } # Refactored code leads to reuse! sub left_pad { my $padding = shift; my $minlength = length($minlength); return apply_filter(sub { my $v = shift; if (length($v) < $minlength) { ($padding . $v) =~ /(.{$minlength})$/; $v = $1; } $v; }, @_); }
Back to Planet Earth, my question is: do I really need all this? I'm quite a novice when it comes to this kind of forecasts, and I read that newbies often tend to over-generalise, solving problems that they could have but that they will actually never have. Is this kind of generalisation an example of this?

Flavio (perl -e "print(scalar(reverse('ti.xittelop@oivalf')))")

Don't fool yourself.

Replies are listed 'Best First'.
Re: Writing general code: real world example - and doubts!
by brian_d_foy (Abbot) on Apr 04, 2005 at 20:45 UTC

    Untainting variables that you've already checked to be good? That's the battle you should be fighting because it's a recipe for disaster. Don't wuss out: if you want taint checking, don't subvert it by giving programmers a way to ignore it. :)

    The trick to any of these sorts of decisions is to look at the rest of the code and figure out which style it uses, then try to use the same model. If most of the interface changes arguments in place, do that. If most of the interface leaves arguments alone and returns the munged values, do that. Beyond that, figure out which one makes the problem easier to solve and the code easier to read ( the optimal combination of those!), and do that.

    Once you figure out what you want, clearly document it.

    But, if you're spending all your time worrying about little details like this, you're probably avoiding much bigger problems that you don't want to think about. Put something in place and move on. :)

    brian d foy <>

      ++ your observation on not giving an out on detainting.

      I've got a solution to detainting which I think is very well suited to refactoring old code. It can be retrofitted as a single chunk of initialisation code, and old code will automatically validate and detaint the rigged variable on every assignment or modification.

      use Tie::Constrained qw/detaint/; tie my $var, 'Tie::Constrained', sub { my $re = qr/whatever/; $_[0] =~ /$re/ and &detaint; };
      Later, when you say, $var = $tainted_thing;
      $var is validated and untainted, while $tainted_thing remains tainted and unmodified. The trick is that the &detaint call acts on an anonymous copy of $tainted_thing's value just before assignment to $var. (Tie::Constrained::detaint() is almost identical to frodo72's original detaint()).

      This does not address OP's excellent question about generalizing his detaint routine. I question the value of that as a general practice, but it may be useful for an application which must iterate over a bunch of similar-type values.

      After Compline,

        Meditation are good for meditating - so this thread is teaching me something about refactoring (which is theoretical), and something about tainting (which is practical). This is good for me.

        I like very much your solution. In particular, it makes me reflect that an in-place detainting could be dangerous, because I could end up not knowing whether a variable has already been detainted or not. Your approach seems to divide variables into two groups: tainted and not, and this seems reasonable.

        I only wonder how readability could be affected by this approach. The detainting method being under the surface, I fear that the naive programmer (er.. me) could be disorientated by seeing such an assignment and seeing $var used in potentially dieing places.

        Flavio (perl -e "print(scalar(reverse('ti.xittelop@oivalf')))")

        Don't fool yourself.
      Well - quite a bad example to show my doubts, I should have sticked to the more "quiet" left_pad in the first time :)

      This demonstrates that there's always something to learn, particularly where you don't suspect it. I wrote the detaint function to "free" some directory and file names after verifying they're real dir and files with "-X" tests, so I was confident they were right and I felt comfortable about them. And I erroneously concentrated on the wrong problem - or better, I had nothing better to do than "Meditate" at that moment.

      But what I'll keep in mind is that whenever I strive for generality and reusability letting myself lose time, things could (and will hopefully be) used in contexts quite different from the ones I've developed them. So wondering how to make a detaint function the most general one is very similar to a Bad Thing, because encourages potentially bad coding practices (like not verifying your data).

      Once you figure out what you want, clearly document it.
      I hope this will become natural for me; I really admire the fact that this thing has become such a habit for a recognised Guru to forget to say "before you code", which I read nonetheless. I'm a naive programmer, and I tend to document things after I've done them, but I understand that writing documentation first will help me fixating ideas and stick to one solution instead of refactoring brainlessly.

      Just to answer myself a bit regarding the in-place modification avoiding copying large chunks of data, I remembered an old motto: A premature optimisation is a Bad Thing. For what old can mean in CS :)

      Flavio (perl -e "print(scalar(reverse('ti.xittelop@oivalf')))")

      Don't fool yourself.
Re: Writing general code: real world example - and doubts!
by chromatic (Archbishop) on Apr 04, 2005 at 17:34 UTC
    my question is: do I really need all this?

    That's a good question! What seems like the most natural way for the code that uses this code to work?

    The best way I've found to design an interface is to use it for a while, then to think really hard about what's easy and what's difficult and what might be clearer to read and to understand and to discourage bugs.

    I suspect that this code is more complicated than it needs to be, and I suspect that if you went with your first inclination and stayed open to modifying it a bit as you use it more, you'll find the right design.

      Not only your suggestion seems reasonable (Keep It Simple, Stupid is illuminating on the top of my current PerlMonks page!), but it also seems a good practice given all the answers below. I would also thank you for your practical advices - I'll try to solve problems in the most straightforward way, avoiding evident and immediate in-flexibilities and trying to remain flexible myself to future changes.

      About refactoring, I maybe missed the correct word - as a matter of fact, I was trying to pre-refactor just to be sure not to have to do later. I had to wait the moment in which this would have been really necessary!

      Flavio (perl -e "print(scalar(reverse('ti.xittelop@oivalf')))")

      Don't fool yourself.
        I had to wait the moment in which this would have been really necessary!

        I think that's the key point to take away. The whole idea of refactoring is to make your code simpler. If your code is getting more complicated to understand something is probably going wrong :-)

        The time to refactor is when the code let's you know that refactoring is necessary (through duplication, hard to understand code, etc.)

Re: Writing general code: real world example - and doubts!
by Steve_p (Priest) on Apr 04, 2005 at 19:55 UTC

    Unlike some of the other advocates for refactoring, I don't think refactoring should be done just because I can. There should be concrete reasons for messing with production code other than, "This way looks cooler." A quick cost/benefit analysis should be done. It shouldn't require an accountant to do it, but you should be able to show that your refactorings will provide some sort of tangible benefit, either in terms of significantly improved performance or decreased maintanence costs.

    I do, however, have a few rules of thumb where I will refactor nearly every time. If the code is similar enough to other code where I can generalize (or is a cut and paste copy), I refactor into a generic module. If changes in other code will cause changes in the module I'm looking into, I refactor to break the dependency. If a change will make the code harder to test, I refactor to something that can be tested.

Re: Writing general code: real world example - and doubts!
by ysth (Canon) on Apr 04, 2005 at 20:53 UTC
    I have to caution you that all those detaint regexes also remove a trailing \n (and outright fail if there's a \n anywhere but at the end). Use the //s flag, and also train yourself to use \z instead of $ when you mean true end-of-string. (AIUI, both these problems are removed in perl6; . and $ can be used more naturally there.)
      This leads to one more meditation: should I craft my examples carefully before posting them, exercitating the Hubris quality, or should I write them deliberately "naive" so that all this bunch of da*ned good advices pop up? I'm starting to think that the latter is the right way - my examples are clearly in this direction :) - but I fear that only a small percentage of all these will actually penetrate into my brain. Better than nothing.

      This is also an indirect answer to my original topic: I think that I'll have to concentrate on solving the real problem (e.g. detainting a variable in the correct way, using the right regexes) before thinking about generalisations.

      Flavio (perl -e "print(scalar(reverse('ti.xittelop@oivalf')))")

      Don't fool yourself.
Re: Writing general code: real world example - and doubts!
by tlm (Prior) on Apr 05, 2005 at 02:00 UTC

    Great question! I look forward to reading the replies you get, since I often find myself in a similar mess. It's hard to find the right balance between flexibility and simplicity, but I am making an effort to err on the side of simplicity, to focus primarily on something that can solve today's problems, while being mindful that these problems will evolve (i.e. don't hard-code every magic number, IP address, and pathname in your program, etc.). The trickiest are the interfaces, of course, because you are trying to anticipate interaction and that is bewildering. Here's where I'm being most emphatic about keeping things simple, offering fewer options to begin with, and refactoring up as needed. (Your discussion elsewhere of choosing between using map or making your subs capable of handling any number of elements is a good one; I'd start with the map option, and upgrade to the "listable" option if it turns out to be really useful.

    But as I said, I am as much in need of clues as you are, so take what I am saying with a bucket of salt.

    the lowliest monk

Re: Writing general code: real world example - and doubts!
by exussum0 (Vicar) on Apr 11, 2005 at 20:30 UTC
    I read through all the replies, and there's one pattern of data valdation I've always liked: rule based. You'd create an engine that you plugged in some sort of configuration space. Each rule would have a method to determine if the input passed is valid or not. If it is, a positive effect occurs. True is returned, an exception is not thrown.. something. On failure, false or an exception is created.

    For perl, I would imagine a system where everything must implement a base Rule object. The engine would create one of each rule and have some sort of "validate" function. Next, wrap whatever framework away from the developer. Your Wrapped framework would have a method for getting data, "get_validated_data" which would take a pointer to your validator configuration as to which object to use. I would then be able to do something ala...

    my $frameWork = WrapedFrameWork->new(); my $allValid = 1; my ( $valid, $data ) = $frameWork->get_validated_data("email","typeEma +il"); $allValid &= $valid; if( !$allValid ) { .... }
    If your developers do code reviews, which they should, any bypassing the system can be slapped down. New validators can be written for everyone to use. Life may be cleaner.. in an ideal world 8)

    Give me strength for today.. I will not talk it away..
    Just for a moment.. It will burn through the clouds.. and shine down on me.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://444727]
Approved by Tanktalus
Front-paged by tlm
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (4)
As of 2018-02-18 07:55 GMT
Find Nodes?
    Voting Booth?
    When it is dark outside I am happiest to see ...

    Results (251 votes). Check out past polls.