Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Is validation of all subroutine arguments overkill?

by nysus (Parson)
on May 21, 2016 at 07:36 UTC ( [id://1163720]=perlquestion: print w/replies, xml ) Need Help??

nysus has asked for the wisdom of the Perl Monks concerning the following question:

I've been learning test-driven development over the last couple of days. The process has gotten me to become much more careful with my code...but perhaps too careful.

I was always the type of programmer that rarely validated arguments passed to subroutines. If an argument was passed that was not the expected format, the program would behave unexpectedly or crash and then I'd deal with it. With test-driven development, I'm finding myself writing tests to ensure a subroutine returns the appropriate error message if a passed argument is not what the subroutine expects.

This is certainly a lot more work. Though I'm sure it makes my code is more robust, I'm wondering at what cost. I'm not writing code for banking institutions or space programs, after all.

So I'm wondering how other Monks approach argument validation. Is this something you do diligently and is considered to be best practice? Do you think it's worth the time to write tests to ensure subroutines handle bad arguments? Are you using modules like Param::Validate to check subroutine arguments? Any other thoughts on argument validation you'd like to share?

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate";
$nysus = $PM . ' ' . $MCF;
Click here if you love Perl Monks

Replies are listed 'Best First'.
Re: Is validation of all subroutine arguments overkill?
by haukex (Archbishop) on May 21, 2016 at 08:04 UTC

    Hi nysus,

    I know the feeling exactly :-) I think the only good answer is "it depends", so I'd suggest you try to get a broad spectrum of input of how others do it, and gain your own experience.

    Here's some of my thoughts about whether parameters should be validated and whether to test that:

    • What kind of failure can be expected if the user passes a wrong parameter? E.g. parameters that cause files to be edited or even deleted should probably be checked in detail, but if e.g. it's a value that would cause Perl to issue a warning message or make an incorrect calculation, maybe the user of your code needs to track that issue down on their own.
    • Who is the intended audience? How user friendly do you want the code to be?
    • How user friendly does the code need to be? Also: after gaining some experience with the API, which parameters do the users get wrong most often?
    • What metric are you testing by? For example, even 90-100% code coverage doesn't necessarily require you to fiddle with every argument.
    • Personally, I rarely check the content of error messages my code generates. Sometimes, when my code generates a custom warning for example, will I check that it does get generated.
    • Consider what you're testing for: If you use e.g. Params::Validate, and that module is well-tested, are your tests just double-checking that it works?

    Hope this helps,
    -- Hauke D

    Update: Minor rewording & reorg. and added the last item to the list.

Re: Is validation of all subroutine arguments overkill?
by BrowserUk (Patriarch) on May 21, 2016 at 08:18 UTC
    Do you think it's worth the time to write tests to ensure subroutines handle bad arguments?

    What is the likelihood of a particular subroutine being passed bad arguments?

    If those arguments are sourced from data outside of your programs control, and that data will not have passed any other validation before being passed to this subroutine, then validating is not just worth the time, but (IMO) imperative.

    On the other hand if the subroutine can only be called with data that has already been validated, its pointless.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Is validation of all subroutine arguments overkill?
by Laurent_R (Canon) on May 21, 2016 at 09:50 UTC
    IMHO, you certainly need to validate any data coming from an external or uncontrolled source (user input, input file, database, network socket, etc.).

    But there is no point checking things that cannot happen. If you know for sure that you have populated an array with some known data and pass that array as an argument to a subroutine, there is usually no point for the subroutine to check whether the array is not empty or has the right data. This may sound obvious, but there is a lot of code out there double checking or triple checking things. Overly defensive programming can be a source of bugs, increased maintenance costs and performance degradation.

    Having said that, if there is a chance that the subroutine gets called from somewhere else in the code in the future, it might still be desirable to make sure your arguments really contain what they are supposed to contain. Especially, if your routine is part of a module that may be called by another program, you certainly need to check the validity of the parameters, because you usually can't rely on your module user to do the right thing. The next question is what you should do if they are not valid (should you die, raise an exception, return undef, etc.), and it is sometimes not easy to determine the best course of action in such cases.

    In brief, there is no definite answer, it depends often on the actual situation. And there are often some trade-offs. You probably have to use your good judgment.

    But all this is just my personal opinion.

Re: Is validation of all subroutine arguments overkill?
by stevieb (Canon) on May 21, 2016 at 13:48 UTC
    "Though I'm sure it makes my code is more robust, I'm wondering at what cost."

    The cost is time and extra effort, but the payoff is easier to troubleshoot, more robust and easier to update code.

    Not only that, but it's experience for later projects; you get a very good understanding and knowledge of when you need to do deep validation, and when only a light skim will do. Not only that, but with good validation, your test suite can be (sometimes slightly) simpler... less edge cases to test against. You can for() loop a boatload of bad params at a sub in a test, and get back the same error/warning etc. if you've validated/checked them in the actual code. Get familiar with dies_ok() and throws_ok() from Test::Exception, and tricks to catch warnings, eg:

    { my $warning; local $SIG{__WARN__} = sub { $warning = shift; }; Module::do_something($param, $param1); like ($warning, qr/ignoring/, 'param 1 isn't an href, ignored'); }

    Keep up the good work. The extra effort put in up front will pay off later when writing new projects, and when you come back to existing ones to make updates/fix bugs etc.

    In my larger distributions, I typically use a named param public interface:

    sub blah { my %params = @_; ... }

    ...called like:

    my $farm = blah(moo => $href, meow => "Phoebe's smelly cat");

    ...along with some sort of _validate_params() function/method, and sometimes a public function so a user can fetch all the valid params if desired. Here's a really basic example.

    sub valid_params { my %valid = ( moo => 1, woof => 1, meow => 1, ); return %valid; } sub _validate_params { my %p = @_; my %valid = valid_params(); for (keys %p){ die "$_ isn't a valid param" if ! exists $valid{$_}; } if (exists $p{moo} && ref $p{moo} ne 'HASH'){ die "param 'moo' must be a hash reference"; } }

    Then in each sub that accepts params:

    sub yell { my %p = @_; _validate_params(%p); ... }

    Also keep in mind that the CPAN has some good modules for specific validations. Email::Valid for email addresses, Data::Validate::IP for v4/v6 IP addresses etc. Some of these validation distributions have been written by Perl heavy-hitters (the two I've mentioned certainly have), so if you've got a common type of param, do a quick search to see if someone has already written a validator for it. Sometimes the extra weight of an external module is worth it, sometimes it's not, but it's good to know these things are available if desired.

      Hi stevieb,

      Testing for warnings is a good idea, and ++ for the rest of your post, I just wanted to point out that I got burned by code similar to what you posted: You can't know how many warnings a piece of code will produce, especially over various Perl versions. Example: this CPAN Testers failure is caused by my test code expecting an exact number of warnings, but Perl 5.6 produced an extra warning in this case. Here's the code I currently use (in addition to Test::Fatal):

      sub warns (&) { my $sub = shift; my @warns; { local $SIG{__WARN__} = sub { push @warns, shift }; $sub->() } return @warns; } # example usage use Test::More tests=>2; my @w = warns { is some_func(), "bar"; }; is grep({/\bfoo\b/} @w), 1; sub some_func { # could be any number of other warnings here warn "foo"; return "bar"; }

      Regards,
      -- Hauke D

        Yes, thanks for pointing that out, and in fact I usually do implement it with push() to an array. I'm in slow-mo this morning as it's a dreary Saturday to start a long weekend :)

        I'll keep my example as is so that future readers have context for your post, and can see the potential pitfall.

      less edge cases to test against

      Maybe

      Testing the boundaries checked by the parameter validation is a form of edge case testing that should be done. For whatever reasonable value of epsilon, you need to validate that values just "inside" the boundaries are accepted and those just "outside" are rejected.

Re: Is validation of all subroutine arguments overkill?
by davies (Prior) on May 21, 2016 at 10:24 UTC
Re: Is validation of all subroutine arguments overkill?
by Anonymous Monk on May 21, 2016 at 11:17 UTC
    You can always use Perl taint to be your overly paranoid variable checker. Now a days not a lot of people use taint or even remember what it does, but it is definitely going to show you what is possibly tainted data. It is also a very good tool to learn where tainted data can come from.

    Then all you need to do is learn regular expressions to filter or detect the issue.

    With most variable validating modules you will still have to know what you want to filter and what the data format is.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1163720]
Approved by Athanasius
Front-paged by davies
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (5)
As of 2024-04-23 17:56 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found