Is validation of all subroutine arguments overkill?

nysus has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Is validation of all subroutine arguments overkill? by haukex (Archbishop) on May 21, 2016 at 08:04 UTC
Hi nysus, I know the feeling exactly :-) I think the only good answer is "it depends", so I'd suggest you try to get a broad spectrum of input of how others do it, and gain your own experience. Here's some of my thoughts about whether parameters should be validated and whether to test that: What kind of failure can be expected if the user passes a wrong parameter? E.g. parameters that cause files to be edited or even deleted should probably be checked in detail, but if e.g. it's a value that would cause Perl to issue a warning message or make an incorrect calculation, maybe the user of your code needs to track that issue down on their own. Who is the intended audience? How user friendly do you want the code to be? How user friendly does the code need to be? Also: after gaining some experience with the API, which parameters do the users get wrong most often? What metric are you testing by? For example, even 90-100% code coverage doesn't necessarily require you to fiddle with every argument. Personally, I rarely check the content of error messages my code generates. Sometimes, when my code generates a custom warning for example, will I check that it does get generated. Consider what you're testing for: If you use e.g. Params::Validate, and that module is well-tested, are your tests just double-checking that it works? Hope this helps, -- Hauke D Update: Minor rewording & reorg. and added the last item to the list.	[reply]
Re: Is validation of all subroutine arguments overkill? by BrowserUk (Patriarch) on May 21, 2016 at 08:18 UTC
Do you think it's worth the time to write tests to ensure subroutines handle bad arguments? What is the likelihood of a particular subroutine being passed bad arguments? If those arguments are sourced from data outside of your programs control, and that data will not have passed any other validation before being passed to this subroutine, then validating is not just worth the time, but (IMO) imperative. On the other hand if the subroutine can only be called with data that has already been validated, its pointless. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :) In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re: Is validation of all subroutine arguments overkill? by Laurent_R (Canon) on May 21, 2016 at 09:50 UTC
IMHO, you certainly need to validate any data coming from an external or uncontrolled source (user input, input file, database, network socket, etc.). But there is no point checking things that cannot happen. If you know for sure that you have populated an array with some known data and pass that array as an argument to a subroutine, there is usually no point for the subroutine to check whether the array is not empty or has the right data. This may sound obvious, but there is a lot of code out there double checking or triple checking things. Overly defensive programming can be a source of bugs, increased maintenance costs and performance degradation. Having said that, if there is a chance that the subroutine gets called from somewhere else in the code in the future, it might still be desirable to make sure your arguments really contain what they are supposed to contain. Especially, if your routine is part of a module that may be called by another program, you certainly need to check the validity of the parameters, because you usually can't rely on your module user to do the right thing. The next question is what you should do if they are not valid (should you die, raise an exception, return undef, etc.), and it is sometimes not easy to determine the best course of action in such cases. In brief, there is no definite answer, it depends often on the actual situation. And there are often some trade-offs. You probably have to use your good judgment. But all this is just my personal opinion.	[reply]
Re: Is validation of all subroutine arguments overkill? by stevieb (Canon) on May 21, 2016 at 13:48 UTC
"Though I'm sure it makes my code is more robust, I'm wondering at what cost." The cost is time and extra effort, but the payoff is easier to troubleshoot, more robust and easier to update code. Not only that, but it's experience for later projects; you get a very good understanding and knowledge of when you need to do deep validation, and when only a light skim will do. Not only that, but with good validation, your test suite can be (sometimes slightly) simpler... less edge cases to test against. You can `for()` loop a boatload of bad params at a sub in a test, and get back the same error/warning etc. if you've validated/checked them in the actual code. Get familiar with `dies_ok()` and `throws_ok()` from Test::Exception, and tricks to catch warnings, eg: `{ my $warning; local $SIG{__WARN__} = sub { $warning = shift; }; Module::do_something($param, $param1); like ($warning, qr/ignoring/, 'param 1 isn't an href, ignored'); }` [download] Keep up the good work. The extra effort put in up front will pay off later when writing new projects, and when you come back to existing ones to make updates/fix bugs etc. In my larger distributions, I typically use a named param public interface: `sub blah { my %params = @_; ... }` [download] ...called like: `my $farm = blah(moo => $href, meow => "Phoebe's smelly cat");` [download] ...along with some sort of `_validate_params()` function/method, and sometimes a public function so a user can fetch all the valid params if desired. Here's a really basic example. `sub valid_params { my %valid = ( moo => 1, woof => 1, meow => 1, ); return %valid; } sub _validate_params { my %p = @_; my %valid = valid_params(); for (keys %p){ die "$_ isn't a valid param" if ! exists $valid{$_}; } if (exists $p{moo} && ref $p{moo} ne 'HASH'){ die "param 'moo' must be a hash reference"; } }` [download] Then in each sub that accepts params: `sub yell { my %p = @_; _validate_params(%p); ... }` [download] Also keep in mind that the CPAN has some good modules for specific validations. Email::Valid for email addresses, Data::Validate::IP for v4/v6 IP addresses etc. Some of these validation distributions have been written by Perl heavy-hitters (the two I've mentioned certainly have), so if you've got a common type of param, do a quick search to see if someone has already written a validator for it. Sometimes the extra weight of an external module is worth it, sometimes it's not, but it's good to know these things are available if desired.	[reply] [d/l] [select]
Re^2: Is validation of all subroutine arguments overkill? by haukex (Archbishop) on May 21, 2016 at 14:14 UTC
Hi stevieb, Testing for warnings is a good idea, and ++ for the rest of your post, I just wanted to point out that I got burned by code similar to what you posted: You can't know how many warnings a piece of code will produce, especially over various Perl versions. Example: this CPAN Testers failure is caused by my test code expecting an exact number of warnings, but Perl 5.6 produced an extra warning in this case. Here's the code I currently use (in addition to Test::Fatal): `sub warns (&) { my $sub = shift; my @warns; { local $SIG{__WARN__} = sub { push @warns, shift }; $sub->() } return @warns; } # example usage use Test::More tests=>2; my @w = warns { is some_func(), "bar"; }; is grep({/\bfoo\b/} @w), 1; sub some_func { # could be any number of other warnings here warn "foo"; return "bar"; }` [download] Regards, -- Hauke D	[reply] [d/l]
Re^3: Is validation of all subroutine arguments overkill? by stevieb (Canon) on May 21, 2016 at 14:32 UTC
Yes, thanks for pointing that out, and in fact I usually do implement it with `push()` to an array. I'm in slow-mo this morning as it's a dreary Saturday to start a long weekend :) I'll keep my example as is so that future readers have context for your post, and can see the potential pitfall.	[reply] [d/l]
Re^2: Is validation of all subroutine arguments overkill? by RonW (Parson) on May 23, 2016 at 21:43 UTC
less edge cases to test against Maybe Testing the boundaries checked by the parameter validation is a form of edge case testing that should be done. For whatever reasonable value of epsilon, you need to validate that values just "inside" the boundaries are accepted and those just "outside" are rejected.	[reply]
Re: Is validation of all subroutine arguments overkill? by davies (Prior) on May 21, 2016 at 10:24 UTC
While supporting the "it depends" philosophy the previous respondents have suggested, I'd strongly recommend Method::Signatures (http://search.cpan.org/~barefoot/Method-Signatures-20160315/lib/Method/Signatures.pm). This allows you to set type constraints (http://search.cpan.org/~barefoot/Method-Signatures-20160315/lib/Method/Signatures.pm#Type_Constraints) and value constraints (next section) as well as its other benefits. Regards, John Davies	[reply]
Re: Is validation of all subroutine arguments overkill? by Anonymous Monk on May 21, 2016 at 11:17 UTC
You can always use Perl taint to be your overly paranoid variable checker. Now a days not a lot of people use taint or even remember what it does, but it is definitely going to show you what is possibly tainted data. It is also a very good tool to learn where tainted data can come from. Then all you need to do is learn regular expressions to filter or detect the issue. With most variable validating modules you will still have to know what you want to filter and what the data format is.	[reply]