http://www.perlmonks.org?node_id=1019843


in reply to Re: "undef" is not NULL and what to do about it
in thread "undef" is not NULL and what to do about it

Those are both great examples and show why unknown is better than undef. First, keep in mind that undef behavior is rather ad hoc and poorly specified, and the warnings reveal that it's not designed for comparison. However, unknown is specifically designed for comparison and nothing else, and its behavior is well documented in my module.

In this case, either of your examples would be a bug if you're dealing with either undef or unknown values. However, unknown values offer safety. Let's look at your first example:

    if ( $salary < $threshold ) {
        increase_salary( $employee, 3_000 );
    }
    else {
        decrease_salary( $employee, 3_000 );
    }

What might reasonably happen if we have an undef value? The salary is coerced to zero and that's probably less than the threshold, thus causing increase_salary() to be called. What happens in there? Presumably something like this:

    $employee->salary( $employee->salary + $increase );

And an employee whose salary was previously unknown now has a salary of $3,000. With undef values, you've probably corrupted your data.
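To make the undef case concrete, here is a minimal sketch in core Perl. The threshold and increase amounts are made up for illustration, and the warnings are captured only so they can be printed at the end.

```perl
use strict;
use warnings;

my $salary;                 # undef: the salary is not known
my $threshold = 50_000;     # hypothetical threshold
my $increase  = 3_000;

my @warnings;
{
    # Collect warnings instead of printing them straight to STDERR.
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    if ( $salary < $threshold ) {        # undef is treated as 0 here (with a warning)
        $salary = $salary + $increase;   # undef + 3_000 gives 3_000 (another warning)
    }
}

print "salary is now $salary\n";
print "warning: $_" for @warnings;
```

The comparison only warns, so the "increase" branch runs and the unknown salary silently becomes 3000.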

What happens if you use unknown? Well, salary does not evaluate as less than threshold, so we call decrease_salary() and probably still have a bug, right? In that function, we probably hit code like this:

    $employee->salary( $employee->salary - $decrease );

So did we corrupt our data? Nope. Remember, unknown values are designed to provide semantically correct comparisons and nothing else. What happens if you try to do something else? For the example above, you see "Math cannot be performed on unknown values" followed by a stack trace.

In other words, unknown values will throw an exception rather than allow your data to be corrupted.
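The behaviour described above can be imitated with a few lines of operator overloading. This is only a rough sketch and not the module's actual implementation: the class name My::Unknown is hypothetical, and it covers just a handful of operators.

```perl
use strict;
use warnings;

{
    package My::Unknown;    # hypothetical stand-in, not the real module
    use Carp 'confess';
    use overload
        # Comparisons against an unknown value are never true.
        '<'  => sub { 0 },
        '<=' => sub { 0 },
        '>'  => sub { 0 },
        '>=' => sub { 0 },
        '==' => sub { 0 },
        # Arithmetic dies with a stack trace instead of producing a number.
        '+' => sub { confess 'Math cannot be performed on unknown values' },
        '-' => sub { confess 'Math cannot be performed on unknown values' };

    sub new { my $class = shift; return bless {}, $class }
}

my $salary    = My::Unknown->new;
my $threshold = 50_000;             # hypothetical threshold

# The comparison is false, so we land in the "decrease" branch ...
my $branch = $salary < $threshold ? 'increase' : 'decrease';

# ... but the subtraction inside it dies rather than storing a bogus salary.
my $ok = eval { my $new_salary = $salary - 3_000; 1 };

print "branch taken: $branch\n";
print "math died: ", ( $ok ? 'no' : 'yes' ), "\n";
```

The wrong branch is still taken, which is a bug, but the exception stops it before any data is written.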

So should you test for unknowns? Sure. The module exports an is_unknown predicate (which defaults to $_). Using that liberally will help make your code more robust. However, if you forget (and which programmer doesn't forget from time to time?), undef can corrupt your data while unknown will die rather than allowing it to be corrupted. That's a deliberate design goal.

The only case where I've violated this rule is stringification: it prints [unknown] for unknown values. However, this may have been a mistake (imagine printing this in JSON, for example) and I may revert that behavior in another release.

Re^3: "undef" is not NULL and what to do about it
by salva (Canon) on Feb 21, 2013 at 09:47 UTC
    we probably hit code like this

    Assuming that things that will probably happen will always happen is a recipe for disaster.

    Your implementation of decrease_salary seems plausible, but it is not the only one. In order to ensure that you are handling "unknown" values right you will be forced to examine it, and even then, this is not perfectly safe, because somebody may rewrite that function in the future without noticing that subtle dependency.

    IMO, the only unknown value that makes sense is one that croaks when used as part of any operation (including comparisons) and that has to be explicitly checked with is_unknown.

    Or you can just use...

    use warnings FATAL => 'uninitialized';

    though this will not cross lexical scopes, so I can still see value in having a magic unknown value.
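    A small sketch of that lexical-scope limitation, using only core Perl. The function names and the threshold are made up for illustration.

    ```perl
    use strict;
    use warnings;

    sub strict_compare {
        # Fatal only inside this lexical scope.
        use warnings FATAL => 'uninitialized';
        my ( $salary, $threshold ) = @_;
        return $salary < $threshold;
    }

    sub lax_compare {
        # Ordinary warnings here: undef is treated as 0 and execution continues.
        my ( $salary, $threshold ) = @_;
        return $salary < $threshold;
    }

    # Dies inside strict_compare, so the eval returns false.
    my $strict_ok = eval { strict_compare( undef, 50_000 ); 1 };

    my $lax_result;
    {
        local $SIG{__WARN__} = sub { };    # silence the non-fatal warning for the demo
        $lax_result = lax_compare( undef, 50_000 );
    }

    print "strict_compare died\n" unless $strict_ok;
    print "lax_compare returned true\n" if $lax_result;
    ```

    Any code outside the scope that declares the fatal warning, including other modules, keeps the old behaviour.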

        It occurs to me that there is another way to attack this problem, because, well, as you say in your blogs.perl.org post, you don't want the app to die after several hours just because some unknown value has slipped in.

        My idea is to imitate the taint approach. In the place where data is generated (e.g., DBI) it is flagged as maybe_unknown; then, when it is used in other parts of the program, the app will croak unless the flag has been removed by some check such as is_unknown.

        Also, this feature should be easy to disable, so that it can be used in development but turned off in production environments. For instance:

        sub data_generator {
            return maybe_unknown(@real_data); # attach maybe_unknown flag
        }

        sub data_processor {
            for my $data (@_) {
                if ($data =~ /foo/) {
                    # in development this line croaks
                    # because $data may be unknown.
                    # Note that it will croak even when $data
                    # is not actually 'unknown'!
                    ...
                }
                unless (is_unknown($data)) { # clears the maybe_unknown flag
                    if ($data =~ /foo/) {
                        # does not croak because the maybe_unknown
                        # flag has already been removed from $data
                        ...
                    }
                }
            }
        }

        sub main {
            data_processor(data_generator);
        }
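        One possible way to get that behaviour, as a rough sketch only: wrap each value in an object whose stringification croaks, and let is_unknown unwrap it in place. The class name My::MaybeUnknown and the $CHECKS_ENABLED switch are hypothetical, and no existing module is assumed to work this way.

        ```perl
        use strict;
        use warnings;

        our $CHECKS_ENABLED = 1;    # hypothetical switch: set to 0 in production

        {
            package My::MaybeUnknown;    # hypothetical wrapper class
            use Carp 'croak';
            use overload
                '""'     => sub { croak 'value used before an is_unknown() check' },
                fallback => 1;    # other operators fall back to '""', so they croak too

            sub new { my ( $class, $value ) = @_; return bless \$value, $class }
        }

        # Attach the "maybe unknown" flag by wrapping each value.
        sub maybe_unknown {
            return @_ unless $CHECKS_ENABLED;
            return map { My::MaybeUnknown->new($_) } @_;
        }

        # Clear the flag in place (through the @_ alias) and report
        # whether the underlying value is actually unknown (undef here).
        sub is_unknown {
            if ( ref( $_[0] ) eq 'My::MaybeUnknown' ) {
                my $value = ${ $_[0] };
                $_[0] = $value;
            }
            return !defined $_[0];
        }

        my ($data) = maybe_unknown('foobar');

        # Flag still set: the match stringifies the wrapper and croaks.
        my $unchecked_ok = eval { my $hit = $data =~ /foo/; 1 };

        # After the check the flag is gone and the match works normally.
        my $checked_match = !is_unknown($data) && $data =~ /foo/;

        print "unchecked use croaked\n" unless $unchecked_ok;
        print "checked use matched\n"   if $checked_match;
        ```

        With $CHECKS_ENABLED set to 0, maybe_unknown returns its arguments untouched, so the production cost is a single function call.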