Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

"undef" is not NULL and what to do about it

by Ovid (Cardinal)
on Feb 20, 2013 at 16:31 UTC ( #1019806=perlmeditation: print w/ replies, xml ) Need Help??

What's wrong with undef?

Currently undef has three different coercions: false, zero, or the empty string. Sometimes those are correct, but not always. Further, by design, it doesn't always emit warnings:

$ perl -Mstrict -Mwarnings -E 'my $foo; say ++$foo' 1 $ perl -Mstrict -Mwarnings -E 'my $foo; say $foo + 1' Use of uninitialized value $foo in addition (+) at -e line 1. 1

And because it has no precise definition, undef might mean any of a number of things:

  • The value's not applicable
  • It's not known
  • It's not available
  • It's restricted
  • Something else?

In other words, the behavior of undef is overloaded, its meaning is ambiguous and you are not guaranteed to have warnings if you use it incorrectly.

Now think about SQL's NULL value. It's problematic, but no alternative has taken hold for simple reason: its meaning is clear and its behavior is unambiguous. It states quite clearly that 'if I don't have a value, I will treat that value as "unknown" via a set of well-defined rules'.

So, between the ambiguity of Perl's "undef" and the clarity of SQL's NULL, it's probably not surprising that some code could use the latter instead of the former. In fact, I will go further as to assert that the code is often clearer if you do that. Case in point:

foreach my $employee (@employees) { if ( $employee->salary < $threshold ) { increase_salary( $employee, 3_000 ); } }

I'm sure plenty of you see the bug: what if the employee doesn't have a salary? I've already outlined several reasons above for why the salary might be undefined, not all of which mean "this employee has no salary". So if salary is undefined, you will probably get a warning and you'll have a bug. If threshold is undefined you'll get a warning but you won't have a bug. Fun, eh? So you might rewrite the code like this:

if ( defined $threshold ) { foreach my $employee (@employees) { my $salary = $employee->salary; next unless defined $salary; if ( $salary < $threshold ) { increase_salary( $employee, 3_000 ); } } }

Now that's starting to get ugly. Wouldn't it be if we had simple, clear NULL semantics instead? Now you can, with Unknown::Variables. I explain this idea at length at blogs.perl.org, but it works like this: instead of using "undef", you use "unknown".

use Unknown::Variables; my $value = unknown; my @array = ( 1, 2, 3, $value, 4, 5 ); my @less    = grep { $_ < 4 } @array;   # assigns (1,2,3) my @greater = grep { $_ > 3 } @array;   # assigns (4,5)

There's a lot more to this module than meets the eye, but the idea is that the unknown keyword (well, it acts like a keyword), unlike undef, has clear, well-defined semantics and and predictable behavior.

In the employee salary example above, if an unknown salary returns unknown instead of undef, our code goes back to the original:

foreach my $employee (@employees) { if ( $employee->salary < $threshold ) { increase_salary( $employee, 3_000 ); } }

Further, you won't generate any warnings because warnings are there to warn you when you're doing something bad. In this case, unknown values are designed to be compared an behave appropriately. And as you can see, our code above becomes much simpler. When used appropriately, unknown values can let your code focus on the problem rather than work around fiddly bits of the language. Any place that you find yourself assigning undef in your code, see what would happen if you put an unknown value there instead.

Feedback welcome. Download the code from the CPAN or, if you want to hack on the code, clone it on github.

Comment on "undef" is not NULL and what to do about it
Select or Download Code
Re: "undef" is not NULL and what to do about it
by salva (Monsignor) on Feb 20, 2013 at 17:22 UTC
    What happens if your code is as follows:
    if ( $salary < $threshold ) { increase_salary( $employee, 3_000); } else { decrease_salary( $employee, 3_000); }
    Or just...
    unless ($salary > $threshold) { increase_salary( $employee, 3_000); }

    There is no right default way to handle undefs. If you expect undefs on your data, the best you can do is to test for them explicitly.

      Those are both great examples and show why unkown is better than undef. First, keep in mind that undef behavior is rather ad-hoc and poorly specified and the warnings reveal that it's not designed for comparison. However, unknown is specifically designed for comparison and nothing else and it's behavior is well documented in my module.

      In this case, either of your examples would be a bug if you're dealing with either undef or unknown values. However, unknown values offers safety. Let's look at your first example:

      if ( $salary < $threshold ) { increase_salary( $employee, 3_000); } else { decrease_salary( $employee, 3_000); }

      What might reasonably happen if we have an undef value? The salary is coerced to zero and that's probably less than the threshold, thus causing increase_salary() to be called. What happens in there? Presumably something like this:

          $employee->salary( $employee->salary + $increase );

      And an employee who's salary was previously unknown now has a salary of $3,000 and with undef values, you've probably corrupted your data.

      What happens if you use unknown? Well, salary does not evaluate as less than threshold, so we call decrease_salary() and probably still have a bug, right? In that function, we probably hit code like this:

          $employee->salary( $employee->salary - $decrease );

      So did we corrupt our data? Nope. Remember, unknown values are designed to provide semantically correct comparisons and nothing else. What happens if you try to do something else? For the example above, you see Math cannot be performed on unknown values followed by a stack trace.

      In other words, unknown values will throw an exception rather than allow your data to be corrupted.

      So should you test for unknowns? Sure. The module exports an is_unknown predicate (which defaults to $_). Using that liberally will help make your code more robust. However, if you forget (and which programmer doesn't forget from time to time?), undef can corrupt your data while unknown will die rather than allowing it to be corrupted. That's a deliberate design goal.

      The only case where I've violated this rule is stringification: it prints [unknown] for unknown values. However, this may have been a mistake (imagine printing this in JSON, for example) and I may revert that behavior in another release.

        we probably hit code like this

        Assuming that probably things are going to happen always is a recipe for disaster.

        Your implementation of decrease_salary seems plausible, but it is not the only one. In order to ensure that you are handling "unknown" values right you will be forced to examine it, and even then, this is not perfectly safe, because somebody may rewrite that function in the future without noticing that subtle dependency.

        IMO, the only unknown value that makes sense is one that croaks when used as part of any operation (including comparisons) and that has to be explicitly checked with is_unknown.

        Or you can just use...

        use warnings FATAL => 'uninitialized';

        though, this will not cross lexical scopes, so I can still see value on having a magic unknown value.

Re: "undef" is not NULL and what to do about it
by sundialsvc4 (Monsignor) on Feb 20, 2013 at 17:43 UTC

    I certainly would like to see good, clean semantics (in the context of Perl-5, which is actually here, as opposed to the maybe-someday Perl-6) that expresses at least the following two ideas:

    • NULL:   “it is known that there is the absence of a value.”
    • unknown:   “it is known that the value is unknown.”

    undef, as “the value of this variable has not yet been defined,” indeed has been over-applied ... it has been stretched into service to cover all of these cases because the Perl language (AFAIK) offers nothing else.

    What I do not know, of course, is what to do about it.   I do not delve into the world of perlguts.   But I endorse what you are saying here and would certainly like to find an improvement, operating within the context of Perl five.   (And, y’know, “if we can do Moose, we ought to be able to do this.”)

    The fly in the ointment, of course, will be compatibility.   Any new ideas, no matter how great they may be, must somehow be introduced into the present.   That means, in particular, DBI, which currently uses undef to represent NULL ... this, indeed, being the best design-choice that has been available so-far.   As code begins to appear on the scene which supports these new semantics that both you and I apparently dream-of, it must be compatible somehow.   It must, “Just Work.™”

      Howdy!

      NULL as used in relational databases still carries semantic ambiguity.

      It could indicate that the value is absent, or it could indicate that the value is unknown. Or it could mean either. If I recall correctly, Codd called for multiple flavors of NULL.

      yours,
      Michael
Re: "undef" is not NULL and what to do about it
by Tux (Monsignor) on Feb 20, 2013 at 17:49 UTC

    This is one of the reasons why we - in ALLL our database types - disallow NULL values. All fields should have (default) values.

    Also try to think about what will become of data when exported as CSV, which most modern databases support. CSV does not know the difference between 1,,2 and 1,"",2 (by default), so inserting exported data somewhere else might alter your data.

    This is also one of the major reasons why I thoroughly hate Oracle. If I insert an empty string into a varchar field in Oracle, it helpfully inserts a NULL instead. HATE HATE HATE. Also see here.

    I for one do not see perl's undef being wrong here, but the inconsistency in databases. OTOH I could see value in a new constant really being NULL.


    Enjoy, Have FUN! H.Merijn
      ++ for the link!

      This is one of the reasons why we - in ALLL our database types - disallow NULL values. All fields should have (default) values

      can't avoid the nulls produced by the outer joins though,i.e a left one

      If I insert an empty string into a varchar field in Oracle, it helpfully inserts a NULL instead

      I am not familiar with Oracle;how does it behave when when you put a not null constraint on the column?

      catering for nulls is such an important issue that for example C# introduced Nullable Types into the core language

      All fields should have (default) values.
      So how do you represent unknown values? By using a default "UNKNOWN"?

      Then I might just as well use a NULL...

      If I insert an empty string into a varchar field in Oracle, it helpfully inserts a NULL instead.
      VARCHAR-fields do in theory exist in Oracle but nobody uses them.

      You are talking about VARCHAR2-fields and for them an empty string IS a NULL. There is no difference at all. If you use that data-type you must be aware of this.

        an empty string IS a NULL. There is no difference at all.

        I think the point is you can not select the data using, e.g.WHERE colname = ''.

        So how do you represent unknown values? By using a default "UNKNOWN"?

        Sometimes you know what the value is, and it's blank. There's just too many ways to look at it.

        .

        "unknown" values are represented - in our case - by a value that falls outside the legal range. e.g. a for numeric values defined to be 0 or greater, we use -1 to be the "undefined" value and -2 to be the "known to be wrong or invalid" value. For strings that decision is taken on each case, some already have a value for "known to be unknown": . (not our choice, it is a law in that database) others might be a single space.

        varchar2 is Oracle-only, yet another reason to loathe Oracle. Indeed we use varchar2 instead of varchar, but they suck nevertheless.

        I just don't understand how you can say that an empty string equals to NULL. That is only true in the rotten Oracle world.

        FWIW I use a lot of database types (Oracle, Unify, MySQL, PostgreSQL, MariaDB, SQLite, CSV, TSV, MonetDB) and NONE is perfect. Oracle is not the only database with drawbacks.

        The is more btw, how to represent (in a database)

        • Known, valid ("normal" state in databases)
        • Known to be unknown (I checked it, but the value cannot be set)
        • Known to be absent/unavailable
        • Unknown ("NULL" in databases)
        • Known to be illegal (cannot use the illegal value, like date of birth 23770245)

        Enjoy, Have FUN! H.Merijn
        pmsig

      For a brief time 10 years ago I though like you do in this post. However after working with foreign keys and multi-key unique keys in well managed schemas, nulls have become a very welcome tool for helping to model relations via database constraints that could not be done without nulls.

      I share in your lack of enthusiasm for various oracle behaviors.

      Excluding NULL or undef values under a well intentioned blanket statement is fine, but understand you are removing valuable tools from yourself - cutting down on logical variations sure - but you are removing tools from your toolbox. Just because you haven't yet discovered how they are useful to other people doesn't make them less useful.

      my @a=qw(random brilliant braindead); print $a[rand(@a)];
      Howdy!

      That behaviour in Oracle is hateful and deviates from the ANSI standard, last I checked.

      yours,
      Michael
Re: "undef" is not NULL and what to do about it
by tmharish (Friar) on Feb 21, 2013 at 06:34 UTC
Re: "undef" is not NULL and what to do about it
by flexvault (Parson) on Feb 21, 2013 at 11:14 UTC

    Good Day Ovid,

    You stated:


      So, between the ambiguity of Perl's "undef" and the clarity of SQL's NULL, it's probably not surprising that some code could use the latter instead of the former. In fact, I will go further as to assert that the code is often clearer if you do that. Case in point:
      foreach my $employee (@employees) { if ( $employee->salary < $threshold ) { increase_salary( $employee, 3_000 ); } }
      I'm sure plenty of you see the bug: what if the employee doesn't have a salary? I've already outlined several reasons above for why the salary might be undefined, not all of which mean "this employee has no salary..."


    I'm confused by your example, and the code below shows why I'm confused. In memory the value for 'salary' is undefined, but once it written to disk, whenever it is accessed after the initial 'write', the field is defined. So if this is a quirk of the DB you're using, why expect Perl to fix it? Everyone else seems to understand you're problem, but it seems related to the DB you're using and not to a specific Perl problem.

    use strict; use warnings; my %Undef = ( Show => 'okay', Go => 'maybe', ); foreach my $key ( keys %Undef ) { if ( ! defined $Undef{$key} ) { print "1. $key value is undef +ined!\n"; } } $Undef{Salary} = undef; foreach my $key ( keys %Undef ) { if ( ! defined $Undef{$key} ) { print "2. $key value is undef +ined!\n"; } } open ( my $file, ">","./ondisk") || die " ! open |./ondisk| $!\n"; foreach my $key ( keys %Undef ) { print $file "$key\t$Undef{$key}\n"; ## line 20 } close $file; my %Defined; print "\n"; open ( $file, "<","./ondisk") || die " ! open |./ondisk| $!\n"; while ( my $value = <$file> ) { chomp $value; my ( $newkey, $newvalue ) = split(/\t/,$value); $Defined{$newkey} = $newvalue; } close $file; foreach my $key ( keys %Defined ) { if ( defined $Defined{$key} ) { print "$key\t|$Defined{$key}| +\n"; } } 1;
    Results:
    > perl unknown.plx 2. Salary value is undefined! Use of uninitialized value $Undef{"Salary"} in concatenation (.) or st +ring at unknown.plx line 20. Go |maybe| Salary || Show |okay|

    Now if this is to solve a DB related flaw, then I see your point. But I don't understand how 'unknown' is better then 'undef/defined' in Perl scripts. YMMV!

    Maybe the title should have SQL in front of "NULL" to clarify the intent of the 'unknown'.

    Regards...Ed

    "Well done is better than well said." - Benjamin Franklin

Re: "undef" is not NULL and what to do about it
by Rhandom (Curate) on Feb 21, 2013 at 16:04 UTC
    "undef" is not NULL and what to do about it

    Sometimes undef is null, sometimes undef isn't. Adding "unknown" doesn't constrain these situations any better than using undef deos. In any given case you either know how to handle undef or you don't. If you know how to handle it, you handle it. Sometimes it very well could be the right thing to do is add 3000 (maybe that is what initial salaries are - though doubtful). Most of the time, you write your code to handle exception states.

    The addition of unknown as a value type is fine (I guess). But it is only really an additional exception state. You could guess that unknown will behave more validly than undef, but I would wager that if you did that then most of the time you would just be guessing. Really, if your value is unknown, and you want your system to be well defined, then you have to handle unknown values and not blindly attempt to process them. This is no different than how you safe handle undef values.

    In all reality the new "unknown" is really just another flavor of undef and carries all of the same problems and benefits that undef does.

    Incidentally, the behavior you outlined for unknown appears to be very Perl6 junction-like and carries a little of the same benefit, but all of the same baggage that junctions do. Junctions in a very short scope are easy enough to follow, but at a distance there is grave magic.

    my @a=qw(random brilliant braindead); print $a[rand(@a)];
Re: "undef" is not NULL and what to do about it
by BrowserUk (Pope) on Feb 21, 2013 at 16:30 UTC

    I kind of see the problem you perceive, though I'd much prefer to handle it with defined:

    foreach my $employee (@employees) { if ( defined $employee->salary and $employee->salary < $threshold +) { increase_salary( $employee, 3_000 ); } }

    Double reference to $employee->salary or not; I bet it is a) more efficient than your module; b) far less prone to misunderstanding or accidental disablement than your scary-action-at-a distance module.

    Personally; I think that instead of initialising things as unknown, you'd achieve a better effect by initialising them as #NaN>.

    Indeed, I can see the case for a pragma to cause undef to be treated as #NaN in numeric contexts.

    A lexically scoped use undefAsNaN; could be very useful I think.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: "undef" is not NULL and what to do about it
by moritz (Cardinal) on Feb 21, 2013 at 18:47 UTC

    IMHO the only sane way to deal with NULL/undef values is not to allow them by default, and if they are present, force explicit handling of them.

    Unfortunately the only language I know that gets this right is Haskell (which has its own share of problem).

Re: "undef" is not NULL and what to do about it
by dsheroh (Parson) on Feb 22, 2013 at 10:53 UTC
    So, between the ambiguity of Perl's "undef" and the clarity of SQL's NULL...
    You take an interesting position there. "Interesting" in that I've previously debated the merits of SQL's NULL with people who hold that it meaning is ambiguous in exactly the same ways that you say undef is ambiguous. (Does NULL mean that the value is unavailable? Unknown? Inapplicable? Something else?)

    Personally, I think that both undef and NULL are fine as they are. Even if you split them into a flock of distinct not-a-values, at least one not-a-value will always be a catch-all for cases that fall outside the set of explicit not-a-values, so the ambiguity will always exist. Yes, the use of explicitly-defined not-a-values would reduce the need to use ambiguous not-a-values, but that comes at the cost of complexity in remembering how the various not-a-values interact and deciding which one to use in any given situation. Far better, IMO, to keep the syntax simple with a single not-a-value, relying on the programmer to indicate its meaning in any given situation and determine the appropriate behavior based on that meaning.

      Ah, bugger. I should have been more clear. You're absolutely right that it's very unclear what a given NULL should mean. What I meant was that the behavior of NULL is very clear and predictable, regardless of meaning. There are rare cases where you get a false negative, but unlike undef, you generally don't get a false positive. That's a trade-off I prefer, though I can see that many others don't.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://1019806]
Approved by marto
Front-paged by Arunbear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (10)
As of 2014-09-19 11:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (135 votes), past polls