Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Re^3: getting 2 things done at once with Class::Base (return undef)

by CharlesClarkson (Curate)
on Nov 02, 2007 at 08:33 UTC ( [id://648607]=note: print w/replies, xml ) Need Help??


in reply to Re^2: getting 2 things done at once with Class::Base (return undef)
in thread getting 2 things done at once with Class::Base

According to the docs (Perlfunc) on return:

If no EXPR is given, returns an empty list in list context, the undefined value in scalar context, and (of course) nothing at all in a void context.

I think this is mentioned in Perl Best Practices. I don't have a copy handy to verify it. So, the better practice seems to be to replace return undef; with return;.

A quick test shows that this works fine for boolean tests as well. We get the best of all worlds.

print "foo\n" unless foo(); sub foo { return; }
HTH,
Charles

Replies are listed 'Best First'.
Re^4: 'return undef' as a "best practice" ('2 things at once with Class::Base')
by tye (Sage) on Nov 02, 2007 at 17:16 UTC

    I know it is mentioned in Perl Best Practices. And I explained why it is the opposite of a "best practice" for some functions, but you appear to have completely ignored that explanation or completely not understood it. Note that return; and return(); are functionally identical. They are just two different ways of returning an empty list. I explained briefly above why there are cases1 where returning an empty list (no matter what syntax you use to do so) is not a good idea (that is, it is the opposite of a "best practice").

    1 And let me repeat that I agree with this idea of "returning an empty list to indicate failure" for functions that already return lists or (to be clearer for those who can't understand the concept of "return a list" vs "return a scalar") for functions that at least sometimes return a list of more than one scalar. So my argument is only that this "best practice" needs an additional condition applied to it (as well as that without this condition it becomes the opposite of a "best practice" in some cases).

    I have not read Perl Best Practices, but I have had this particular point explained repeatedly to me by people who have. So there is a possibility that these people have in some way misrepresented how this is presented in that book (perhaps a "converts make the worst zealots" effect), but I don't think so. So, my counter argument is based on second-hand presentations of this idea; so please forgive me and let me know if my following argument misses the mark of what the book presents (in which case, note that my argument should still be applied to these second-hand re-tellings of the idea).

    If we believe this particular claim from Perl Best Practices, then Larry got the design of -s wrong because -s for a non-existant file returns undef not an empty list. And I personally hope that TheDamian (the author) and chromatic (an obviously staunch supporter of this claim, as evidenced elsewhere) haven't convinced the rest of "the Larrys" that this should change in Perl 6.

    So, in hopes of helping some previous readers who couldn't deal with a rather small amount of abstraction, let's define the following very concrete subroutine:2

    sub getFileSize { my( $fileName )= @_; return -s $fileName; }
    2 Perhaps a few of you can't imagine why anyone would prefer to use getFileSize($fileName) over -s $fileName, so let me try to put that worry out of your mind. Although I can see some real advantages in some situations, that is not at all the point of this exercise. This is just a very concrete example of a function that returns a single, defined scalar value and where the concept of "failure" applies and where it makes sense for "failure" to be indicated by undef (at least in a scalar context). Surely we can all agree that such functions are not terribly uncommon in the universe of Perl, especially since -s is exactly such a function and is even just one of many related such functions. I'd normally just use getFoo($blah), but that previously caused problems for (at least) one person who had great difficulty evaluating the appropriateness / likelihood of different sample uses of such a (slightly abstract) function.

    Now, this "best practice" would require us to rewrite that function more like so:

    sub getFileSize { my( $fileName )= @_; my $size= -s $fileName; if( ! defined $size ) { return; # Return an empty list to indication "failure" } return $size; }

    So let's look at how such a function is likely to be used and see why I believe that the risks (of bugs) caused by returning an empty list are much greater than the risks (of bugs) caused by returning undef for "failure" for this type of function.

    First, the case for "return an empty list". The following code is about the only type of example I can think of where this "false in a list context"3 situation is likely to crop up.

    3 "false in a list context" is a mis-statement that I've heard repeated more than once so I am guessing it is actually used in Perl Best Practices. To be false requires a "Boolean context" which is a case of scalar context which is not list context so "false in list context" must really mean using the function in a list context that is part of some larger expression and that larger expression is being used in "Boolean context". I think that, by far, the most common case of such a construct is "list assignment in a Boolean context".
    my $fileName= getFileName(); my @size; if( @size= getFileSize($fileName) ) { print "The file, $fileName, has a size of @size bytes.\n"; } else { print "Can't determine file size for $fileName: $!\n"; }

    We are mostly interested in the if line. Does that look like code you are likely to see? Other than allowing for "I see a lot of stupid code", that doesn't look like code I'd expect to see in well-written code. It just doesn't make much sense to want to assign a single scalar to an array and to then test that assignment statement for "truth". Granted, such code is certainly possible to write. My point is only that I find such code very unlikely. I also find such code awkward.

    So, if somebody has some better examples of better code that demonstrates this type of conflict (so called "false in a list context" for a function that, when it doesn't "fail", only ever returns a single scalar), I'd like to see that. But I doubt it will be as "likely" as the sum of all of the cases of the opposite problem, some examples of which I will now present.

    Perhaps the classic example that I've seen many times4 is:

    4 And that made me wish that => could have been made to force scalar context on both of its sides, similar to how it eventually got the ability to encourage "string context" on its left side.
    my %sizes= ( $inputFile => getFileSize($inputFile), $controlFile => getFileSize($controlFile), );

    or perhaps:

    my %object= ( InputFile => $inputFileName, InputSize => getFileSize($inputFileName), ControlFile => $controlFileName, ControlSize => getFileSize($controlFileName), ); return bless \%object;

    If we follow the book's proposed "best practice" and if neither our input file nor our control file exist yet, then those code samples end up functioning like the following, respectively:

    my %sizes= ( $inputFile => $controlFile, ); # and my %object= ( InputFile => $inputFileName, InputSize => "ControlFile", $controlFileName => "ControlSize", ); return bless \%object;

    Some of our keys have been interpretted as values! That is surely a bug. A bug caused by the proposed "best practice". And for code that I find quite natural and in a style that I often see.

    Some would make a (perhaps circular) argument that this is exactly why such code should make use of scalar. But the point of getFileSize() is to just get a single scalar so it doesn't make much sense to say "no I don't want a list of one file size scalar, I just want a scalar version of that list".

    It doesn't make sense to change a function from

    returns a scalar (that may be undefined)
    to
    returns a scalar, unless there was a failure in which case it returns an empty list
    which then requires
    we use scalar getFileSize() because we want just the file size, unless it fails, in which case we want the scalar version of the empty list, also known as undef
    How can all of that added mental complexity possibly lead to better code quality? If you buy that approach then you must surely write:

    my %sizes= ( $inputFile => scalar -s $inputFile, $controlFile => scalar -s $controlFile, );

    But I don't think I've ever seen anybody do that.

    Now let's rewrite our first code example a different way to demonstrate another pitfall:

    my @files= getFileNames(); my %sizes= map { $_ => getFileSize($_) } @files; # or my %sizes; @sizes{@files}= map getFileSize($_), @files;

    Those both look like reasonable, likely code, and they both break horribly and differently if we follow the "return empty list" best practice. And they don't look like code where I'd expect to see people using scalar. Another example:

    my @files= getFileNames(); my @sizes; for my $fileName ( @files ) { push @sizes, getFileSize( $fileName ); }

    That example is a little harder to justify. But keeping parallel arrays is certainly something that I've done and that I've seen done (and there can be good reasons for using that technique). And I can see someone avoiding map for several reasons. But I'm mostly just looking at the push statement. That seems very reasonable and likely code and something that is likely not what was meant if an empty list indicates failure.

    Another example.

    sub missingIsSmallest { my( $x, $y )= @_; if( ! defined $x ) { if( ! defined $y ) { return 0; } else { return -1; } } elsif( ! defined $x ) { return 1; } else { return $x <=> $y; } } my @files= sort { missingIsSmallest( getFileSize($a), getFileSize($b) ) } getFileNames();

    This fails in subtle ways given the "best practice". And this failure applies to just about any use of a simple scalar-returning function as an argument to some other function. Surely people agree that it is quite common to pass an argument to a function that is an expression involving a simple call to another function. Whether the outer function takes positional arguments or "named" arguments doesn't even matter. Both cases are messed up by the "best practice":

    reportFile( $file, getFileSize($file), "Financials" ); # or reportFile( FileName => $file, FileSize => getFileSize($file), Description => "Financials", );

    So I find the pattern of "function returns a simple scalar except to report failure" to be much, much more likely to cause bugs than the pattern "function always returns a simple scalar (which is undef in the case of failure)". I understand the mis-stated point of "false in a list context", but it just looks like a problem that is quite unlikely for somebody to trip over.

    I certainly recall more cases of people posting at PerlMonks because they got bit by expecting a single scalar from something that sometimes returns an empty list than I recall people posting because they got "true" because they assigned to an array/list in Boolean context the results of a function call that is normally used to fetch one scalar value.

    - tye        

Re^4: getting 2 things done at once with Class::Base (return undef)
by jplindstrom (Monsignor) on Nov 02, 2007 at 14:07 UTC
    It is indeed in PBP, but IMHO this is one of the less thought through practices. The example and problem described in the book applies to a specific situation, but the conclusion/solution is then elevated to a general practice.

    The real lesson here is: if the sub is documented to return a scalar and it uses a return code to signal failure, it should return a scalar. It shouldn't randomly introduce context sensitivity in case of failure.

    /J

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://648607]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (3)
As of 2024-04-19 19:52 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found