Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re (tilly) 4: To sub or not to sub, that is the question?

by tilly (Archbishop)
on Jun 26, 2001 at 18:46 UTC ( #91605=note: print w/replies, xml ) Need Help??


in reply to Re: Re (tilly) 2: To sub or not to sub, that is the question?
in thread To sub or not to sub, that is the question?

Two questions for you.
  1. Why should your user code make assumptions for you about whether or not undef is a valid value? That implies a knowledge of your internals that I don't like.
  2. Compare two coding standards. One says that it is the responsibility of calling code to test for possible error returns and the other says it is the responsibility of the code being called to test and throw exceptions. Which is more likely to happen without error? In which can you put better error messages?
Feel free to convince me otherwise, but my current thinking is quite clearly that it is better to throw exceptions early. User behaviour that didn't want that can trap it with eval BLOCK. (Similarly I consider it the responsibility of functions to do something useful in both array and scalar context.)

BTW the shortcomings of C have a lot to do with my attitudes on throwing exceptions versus making it the responsibility of the called code to process possible error returns. For a particularly bad example, consider EAGAIN. It is well-documented. A ton of system calls can return it. But in practice it almost never gets returned, that aspect of code never gets tested, and so that aspect of code tends to be particularly buggy. The result? A Unix system runs fine for months on end. Then you put it under unusual load, once, and key long-running processes wind up with messed up states. Right when you least wanted that. And it isn't just one or two badly written programs that do that. There are lots of them out there.

Now there are reasons why EAGAIN exists, legitimate reasons. Given what a Unix kernel has to do, I am not going to argue that it shouldn't exist. But that is a reliability flaw that I don't think should be introduced lightly.

  • Comment on Re (tilly) 4: To sub or not to sub, that is the question?

Replies are listed 'Best First'.
Re: Re (tilly) 4: To sub or not to sub, that is the question?
by dragonchild (Archbishop) on Jun 26, 2001 at 21:15 UTC
    I'll address your last remark first.

    (Similarly I consider it the responsibility of functions to do something useful in both array and scalar context.)

    The full function does do something useful in both array and scalar contexts. I didn't write out the full 70 line version because it was meant to be an example, nothing more. If you want, I'll give you the current version of the abstract class that uses that function and we can discuss it. I'm always looking for input and criticisms on my code. I'm merely defending the fact that your criticisms (which may seem to imply that I don't know what I'm talking about) have already been taken into account. :)

    I hadn't thought about using Carp and making it the caller's responsability to trap the croak within an eval block. The reason is that I don't want to make any assumptions about how my abstract classes might be used. Carp is great ... for CLI systems. For GUI systems, it's not so good.

    Requiring that all values be defined isn't an onerous burden. At some point, in any module, assumptions and restrictions need to be made in order for the module to work well. We, as programmers, accept them all the time. For example, most of the functions in stdlib.h, std.h, and stdio.h return 0 when successful. We accept that 0 is a perfectly valid success code. Why, when every single other language I've come across does things the opposite way, do we blithely accept that? (I in no way am attempting to say that this is wrong, but merely use it as an example of restrictions on the rangespace of functions.)

    In addition, attempting to access an attribute that doesn't exist isn't an exception. It can be a perfectly valid case. For example, you might have a list of objects of various kinds. You want to do processing on a list of attribute names, but not every attribute is in every object. So, you could do something like:

    foreach my $object (@object_list) {\ ATTRIUBTE: foreach my $attribute_name (@attribute_list) { my $value = $object->get($attribute_name); next ATTRIBUTE unless defined $value; # Do stuff here. } }

    That may seem contrived, but I'm sure you can think of other, similar examples. This is much clearer to the average programmer, imho, than using an eval block and checking $@. Plus, if someone wants to use undef, they can usually just use "" and everything is just fine. Or, if they absolutely need both undef and "", use the string __UNDEF__ (or some variant), and they're fine. If you can give me a real example where the same attribute can use undef and "", I'll be impressed. :)

    Finally, I don't think that bring up EAGAIN is a good counter-example. Comparing an abstract class to the Unix kernel doesn't really ... I just don't think it's apples and apples. If you could explain it further (I haven't started my exploration of the Linux kernel yet ... that's for this winter), I'd be happy to listen.

      Your get function has expanded to 70 lines in production? I have to ask why. While I don't agree with turning good rules of thumb like, "Functions should not exceed 50 lines" into rigid rules, I don't remember the last time I felt the need to write a function that was 70 lines long.

      As for your abstract class issue, you can always make the error sub an anonymous code reference which just *happens* to default to \&Carp::confess but can be set to whatever makes sense in the application. (For a GUI you probably want to write a log, pop up a pop-up window, and then throw an exception internally? I don't have a sense for what makes sense since I have not worked with GUI code for a long time.) If you do that I would make sure that it is everywhere exported from the same place so it can be switched globally as appropriate.

      For your code example, when I write code following a general pattern like that, normally I don't want to make that a silent error. It is too each to have a typo in the list of attributes, I have made mistakes like that a million times. But still if I wanted to make that kind of logic clear, I would suggest creating a method called, "has_attrib" and then calling that in your loop. In fact you can now write it as:

      foreach my $object (@object_list) { foreach my $attribute_name (grep $object->has_attrib($_), @attribute +_list) { my $value = $object->get($attribute_name); # Do stuff here } }
      As for your challenge, don't get impressed too easily. Here is a realistic application off of the top of my head. The application is converting data from one database layout to a very different one. Your objects are instantiated out of queries in the one database and then present views of themselves for the insert into the other. Attributes are fields of tables. The contents must be able to contain any data that would be appropriate for the insert, and that means that you must preserve the difference between NULL (internally represented in Perl as undef) and strings (internally represented in Perl as themselves). Note that while you could use some internal representation of the null value other than the default supported by DBI, doing so would be coding overhead and a possible source of mistakes which you don't want.

      Finally the EAGAIN is not meant as a counter-example. It is just the specific example that really got me thinking about the basic issues with error-handling logic. In general error-handling logic is the least tested logic in our code. Yet it is the logic which we most depend on working when things goes wrong. Therefore error-handling policies that make error information available and scatter the handling of errors to user code is practically a guarantee that when things go wrong we will not have handled it well.

      But here is the EAGAIN issue explained. A very large number of Unix system calls have a possible error that they can return called, "EGAIN". This error means that something was temporarily unavailable, but if you try the call again in a bit, it will probably succeed. A typical example is that fork can fail this way if the OS has run out of process IDs. It is also used to resolve various race conditions. Etc.

      Under normal conditions this error is rare to very rare. User code is supposed to test for it and have appropriate failure/retry logic, but people frequently don't write this logic, or they write it but it is buggy, and it never gets exercised. However the odds of the kinds of races that return this error rise dramatically when the system is under heavy load. Therefore if there is a sudden spike in usage that lasts for a while, processes that have been running for ages may start acting up in a variety of bad ways for no apparent reason.

      Anyways I didn't mean to compare EAGAIN to the mechanics of your application. I just wanted to point to it as an example of why I think that error logic that has been moved to user code - no matter how clearly documented - is going to become error-prone.

      (PS: The system headers you talked about? Some people definitely complain about that. In fact changing how Perl handles errors that come from system calls are on the top of the list of things that Larry Wall wants to change in Perl 6...)

        Wow. This is turning into an actual conversation. (Something I've found can sometimes not materialize on PM, unfortunately.) Let me respond one at a time.

        sub get { my $self = shift; my $class = ref($self) or wantarray ? return () : return undef; # This is just for safety local $_; my @args = $self->coerce_args('ARRAY', @_); my @return = (); if (scalar(@args) < 1) { if (wantarray) { return @return; } else { return 0; } } foreach (@args) { my $key_name = "${class}::$_"; if (exists $self->{$key_name}) { push @return, $self->{$key_name}; } else { push @return, undef; } } if (wantarray) { return @return; } else { if (scalar(@args) == 1) { return $return[0]; } else { return \@return; } } }

        Ok. So, 35 lines, not 70. It was 70ish at one point, before I managed to condense a whole bunch. You'll notice the amount of checking I do.

        There's a huge amount of stuff I want to add to this, such as switching everything to using a dispatch table, so that people can define their own handlers for a given action, like retrieving or setting an attribute. One of the major criticisms I've received from people looking at this style of abstract base class is that you're stuck using the exact same functionality for every single attribute in a given class. Well, adding dispatch tables would fix that. And, obviously, the idea you mentioned of having an error handler that someone could override needs to be added as well. Combining both things would both expand and contract the actual "getting" function.

        There is an exists() function which does your has_attrib() action. I didn't realize we were going to be talking about my abstract base class implementation. (Please look at 88079 for more info.)

        I've never used DBI. From my understanding, a SQL statement using the string "NULL" to represent a null string. That would mean that you cannot realistically store the string N-U-L-L in a SQL DB. However, I haven't seen anyone complaining about that, either. (I could very well be wrong, as I don't work with SQL on a day-to-day basis.)

        Irregardless, I still maintain that every API forces you to accept certain restrictions on your allowable data-set that aren't in the base language. In using that API, we accept those restrictions (which should be very clearly stated in the accompanying documentation.)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://91605]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (6)
As of 2020-01-20 09:57 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?