Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses

Re: Re (tilly) 4: To sub or not to sub, that is the question?

by dragonchild (Archbishop)
on Jun 26, 2001 at 21:15 UTC ( #91668=note: print w/replies, xml ) Need Help??

in reply to Re (tilly) 4: To sub or not to sub, that is the question?
in thread To sub or not to sub, that is the question?

I'll address your last remark first.

(Similarly I consider it the responsibility of functions to do something useful in both array and scalar context.)

The full function does do something useful in both array and scalar contexts. I didn't write out the full 70 line version because it was meant to be an example, nothing more. If you want, I'll give you the current version of the abstract class that uses that function and we can discuss it. I'm always looking for input and criticisms on my code. I'm merely defending the fact that your criticisms (which may seem to imply that I don't know what I'm talking about) have already been taken into account. :)

I hadn't thought about using Carp and making it the caller's responsability to trap the croak within an eval block. The reason is that I don't want to make any assumptions about how my abstract classes might be used. Carp is great ... for CLI systems. For GUI systems, it's not so good.

Requiring that all values be defined isn't an onerous burden. At some point, in any module, assumptions and restrictions need to be made in order for the module to work well. We, as programmers, accept them all the time. For example, most of the functions in stdlib.h, std.h, and stdio.h return 0 when successful. We accept that 0 is a perfectly valid success code. Why, when every single other language I've come across does things the opposite way, do we blithely accept that? (I in no way am attempting to say that this is wrong, but merely use it as an example of restrictions on the rangespace of functions.)

In addition, attempting to access an attribute that doesn't exist isn't an exception. It can be a perfectly valid case. For example, you might have a list of objects of various kinds. You want to do processing on a list of attribute names, but not every attribute is in every object. So, you could do something like:

foreach my $object (@object_list) {\ ATTRIUBTE: foreach my $attribute_name (@attribute_list) { my $value = $object->get($attribute_name); next ATTRIBUTE unless defined $value; # Do stuff here. } }

That may seem contrived, but I'm sure you can think of other, similar examples. This is much clearer to the average programmer, imho, than using an eval block and checking $@. Plus, if someone wants to use undef, they can usually just use "" and everything is just fine. Or, if they absolutely need both undef and "", use the string __UNDEF__ (or some variant), and they're fine. If you can give me a real example where the same attribute can use undef and "", I'll be impressed. :)

Finally, I don't think that bring up EAGAIN is a good counter-example. Comparing an abstract class to the Unix kernel doesn't really ... I just don't think it's apples and apples. If you could explain it further (I haven't started my exploration of the Linux kernel yet ... that's for this winter), I'd be happy to listen.

Replies are listed 'Best First'.
Re (tilly) 6: To sub or not to sub, that is the question?
by tilly (Archbishop) on Jun 26, 2001 at 22:30 UTC
    Your get function has expanded to 70 lines in production? I have to ask why. While I don't agree with turning good rules of thumb like, "Functions should not exceed 50 lines" into rigid rules, I don't remember the last time I felt the need to write a function that was 70 lines long.

    As for your abstract class issue, you can always make the error sub an anonymous code reference which just *happens* to default to \&Carp::confess but can be set to whatever makes sense in the application. (For a GUI you probably want to write a log, pop up a pop-up window, and then throw an exception internally? I don't have a sense for what makes sense since I have not worked with GUI code for a long time.) If you do that I would make sure that it is everywhere exported from the same place so it can be switched globally as appropriate.

    For your code example, when I write code following a general pattern like that, normally I don't want to make that a silent error. It is too each to have a typo in the list of attributes, I have made mistakes like that a million times. But still if I wanted to make that kind of logic clear, I would suggest creating a method called, "has_attrib" and then calling that in your loop. In fact you can now write it as:

    foreach my $object (@object_list) { foreach my $attribute_name (grep $object->has_attrib($_), @attribute +_list) { my $value = $object->get($attribute_name); # Do stuff here } }
    As for your challenge, don't get impressed too easily. Here is a realistic application off of the top of my head. The application is converting data from one database layout to a very different one. Your objects are instantiated out of queries in the one database and then present views of themselves for the insert into the other. Attributes are fields of tables. The contents must be able to contain any data that would be appropriate for the insert, and that means that you must preserve the difference between NULL (internally represented in Perl as undef) and strings (internally represented in Perl as themselves). Note that while you could use some internal representation of the null value other than the default supported by DBI, doing so would be coding overhead and a possible source of mistakes which you don't want.

    Finally the EAGAIN is not meant as a counter-example. It is just the specific example that really got me thinking about the basic issues with error-handling logic. In general error-handling logic is the least tested logic in our code. Yet it is the logic which we most depend on working when things goes wrong. Therefore error-handling policies that make error information available and scatter the handling of errors to user code is practically a guarantee that when things go wrong we will not have handled it well.

    But here is the EAGAIN issue explained. A very large number of Unix system calls have a possible error that they can return called, "EGAIN". This error means that something was temporarily unavailable, but if you try the call again in a bit, it will probably succeed. A typical example is that fork can fail this way if the OS has run out of process IDs. It is also used to resolve various race conditions. Etc.

    Under normal conditions this error is rare to very rare. User code is supposed to test for it and have appropriate failure/retry logic, but people frequently don't write this logic, or they write it but it is buggy, and it never gets exercised. However the odds of the kinds of races that return this error rise dramatically when the system is under heavy load. Therefore if there is a sudden spike in usage that lasts for a while, processes that have been running for ages may start acting up in a variety of bad ways for no apparent reason.

    Anyways I didn't mean to compare EAGAIN to the mechanics of your application. I just wanted to point to it as an example of why I think that error logic that has been moved to user code - no matter how clearly documented - is going to become error-prone.

    (PS: The system headers you talked about? Some people definitely complain about that. In fact changing how Perl handles errors that come from system calls are on the top of the list of things that Larry Wall wants to change in Perl 6...)

      Wow. This is turning into an actual conversation. (Something I've found can sometimes not materialize on PM, unfortunately.) Let me respond one at a time.

      sub get { my $self = shift; my $class = ref($self) or wantarray ? return () : return undef; # This is just for safety local $_; my @args = $self->coerce_args('ARRAY', @_); my @return = (); if (scalar(@args) < 1) { if (wantarray) { return @return; } else { return 0; } } foreach (@args) { my $key_name = "${class}::$_"; if (exists $self->{$key_name}) { push @return, $self->{$key_name}; } else { push @return, undef; } } if (wantarray) { return @return; } else { if (scalar(@args) == 1) { return $return[0]; } else { return \@return; } } }

      Ok. So, 35 lines, not 70. It was 70ish at one point, before I managed to condense a whole bunch. You'll notice the amount of checking I do.

      There's a huge amount of stuff I want to add to this, such as switching everything to using a dispatch table, so that people can define their own handlers for a given action, like retrieving or setting an attribute. One of the major criticisms I've received from people looking at this style of abstract base class is that you're stuck using the exact same functionality for every single attribute in a given class. Well, adding dispatch tables would fix that. And, obviously, the idea you mentioned of having an error handler that someone could override needs to be added as well. Combining both things would both expand and contract the actual "getting" function.

      There is an exists() function which does your has_attrib() action. I didn't realize we were going to be talking about my abstract base class implementation. (Please look at 88079 for more info.)

      I've never used DBI. From my understanding, a SQL statement using the string "NULL" to represent a null string. That would mean that you cannot realistically store the string N-U-L-L in a SQL DB. However, I haven't seen anyone complaining about that, either. (I could very well be wrong, as I don't work with SQL on a day-to-day basis.)

      Irregardless, I still maintain that every API forces you to accept certain restrictions on your allowable data-set that aren't in the base language. In using that API, we accept those restrictions (which should be very clearly stated in the accompanying documentation.)

        Yeah, every so often I get into a conversation around here. Always seems to surprise people.

        Anyways as I say here I don't personally tend to use get/set methods much for personal reasons. After reading the article that I reference, you might find yourself reconsidering assumptions on OO design as well. But that is neither here nor there.

        First some detailed comments. On clear programmer mistakes (calling a method as a function) you are not raising an error. That is an error and should be caught ASAP. Secondly I don't like the overloading of the return value. If I am returning to a scalar the result of getting a list of values, that will work fine until that array only has one element, then my code breaks. DWIM is something I am cautious of. If getting single values works, then single values will come up in scalar context. Thirdly attributes do not play well with inheritance in your model unless everyone is everywhere using your get/set routines. To me that is too restrictive.

        About DBI, your understanding is only somewhat correct. If you wish to build a string SQL statement you would use NULL for null and "" for an empty string. However when querying from the database you get NULL as undef. Conversely if you use placeholders as DBI recommends, then you slam an undef value into the execute and you will populate the database with NULL values. In short, undef is Perl's version of NULL and both the native input and output respect that.

        Finally I tend to consider APIs that needlessly restrict your data representations as somewhat to very broken. They are, at the least, an area of fragility that can lead to broken code. Certainly I work to avoid that, and I think my code is better for that decision.

        Now here is (untested) a fragment of code that corresponds roughly to your get, implemented closer to how I would do it:

        use Carp; sub get { my $self = shift; my @ret = map $self->_get($_), $self->coerce_args(@_); wantarray ? @ret : $ret[0]; } sub _get { my $self = shift; my $field = shift; exists $self->{"attrib_$field"} ? $self->{"attrib_$field"} : $self->raise("Object '$self' does not have attribute $field"); } sub raise { shift; goto &confess; }
        Since this is untested, there may be obvious typos. But this fixes all of the issues I identified and does some other things. First of all since raise is a method call, it is overridable. Secondly _get is a method call that can be overridden. Classes that want to have attributes accepted that don't fit with this _get can just write their own method, that then defaults to SUPER->_get. That way they can add things like computed fields.

        Basically the watchword here is that if you want to get configurability, insert a method call. Place them roughly where you think people would want "hooks" into what you are doing, and you will get the configurability without having to code anything. An alternate way of thinking about it is that a method call already is a lookup into a dispatch table. So rather than code your own dispatch table that people need to interact with by your rules, just make a method call...

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://91668]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (5)
As of 2020-01-18 12:10 GMT
Find Nodes?
    Voting Booth?