I dislike object-oriented programming in general
by vrk (Chaplain) on Oct 16, 2007 at 19:34 UTC
Since my previous meditation seemed to upset some people, here's more serious pondering.
This is not the first node on the merits of Object-Oriented Programming (or Object-Oriented Paradigm), and certainly not the last. Interested monks can read The world is not object oriented, Damian Conway's ten rules for when to use OO, and Coding styles: OOP vs. Subs for starters.
I dislike the object-oriented approach
Before proclaiming my dislike, I ought to explain what I refer to when I say object-oriented, since it has many different interpretations: I mean the approach to programming that touts
I'm sidestepping the issues of how these are implemented, what kind of message-passing or inheritance mechanism is used, and certainly the syntax involved.
However, it's not the components of the approach that are icky; it's the mindset taken as a whole. In particular, I claim that
Although dealt with here separately, these are all aspects of the same thing.
Separation of concerns
An expression popularized by E. Dijkstra, separation of concerns means decomposing the problem into smaller conceptual pieces, disentangling it so that you can attack one part of the problem at a time without worrying about the rest just yet. It's a general problem-solving strategy, but it has particular importance in programming.
If you forget for a while (separation of concerns in action) that source code or program text is executable on a machine, the program is really a document from one human to another describing how to solve the given problem. The solution is written in a formal language to be as explicit and precise about it as needed, though unfortunately often in terms of an (abstract) machine. It's information.
In order to understand the steps taken to solve the problem or to understand the solution, there has to be a way to encode this information in such a way that you don't have to think about several different things at a time, but that you can concentrate on understanding one component of the solution, then move to the next. If you are doing statistical analysis on a set of data, you certainly need not and should not think about computing the correlation coefficient or plotting the values on the screen when you are retrieving the data from, say, a database. They are different concerns entirely.
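A minimal sketch of that separation, written in Python purely for illustration. The function names and the hard-coded list standing in for the database are assumptions of this sketch, not part of any real framework. Each step can be read and checked without thinking about the others:

```python
import math

def load_pairs():
    """Retrieve the data. A hard-coded list stands in for a database query."""
    return [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]

def correlation(pairs):
    """Compute Pearson's correlation coefficient; knows nothing about storage."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)

def report(r):
    """Present the result; knows nothing about how it was computed."""
    return f"correlation = {r:.3f}"

print(report(correlation(load_pairs())))
```

Replacing the data source or the presentation touches exactly one function; the computation in the middle never needs to know.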
Since computer programs, unlike material objects, have no real physical limitations, it's absolutely essential to prevent yourself from making a mess of them. You, the programmer, have to actively resist the temptation to just connect parts of the program criss-cross because it seems convenient at the time. All current programming paradigms try to assist and even enforce separation of concerns, including object-oriented programming. The strategy is to contain different parts of the solution in separate units, modules, of which more next.
Modularity is extremely good
Ignoring trivial programs, modularity is necessary to both understand a problem and find the solution. Modularity at the source code level means building your solution from blocks or units that ideally solve that one part of the problem: do only one thing, but do it well. The whole solution is then built by connecting the different modules together.
Although there is often a correspondence between the concerns and the modules, this is not a bijective mapping. Often, one concern needs, due to the magnitude of the problem, several modules, and sometimes one module can address several concerns. Bearing in mind once more that the program text can be executed by a computer, which can then find solutions to instances of the problem, there may also be engineering reasons why dividing modules into smaller submodules is necessary.
An important aspect of modules is that ideally they are replaceable and reusable. Replaceable: if you find a better way to solve a particular subproblem, you can replace an existing module without having to modify the rest of the program. Reusable: once you have solved a subproblem that is general enough, that is, is bound to recur, you can reuse the module in the next problem. More about these below when we discuss interfaces.
A fundamental building block people often overlook is procedural abstraction: procedures or functions. They can and should be considered modules in and of themselves, because when written correctly, they are replaceable and reusable, and they abstract one small part of the solution behind a good name. (This applies to any programming language that has procedures or functions, not just functional programming languages.)
Another important aspect of modules (including functions) is parameterization. That is, parts of the module's behaviour can be abstracted into parameters to that module, parameters that can depend on the problem instance. Not only does this add reusability, but in all modern programming languages you can also give modules as parameters to other modules (in one way or another), which enables you to separate concerns into (almost) orthogonal categories.
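As a small illustration in Python (all names here are made up for the sketch), the counting logic below is written once, and the condition it counts by is itself a module passed in as a parameter:

```python
def count_matching(items, predicate):
    """Count the items for which predicate(item) is true."""
    return sum(1 for item in items if predicate(item))

def is_even(n):
    return n % 2 == 0

def longer_than(limit):
    """Return a new predicate, parameterized by the length limit."""
    def predicate(text):
        return len(text) > limit
    return predicate

print(count_matching([1, 2, 3, 4], is_even))
print(count_matching(["a", "abc", "abcd"], longer_than(2)))
```

Counting and testing are now separate concerns: either can be replaced without touching the other. The same shape works in Perl with code references.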
When you have separate, decoupled modules, becoming certain that they work correctly, or, if you are bent that way, proving that they are correct, becomes much easier. You have fewer cases to consider, fewer interactions between different parts, and simply less to take into account.
Decoupling interface from implementation is vital
The interface to the module is arguably even more important than the implementation behind it. This is a commonly repeated phrase in the Perl community.
The interface to the module means the connection points that you can use in combining the module with other modules. The interface abstracts functionality away; ideally you need absolutely no knowledge of how the module works, as long as you know what it does. A bad interface depends directly on implementation details that, when changed, force changes in the program outside the module. A good interface seals the insides of the module from the outside world. An excellent interface is transparent: like superb quality hi-fi speakers, you cannot hear the speakers when you listen to music; similarly an excellent interface is concise yet complete, and deals directly with the right abstractions.
This decoupling is important not only when making changes to modules (when you can concentrate on changing exactly one part of the program), but also in helping separate what from how. This cannot be stressed enough, though you are probably tired of hearing it. It makes it conceptually easier to understand what the module does: it can only receive and send data through the interface, and what is not there is not there to add to the complexity.
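A sketch of what this buys you, in Python with hypothetical names: summarize depends only on the interface "median(values) returns a number", so the implementation behind it can be swapped without summarize changing at all.

```python
import statistics

def median_by_sorting(values):
    """One implementation: sort and pick the middle."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def median_by_library(values):
    """Another implementation: delegate to the standard library."""
    return statistics.median(values)

def summarize(values, median=median_by_sorting):
    """Depends only on the interface: median(values) -> number."""
    return {"count": len(values), "median": median(values)}

print(summarize([4, 1, 3, 2]))
print(summarize([4, 1, 3, 2], median=median_by_library))
```

Both calls produce the same summary; the caller has no way to tell, and no need to know, which implementation did the work.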
On the need to abstract
Abstracting is always necessary when programming, because the computer, by its discrete design, can model the real world only very poorly (and this is often not even what is wanted). You need to find a way to express the key concepts of the problem and their relations in a way that lends itself to encoding them in a programming language.
However, an equally important part of abstracting is to be able to distill the key concepts from the problem in the first place. Abstractions are necessary in reducing complexity, but used poorly, they will instead add complexity. The simpler way should always be preferred, and information that is not needed should be completely ignored.
A good example, as told by E. Dijkstra (in one of his EWDs, forgive the lack of a link), is dealing with synchronization issues in a multitasking environment: several threads or processes that share resources. Even if you have knowledge of the relative running times between instructions, so that you could perhaps time access to the shared resources based on this, the problem becomes much easier if you forget about the time information and focus on only sequential access to the resource -- a process may access a resource before or after another process, and that's it. (Although this is a given nowadays in the problem domain, the situation was different in the 1960s. According to Dijkstra, many people were opposed to throwing away valuable information.)
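One way to see the idea in modern terms is a lock, sketched here in Python with made-up names. The code makes no assumption about how long anything takes or which thread runs first; it only guarantees that accesses to the shared counter happen one after another.

```python
import threading

counter = 0
lock = threading.Lock()

def deposit(times):
    global counter
    for _ in range(times):
        # Only one thread at a time may run this block. Which thread goes
        # first is irrelevant, and no timing information is used.
        with lock:
            counter += 1

threads = [threading.Thread(target=deposit, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

Because every increment happens either before or after every other one, the final value is 40000 regardless of how the threads were scheduled. That is the whole argument; relative speeds never enter into it.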
This is actually hard to do in general in my experience. It takes much practice and thinking to find the right concepts, but this is a topic for another day.
The problems in OOP
Object-oriented programming deals with all of the issues above in the following way:
A more detailed description of object-oriented design and principles is beyond the scope of this meditation.
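Now, my beef with OOP is the following: its anthropomorphic terminology, its habit of modelling every concept in the problem domain directly as a class, and its insistence that the class is the only means of abstraction you need.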
In other words, what might otherwise be a good and universal way to decompose and model a problem, if suboptimal in some instances, is botched by three rather irritating things.
Anthropomorphic terminology -- sloppy thinking
First, anthropomorphic terminology: we have entities and objects ("that guy" and "this guy") who send messages to each other ("this guy sends the packet to this one"). While this is not a requirement in OOP, it is inherent to the way you are supposed to think about problems: in terms of objects and messages.
Not only is this inelegant, it creates two serious problems: it encourages you to look for entities where there are none, and it leads to operational thinking.
In other words, both ingredients in the recipe for sloppy thinking.
Trying to find entities where there are none is perhaps excusable. After all, arguably it is often easier to see the problem as a bag of co-operating entities. Perhaps this can be attributed to us being prone to seeing ourselves in everything (which ranges from mistaking a tree branch in a dark forest for a human being to projecting our desires on an amoeba).
Trying to model the system or problem as a set of entities is particularly useful in some domains, such as the frequently cited graphical user interface. Sometimes there is simply a natural fit.
However, this is a sad price to pay, because anthropomorphic terminology leads to operational thinking. By operational thinking, I mean trying to understand a program in terms of how it is executed on a computer. Usually this involves mentally keeping track of variables and their values, following one statement after another checking what it does to data, and doing case analysis with if-then-else blocks. You are knee-deep in problems once you start trying to understand loops operationally (does it terminate? will the loop counter always be inside the bounds?).
An even more complicating factor is that OOP guides you towards thinking about interactions between classes, or rather objects, operationally, that is, what will happen at runtime. Due to inheritance, the number of cases simply explodes, as instead of having an object of one type, in its place could equally well be another, similar object, whose class inherits from the class of the first object. Does this matter at the point where the object is used? Maybe. Good luck with trying to check that operationally. Not only that, but there is a proliferation of small, unrelated internal states: the states of the objects. It's simply too much to hold in one's head, even if you can only access the internal state through the interface.
Perhaps there is a way to reason about the correctness of object-oriented programs in a non-operational way, but I have yet to see it.
Object-oriented programming includes an idea that concepts in the problem domain, be they entities or relations, should be modelled directly as classes. So far so good, but this is a tricky path to travel.
Although you theoretically can map almost any entity to a class, there are many cases where you simply should not. Consider keeping track of balls that have ended up in the lake at an amateur golf course. According to object-oriented thinking, the natural way to model this problem is to create a class for the golf ball, and another "modelling" the lake, a container class for the lost golf balls. Instantiations of the golf ball class, golf ball objects, would be added to the container, and then simply asking the container how many balls it contains gives the answer.
However, if there is nothing more we need to know about the golf balls or the lake, the entire concept can be modelled with a single integer. If there is no need to distinguish between golf balls, they can all be considered identical and treated identically.
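Side by side, in Python (the class and variable names are invented for the sketch), the two models look like this:

```python
class GolfBall:
    """Carries no data: nothing distinguishes one ball from another."""

class Lake:
    """A container whose only job is to be asked how much it holds."""
    def __init__(self):
        self._balls = []

    def add(self, ball):
        self._balls.append(ball)

    def count(self):
        return len(self._balls)

# The object-oriented model: three balls land in the lake.
lake = Lake()
for _ in range(3):
    lake.add(GolfBall())

# The same model as a single integer.
lost_balls = 0
for _ in range(3):
    lost_balls += 1

print(lake.count(), lost_balls)
```

Both answer the only question being asked, and only one of them requires the reader to learn two classes first.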
The example is trivial and also a caricature, yet this kind of thinking seems pervasive among OOP proponents. Note in particular that I am not talking about "implementation overhead" or any notion of the former being slower to execute than the latter. I am talking about conceptual overhead, which is strongly related to separation of concerns and abstraction, particularly finding the key concepts. Having "concrete" golf balls in a "concrete" container object gives perhaps a fuzzy feeling of having created something easy to grasp, but this is an illusion.
The golf balls in this example are identical, and can be treated identically. In fact, there is nothing to distinguish one golf ball from another. We should definitely never introduce such distinctions in the source code, which is a conceptual model of the problem (forget about execution again). As it is a model, and there is supposed to be some goal in making the model in the first place, excluding unnecessary detail will not only make the model simpler and more elegant, it will also make solving the problem much, much easier.
You may snicker at the silly example here, but in my experience extraneous classes are incredibly common. (I'm no better, to be honest, but I try to learn.)
The only way
Although limiting yourself to only OOP may seem like a mistake on your part, object-oriented thinking generally advertises that the class is the only means of abstraction you need. Let us assume for a moment that it is.
While the class is a useful tool in modelling entities, and even in modelling relations, there are frequently cases in my programming where a full-blown object is simply unnecessary: a simple function would do. This is considered a bad habit in OOP, since all your abstractions are supposed to take the form of classes.
A real example: I have a small framework that does statistical tests on data. The similarity between two data objects (which are not objects in the object-oriented sense, but can be, for example, numbers or arrays of numbers) depends on the data I am analyzing, and sometimes I want to use different similarity metrics on the same data.
The object-oriented way would be to create a class encapsulating the idea of similarity measure, perhaps into a superclass called SimilarityMetric, which has a public method that, given two data objects (say, numbers), returns their similarity. The class SimilarityMetric is abstract in the sense that by default it defines no similarity metric; it is the responsibility of inherited classes to refine the similarity concept. For instance, you might have a simple class called AbsoluteDifference that would base the similarity on the absolute difference of two numbers. Then, you would instantiate an object from a concrete class, inherited from SimilarityMetric and sharing the same interface, and give it as a parameter to the framework.
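A rough sketch of that design in Python. SimilarityMetric and AbsoluteDifference are the names used above; most_similar_pair is a hypothetical stand-in for the framework, and negating the difference so that "larger means more similar" is an assumption of this sketch.

```python
from abc import ABC, abstractmethod
from itertools import combinations

class SimilarityMetric(ABC):
    """Abstract: defines the interface but no actual metric."""
    @abstractmethod
    def similarity(self, a, b):
        """Return a number; larger means more similar."""

class AbsoluteDifference(SimilarityMetric):
    def similarity(self, a, b):
        # Negated so that a smaller difference gives a larger similarity.
        return -abs(a - b)

def most_similar_pair(data, metric):
    """Hypothetical stand-in for the framework: takes a metric object."""
    return max(combinations(data, 2),
               key=lambda pair: metric.similarity(*pair))

print(most_similar_pair([1, 5, 6, 10], AbsoluteDifference()))
```

Two classes and an instantiation are needed to deliver what is, in the end, a single expression.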
The more sensible way, if only your programming language has the means (and Perl does!), is to simply give a function to the framework. While this may seem ad hoc, it need not be. As long as the framework demands that the function accept certain kinds of input parameters and return, say, a numeric value as a result, that is, defines the interface that the function must have, then this model is equally well modularized. It is also conceptually considerably simpler, and it solves another problem: polymorphism.
The point deserves more stressing: the function is a module in this case, because it models a single concept (similarity in a particular domain), and it is replaceable (since there is a defined interface). It also operates at a higher level of abstraction than a class hierarchy.
In the object-oriented way, if you want your classes to support any type of data, you need to either use generics (which is a topic in and of itself) or create some sort of "interfacing" class that your classes use and that is able to encapsulate the data objects -- in some way.
However, in the functional way, you can supply the similarity function when you supply the framework with the data. If the framework is oblivious to the type of data it analyzes (for instance, requiring only that each data object has an identifier), and only uses the results of the similarity metric in the analysis, then there is no need for generics or polymorphism in the first place: the question never arises, because your function is by construction specialized to handle the input data.
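A sketch of the functional version in Python. As before, most_similar_pair is a hypothetical stand-in for the framework, and both metrics are invented for illustration. The framework never inspects the data objects; it only hands them to the function it was given.

```python
from itertools import combinations

def most_similar_pair(data, similarity):
    """Hypothetical stand-in for the framework.

    Interface required of `similarity`: takes two data objects and
    returns a number, larger meaning more similar.
    """
    return max(combinations(data, 2),
               key=lambda pair: similarity(*pair))

def absolute_difference(a, b):
    """A metric for plain numbers (negated: closer means more similar)."""
    return -abs(a - b)

def shared_elements(a, b):
    """A metric for a different kind of data: sequences of numbers."""
    return len(set(a) & set(b))

print(most_similar_pair([1, 5, 6, 10], absolute_difference))
print(most_similar_pair([(1, 2, 3), (2, 3, 4), (7, 8, 9)], shared_elements))
```

The same framework code handles numbers and tuples without generics or a wrapper class, because each function is written for the data it is supplied with.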
Another way to model the problem in an object-oriented way, with inspiration received from the functional model, is to encapsulate the similarity metric with the data. That is, the data container would provide means to compute the similarity between two data objects it contains. However, the same problem presents itself: how do we parameterize the similarity metric? We could define different classes that would accept a data object container and define a similarity metric -- but this is not any better than the first object-oriented model. There is a serious problem: the data container is now coupled with the similarity metric. These are two entirely different concepts and should not be mixed in this way.
If my experience is any guide, there are more cases where a simple function does better than a full-blown class (or worse yet, class hierarchy).
But I still use objects and classes
Despite my dislike, I often use objects and classes, because in many cases they are a natural fit. There are indeed many problem domains where the most elegant way is to use classes. For example, suppose you have tabulated data of some sort that you wish to print. There are different output formats, say HTML tables and CSV, but they all share the same basic interface; namely, you print a header listing the names of the columns, and then you print rows one at a time.
(Even though the CSV format, informal as it is, defines no header, this can simply be just another row.)
Object-oriented programming is a natural fit here. Say you create an abstract class called Tabulator, and concrete classes HTMLTabulator and CSVTabulator. The implementation details of the concrete classes differ, mainly in what kind of formatting they do, and the HTMLTabulator should probably support setting attributes on rows and columns in some way (such as alignment, width, or column or row spanning), or perhaps it uses templates, but once configuration is done, there is no difference. You can simply give either as a parameter to any function or class that needs to print your tabular data, and if you later need more output formats, simply inherit from Tabulator again.
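A stripped-down sketch of that hierarchy in Python. The class names come from the text; the method names and the render function are assumptions of this sketch. For brevity the CSV class does no quoting of fields, and the HTML class omits the surrounding table element and any row or column attributes.

```python
import html
from abc import ABC, abstractmethod

class Tabulator(ABC):
    """Shared interface: one header, then rows one at a time."""
    @abstractmethod
    def header(self, names): ...

    @abstractmethod
    def row(self, values): ...

class CSVTabulator(Tabulator):
    def header(self, names):
        # CSV defines no special header, so it is just another row.
        return self.row(names)

    def row(self, values):
        return ",".join(str(v) for v in values)

class HTMLTabulator(Tabulator):
    def header(self, names):
        cells = "".join(f"<th>{html.escape(str(n))}</th>" for n in names)
        return f"<tr>{cells}</tr>"

    def row(self, values):
        cells = "".join(f"<td>{html.escape(str(v))}</td>" for v in values)
        return f"<tr>{cells}</tr>"

def render(tabulator, names, rows):
    """Works with any Tabulator; knows nothing about the output format."""
    lines = [tabulator.header(names)]
    lines.extend(tabulator.row(r) for r in rows)
    return "\n".join(lines)

print(render(CSVTabulator(), ["name", "score"], [["ann", 3], ["bob", 5]]))
print(render(HTMLTabulator(), ["name", "score"], [["ann", 3], ["bob", 5]]))
```

Here the class earns its keep: the two operations belong together, each format implements both, and render is written once.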
Naturally you can solve this with the functional approach, but at least I cannot think of a way that would not be structurally (very) similar to the object-oriented approach and still be better.
To recap: I dislike object-oriented programming in general, but it has its uses. One just needs to be careful, and have more than one tool in the toolbox.
Perl is good
The good part about Perl is that my modules can be simply collections of procedures, or collections of higher-order functions, or classes in the object-oriented sense. I can pass both functions and objects to and from other functions and objects. I can use whichever paradigm I need at any moment.
Of course, with all the power comes responsibility. You need much experience to be able to decide which way works best for any given problem, and I do not claim to have that experience yet. However, I stand fully behind the claim that problems have natural solutions in different paradigmatic approaches. Sometimes object-oriented programming does the trick, sometimes functional, sometimes another approach entirely.
However, I'm slightly worried about the approach taken in Perl 6: everything is an object starting from fundamental types of data such as numeric constants. Sometimes it is useful to think in terms of objects, but often I want a number to be just a number, nothing more. Once again, this is a topic that has meditations already and that deserves another meditation of its own. The situation is not as black and white as this short paragraph makes it seem.
Pardon all the references to E. Dijkstra, but he was simply an excellent and clear-thinking chap. I also apologize for any anthropomorphic terminology in the above. It's difficult to avoid when talking about OOP.