|Think about Loose Coupling|
OO Identity Crisisby tlm (Prior)
|on Jun 20, 2005 at 14:14 UTC||Need Help??|
tlm has asked for the wisdom of the Perl Monks concerning the following question:
This is an "OO Design 101" question. I will cast it in terms of an example using trees, but I could have just as well used linked lists, graphs, etc.
Suppose that I want to design a class Tree to represent (you guessed it) trees, with the additional requirement that the tree must not contain "duplicate nodes" (the reason for the scare-quotes here will become apparent soon). Suppose also that the interface for this class includes a method add_node, whose job is to add a new node to the set N of the tree's nodes.
But how can is_duplicate() tell whether a new node is the duplicate of an already added node? Strictly speaking, it can't without some help from the client code, because each application has an idiosyncratic definition of equality.
For example, suppose the tree class is being used to represent processes running in a Unix system, and the tree class has to decide whether two Process objects $p0 and $p1 are the same. It won't do to compare the stringified versions of these objects (i.e. "$p0" eq "$p1") because the objects could be different by this criterion and yet be logically equivalent (e.g. though they are different objects all the fields that together uniquely characterize these objects as Unix processes, such as the start time, PID, PPID, etc., have identically matching values).
OK, to recap, we want to design this Tree class, and in particular, its add_node method to prevent adding "duplicate nodes."
One draconian possibility is this: require client classes to define an id method, returning an "application-unique" string identifying the object (e.g. the PID for the process). Then add_node would could be written like this:
The obvious shortcoming of this approach is the fact that it requires client classes to modify their API. A very unreasonable demand, especially considering that there is no consensus on what the name of this method ought to be. A different data structure class, say Digraph, written by someone else, may use ID or hash_code for the same purpose.
Another possibility would to define a wrapper class called Node, whose interface would consist (minimally) of one constructor and the methods id and object (or payload or content). The constructor for Node could include slots to specify both the payload object and some unique ID. It would be up to the client code to ensure that two objects receive the same ID if and only if they represent the same entity (e.g. the same Unix process).
With this scheme, add_node would be defined as above, and client code would invoke it like this:
A third alternative would be to add one configuration parameter to the Tree class. This optional parameter would be a code ref that the Tree class would apply to client objects to determine their unique ID. It would default to plain stringification:
Then add_node would be defined like this:
Or, for greater flexibility, maybe the last two approaches could be combined:
...though this may be a case of misguided featurism.
I'd be interested in reading your opinions on these approaches, and/or suggestions for solving this design problem.
P.S. BTW, I realize that some of the solutions above could be implemented by overloading "", but, even though I understand the mechanics of overloading, I've been bitten enough by overloading-related bugs that I must conclude I am not yet at the level that I can use this tool safely in production code.
the lowliest monk