Re: Can I please have *simple* modules? by BrowserUk (Pope)
on Nov 23, 2005 at 08:33 UTC
This issue is so symptomatic of all library systems, but especially OO library systems; and even more so when the components that make up the overall system are developed by disparate groups, of which CPAN is the most extreme example I know.
When you sit down to develop a module, as opposed to solve a problem, you have a single core problem you are addressing. You put the code in place to solve that, but then you have a single function, or a class that has a single constructor and a single result-producing method. At this point you are tempted to try and envisage all the ways that your module might be used, with the usual result that you add a few extra subs/methods that make its use 'easier' or 'more intuitive' for half a dozen "common use scenarios".
Then the next guy does the same with his module. And the next with his.
And then someone comes along and picks up these modules to reuse them within their module. And then they add a few extras to support various use cases their users might have. And, they also have to write various bits of glue code to match the data-structural requirements of their module to those of the various modules they reuse. And, potentially, mappings between the data requirements of those reused modules.
The result is that you end up with a considerable amount of extra code at each level that isn't used by any given caller. This accumulates with each new layer that gets added. This has a detrimental effect on performance, code-base size and load-time footprint. The problem grows exponentially relative to the number of layers.
And most importantly, the communication paths, and therefore the hysteresis involved in maintenance of the top levels, grow exponentially also. Every time a change is required as a result of bugs or feature additions, the need to communicate with others, convince them of the requirements for change, and the potential that the change you require is going to break their module for some or all of their other users, can lead to molasses-like problem resolution.
The alternative, as exemplified by the Java libraries, is to break each piece of functionality out into separate classes. In the case of the Java libraries, with their single-inheritance model and the interfaces fudge for class partitioning, this leads to cut&paste code reuse (of concrete implementations of the interface methods in different classes) as the only option for dealing with common functionality across the breadth of the class structure. And to horribly deep vertical superclass trees. And Java has the advantage of a single controlling body making the decisions--unlike CPAN.
Using the MI approach can avoid the cut&paste code reuse, by subdividing functionality into a fine-grained mesh. The problem with that is that it has a tendency to produce base classes that get thinly wrapped, by almost every caller, in order to bend the interface to the needs of their application. It also has the effect of producing very wide class structures with cross-links, and therefore cross-dependency chains, at many levels. This is where traits/mix-ins would be most beneficial, especially in Perl.
Why especially in Perl?
Compared to Java, C++, Haskell et al., Perl has a very small set of basic data types. This means that if the instance data of classes is maintained within Perl's native data types, scalars and arrays (not sure about hashes yet), then it becomes easy to write traits/mix-ins that operate directly upon the instance data of their symbiotic partners. Just as map, grep, etc. can operate directly upon any array, regardless of the type or meaning of its contents, so a well-written trait or mix-in should be able to operate directly on the instance data of its host with a minimum of, or no, data conversion required.
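To make that concrete, here is a minimal sketch of the kind of mix-in I have in mind. Everything in it is made up for illustration: the package names Mixin::Stats, Temperatures and Invoice, and the hand-rolled import that copies subs into the host. The only thing it assumes about a host is that its instance data is a blessed array reference of plain numbers.

```perl
use v5.14;
use strict;
use warnings;

# Hypothetical mix-in: every sub assumes only that the host object is a
# blessed array reference holding plain numbers.
package Mixin::Stats {
    use List::Util qw(sum min max);

    sub total   { my $self = shift; return sum(@$self) // 0 }
    sub average { my $self = shift; return @$self ? sum(@$self) / @$self : 0 }
    sub range   { my $self = shift; return @$self ? max(@$self) - min(@$self) : 0 }

    # Copy only the requested subs into the calling package's symbol table.
    sub import {
        my ($class, @names) = @_;
        my $host = caller;
        no strict 'refs';
        *{"${host}::$_"} = \&{"${class}::$_"} for @names;
    }
}

# Two unrelated hosts; neither converts its data for the mix-in.
package Temperatures {
    Mixin::Stats->import(qw(average range));
    sub new { my ($class, @readings) = @_; return bless [@readings], $class }
}

package Invoice {
    Mixin::Stats->import(qw(total));
    sub new { my ($class, @amounts) = @_; return bless [@amounts], $class }
}

my $temps   = Temperatures->new(12, 15, 9);
my $invoice = Invoice->new(20, 5);
say join ' ', $temps->average, $temps->range, $invoice->total;
```

Each host picks only the methods it wants, and the mix-in carries no instance data of its own. Whether that last restriction is the right one is exactly the open question.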
The big idea behind Java's interfaces is that if two classes fulfill the same Interface specification, then it becomes possible to write a concrete class that implements that Interface and it will be usable from both classes. The problem that arises is that the methods of the Interface often make little or no sense in respect to the classes that import them. At the simplest level, a method name that makes sense in terms of the Interface may make no sense at all in terms of the class importing it, even though the operation performed does.
E.g. an interface that calculates the minimum of a set of numbers might use the word 'minimum' in the method name. Of two modules using that Interface, one might better name the method ShortestRoute(), and the other BestPrice(). That's a weak example, but illustrative.
Example 2: One hosting author might prefer getters and setters and the other mutators.
In both cases, the authors of the hosting classes are forced to write trivial wrappers around the Interface methods to achieve their requirements. And that assumes that the data storage used is compatible.
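A rough Perl rendering of the 'minimum' example shows how little the wrappers do. The Minimal, RoutePlanner and PriceList packages, and the numbers() method that each host must supply, are hypothetical names invented for this sketch; plain inheritance stands in for the Interface here.

```perl
use v5.14;
use strict;
use warnings;

# Hypothetical shared implementation: it knows only about a list of
# numbers and names its operation accordingly.
package Minimal {
    use List::Util qw(min);
    sub minimum { my ($self) = @_; return min($self->numbers) }
}

# Each host must supply numbers() in the shape minimum() expects, then
# wrap minimum() under a name that makes sense to its own callers.
package RoutePlanner {
    our @ISA = ('Minimal');
    sub new           { my ($class, %km) = @_; return bless { km => {%km} }, $class }
    sub numbers       { my ($self) = @_; return values %{ $self->{km} } }
    sub ShortestRoute { my ($self) = @_; return $self->minimum }
}

package PriceList {
    our @ISA = ('Minimal');
    sub new       { my ($class, @offers) = @_; return bless { offers => [@offers] }, $class }
    # Glue: flatten [vendor, price] pairs into the bare numbers minimum() wants.
    sub numbers   { my ($self) = @_; return map { $_->[1] } @{ $self->{offers} } }
    sub BestPrice { my ($self) = @_; return $self->minimum }
}

my $routes = RoutePlanner->new(coastal => 212, motorway => 187, scenic => 240);
my $prices = PriceList->new([acme => 9.5], [globex => 8.75]);
say $routes->ShortestRoute;
say $prices->BestPrice;
```

ShortestRoute() and BestPrice() add nothing but a name, and each numbers() method exists only to reshape the host's own storage into what the shared code can consume.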
A harder problem, that harks back to the need for 'interface glue code' I mentioned earlier, is the need to restructure the data used within the hosting class to a form that the Interface can manipulate. And back again if the method is mutative.
Perl 5's preponderant use of hash-based objects, and the tendency to wrap every piece of data, even aggregates, into separate instances, means that every method that operates upon that aggregate has to be written to understand the interface to those objects. It has to know what methods to call to get and set the values inside as a very minimum.
If aggregate data is stored within Perl's base data-types, particularly arrays, it becomes possible to write methods that can manipulate a variety of data using common code with no need for type introspection cascades. The ultimate expression of this is sorting. A module like Sort::Key can export a small set of entrypoints, that deal with the various value interpretations of Perl's scalars, and then be re-used for many situations with the only proviso that the data be in list form.
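As a sketch of that idea using core Perl only: the sort_by_number helper below is a hypothetical stand-in written for this post, not Sort::Key's actual interface or implementation. It requires nothing of its input except that it arrives as a list, and the caller says how to read a numeric key from each element.

```perl
use v5.14;
use strict;
use warnings;

# Hypothetical helper, core Perl only: sort any list by a numeric key
# computed from each element. Each key is computed once, then the
# element indices are sorted by key and used to slice the original list.
sub sort_by_number(&@) {
    my ($key_of, @items) = @_;
    my @keys = map { scalar $key_of->($_) } @items;
    return @items[ sort { $keys[$a] <=> $keys[$b] } 0 .. $#items ];
}

# The same code serves strings, array-ref records and bare numbers.
my @words = sort_by_number { length $_ } qw(pear fig banana);
my @rows  = sort_by_number { $_->[1] } ([widget => 3], [gadget => 1], [doohickey => 2]);
my @temps = sort_by_number { $_ } (21.5, -3, 10);

say "@words";
say join ' ', map { $_->[0] } @rows;
say "@temps";
```

None of the three callers needed a wrapper class or an accessor method. Had each record been wrapped in its own object, the key block would have to know that object's getter names instead.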
The ultimate expression of this maximising of re-usability through the expedient of a minimal type system is Lisp, where damn nearly everything is a list, including code.
This is where I get antsy about the varying specifications of traits, mix-ins et al. Some propose that they should have no instance data of their own; others would permit it. Some would utilise introspection of their hosts to achieve their function; others require the hosting class to provide concrete implementations of one or more methods. Which combination of permissions and restrictions is going to make for the best combination of usability and re-usability whilst imposing the least amount of glue, is still very unclear in my mind.
And I think that the balance will be markedly different for Perlish, dynamic languages than it is for statically compiled languages; with the fulcrum of the balance shifting according to how much compile-time effort can be appropriately expended upon rationalisation, optimisation and automated interface code generation.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.