I think you are more likely to get feedback if you move the code on your scratchpad to the body of your RFC....
Makes perfect sense, although I was trying this time to faithfully follow the instructions here Tutorials. But I will do as you recommend because it seems to make more sense.
I really appreciate you have taken the time to write this. The circuit-board analogy is because I come from a Electronic Engineering background, but the inspiration and implementation techniques actually came from stuff I had picked up from Mark Jason Dominus' HOP.
Anyway, the topic at hand is interesting because it's related to balancing design, testing, assertions (as you well pointed out), diagnostics and debugging. Much like in electronics, you run the tests (whether it's a test-bench or embedded self-tests) but if a board fails, you have to test the circuit at each stage to see what is going on. Newer circuits that use I2C actually perform very granular self-tests in an automated fashion and also auto-diagnose themselves (e.g. most modern Phillips appliances do this). So whether this is testing, diagnosis or debugging seems to be actually dependent on the granularity of the test, and ultimately in the eye of the beholder.
A test will tell you that something failed, a diagnosis may dig deeper to help you pin-point the problem. In the Perl testing harnesses, you could probably accomplish this by conditioning more granular tests only if a higher level test failed (regardless if integral or unit test). More commonly as you point out, by unit testing each low-level component and then testing the aggregates, assuming of course, that the code is sufficiently factorized to accomplish this, and this is a again by design, as you also point out.
Back to the circuit analogy: you may have a distinct overall function, say the horizontal drive in a TV set (maybe analogous to a class). This in turn might be divided into 3 distinct parts: a) the h-sync circuitry, b) the middle-stage drivers, and c) the horizontal transistor that drives the fly-back primary (a,b,c maybe analogous to methods/functions or could also be modeled as an association [i.e. simple, aggregation or composition] depending on design needs). The I2C self-test (analogous to integral or unit tests) may indicate a problem with the voltage feeding the middle stage drivers, but within that part of the circuit you have several other tests points (not visible to the I2C) that are used to further diagnose the problem by measuring voltages or by looking at reference wave patterns (analogous to manual debugging). As to how far/deep the unit tests go, is again by design, and design in turn is based on needs, so I don't think there is any definite correct design rule here. Some I2C tests are so particular, that will many times immediately tell you which particular component needs to be replaced, other times it just points out the circuit/section and you have to place your oscilloscope lead in the test-points to figure out which component is failing.
So how far/deep the test must also go to diagnose the failure depends on the granularity of the code, which is a design constraint, in turn based on real-world needs. This may be similar to how much normalization is actually needed, or practical, in an RDBMS model, which many times depends on the real-world needs of that particular system. Furthermore, real-world performance issues may also force you to de-normalize an RDBMS model, or to rethink a complete layer entirely. For example, you may arrive at the conclusion the RBDMs suck at querying real-world objects, so you incorporate another element, say for example CouchDB, that better handles the de-normalized real-world object queries, completely separating the write and read paths. Now, before I divert too far OT, maybe I should start by explaining why this need arose in the first place, and perhaps this may shed some light on whether the test-point analogy makes any sense, or is of any use for Perl automated testing or not.
This idea came to me because I was having a hard time debugging a long function that scores the matching of text fields using Wagner-Fisher. The function in question calculates the individual scores of several text fields which are compared to several text fields in several different groups (from an XML source). Then, the best scores from each group are selected and narrowed down to the individual best match of each group. The function itself is a helper of 2 other functions that take these results and select the best match based on the average score. There is no recursion, but a lot of sub-processing of XML data using XPath and iterating through the different groups to calculate the individual scores for each field in each group and feeding that result to the other functions that in turn aggregate the results and help narrow down to the best match. So you see, the code is already sufficiently factorized into 3 functions but the scoring function although long, makes little sense to break up into smaller functions, or to encapsulate in smaller classes (though with enough time and budget this may not necessarily hold true).
The reason I implemented this debugging/testing technique is to make sure that we were scoring correctly at every step, and that when we add new logic, I could make sure that the programmer who later touched this code (or myself) would not screw up the not so complex but lengthy scoring algorithms. This is because previsions were left in the code for additions or modification of groups, fields, scoring rules and these have been changing since we've put the product through beta testing with key customers. I agree with you, this code is far from perfect, but this test point / granular testing (diagnosing or debugging) technique has proved very useful at this stage of development, and it's probably more a question of choosing the right title for it.
Your comments, and this comment Re: RFC: Tutorial "Introspecting your Moose code using Test Point Callbacks" by swon made me re-think this if this is actually a testing or a diagnosing/debugging technique. In my particular case, I think it's both, and that may lead to some interesting discussions and conclusions here.
On your comments in particular I think that you have a strong point that this could be avoided with better code design, and that probably holds very true. On the other hand, budget and time constraints many times don't allow to design every class and library perfectly up front, so we must all take a more iterative approach to this, and fine grain testing like this has proven instrumental in iterating and evolving this particular class.
Another interesting fact is that many times architectural constraints don't allow for "ideal" modeling in any single paradigm, this case being a particularly good example. This library is a Model class for an application in Catalyst, so a basic design constraint is to try to leave as much code pre-loaded in a Catalyst "instance" and only create new objects that are specific to an individual HTTP request. This means that even though the overall design pattern is OO with Moose, the "instance" classes are more functional libraries than anything else. Also, we have to account for different deployment techniques such as mod_perl with pre-forked processes or using threads with mod_worker where the non-mutable data of the interpreter is shared amongst threads.
In this case, 'ideal' object modeling would represent a huge performance penalty in having to instantiate objects with a lot of code, so the objects in this design are light-weight objects that have a per-request lifespan. The instance code on the other hand, has to make sure we don't have any global data (object attributes) that would create a problem when using the methods (functions) of these long-lived objects that are more akin to an OS shared object (aka dll). This of course does not excuse the fact that these model "instances" could benefit from better design choices to begin with, and I agree. Which reminds me of when I wrote RFC Mocking an Accessor with Test::MockObject (also inspired by test code of a Catalyst application) chromatic said "If you have to mock an accessor, you're probably mocking too much.", and he was right. Because after giving it further thought, I realized that it was better to completely separate my Model classes from those of Catalyst (eliminating much of the need to mock in the first place), and then integrate them back to Catalyst using a very thin layer of "Catalyst Model" classes. Of course, if I would have carefully RTFM on "Extending Catalyst", I would have noticed this recommendation clearly spelled out ;-). Then again, the mocking of accessors technique proved to be equally useful later on.
At this point my conclusion is that a change of title and a bit of generalization might better classify this technique, although in the end it may prove not to be very useful after all, who knows. Maybe something like "Using Test-Points for Better Diagnostics", "Adding Diagnostics to your Tests using Test-Points", "Adding Granularity to your Tests with Test-Points", or something along those lines.