http://www.perlmonks.org?node_id=880012

Note: This RFC has been superseded by RFC Using Test-Point Callbacks for Automated Diagnostics in Testing

Happy new year fellow Monks!

I'd like to start the new year by clearing out my pending list with PM, which begins with a Tutorial I've been meaning to write about a technique I've been using to test Moose code (though it can actually be used for any library).

Anyway, please have a look and see if this makes any sense, or whether I've been re-inventing the wheel here. Maybe there are better, or more standard "best practice", ways of doing this. Or perhaps this could make it into a test module of some sort? It's been working well for me, hopefully it may be useful for others as well, and worthy of a Tutorial. Your comments are very welcome.

Update: Moved the RFC here instead of my scratchpad

Update: Changed the title, and updated the prose using comments and ideas spawned from the discussion below up to 20110102-1600UTC

Using Test-Point Callbacks for Automated Diagnostics in Testing

What is a "Test-Point" and why should I care?

Tests usually fail with, at most, a diagnostic message about the overall result of the test or some observable side-effect available to the test code. It is then up to the programmer to raise the debugging level and re-run the tests in the hope that the debug messages will give some clue as to what went wrong inside a specific method or subroutine. If the diagnostic messages don't help, the programmer then dives into the code, placing warn/debug statements or firing up the debugger to set breakpoints, watches, etc., to determine under what conditions this particular test was failing.

Many times, the breakpoints and debug-level messages are left in the code: they are innocuous at runtime, and they serve precisely as a reminder that this is usually a good place to stop and inspect something when testing and debugging. It is these spots in the code which we will call "Test-Points" in the context of this tutorial. They are places in the code that help you diagnose a failure, somewhat akin to assertions, but targeted not at unexpected results or side-effects, but rather at expected flows, values, etc. under conditions created by a specific test. Anywhere you place a permanent debug statement, or have found the need to place a DB breakpoint, qualifies as a potential TP.

It is important to note that, contrary to assertions, Test-Points (or TPs) are mainly used to give the test code the ability to inspect the inside of methods at specific places when running a specific test, and, just like debug statements and pre-defined breakpoints, they can be safely left in the code. TPs are analogous to the test points found on printed circuit boards, which are sometimes used by special self-test / diagnostic circuitry such as I2C, commonly found in newer Philips appliances.

By using the TP technique described here, you can evaluate partial results within a sub, monitor a local/automatic variable, object attribute or side-effect at a given spot, or even condition tests on these intermediate results of a running test.

The Technique

To illustrate the method, we will be using Moose code as an example, although this technique can be applied to any and all Perl code.

As mentioned above, the technique is based on the use of callbacks at certain parts of the code which we have defined as Test-Points. The particular code example below uses a single callback defined as a class attribute. In reality, you would probably protect this attribute with specific read and write methods, configuration options, environment variables and so forth, but for the sake of simplicity we will just let the testing code use the pre-defined setter to map the single callback to a code reference in the test code itself.

Security Warning

Any hook in software can be put to unscrupulous use by crackers and malware. It could also be used inadvertently by another programmer who finds some use for your TPs, not realizing that they should only be used for testing and that no code should depend on the callbacks whatsoever. In the example below, you will notice that these callbacks are designed just to send scalar values in anonymous hashes, giving the testing software a chance to inspect values inside a sub, much like watches. They could also be used bi-directionally to force certain behavior in a specific test, for example to simulate some data inside the subroutine, but this is strongly discouraged.

test_class.pm

Setting up the Callback

This example Moose class implements the TP technique by setting up a single callback attribute (tp_callback), which is validated to be a code reference. In the BUILD method, we initialize the callback to simply return, making it innocuous unless someone points this code reference elsewhere. Pointing it elsewhere can happen at creation time (by passing a code reference to the class's new() method) or, as in the example, later on through the setter.

Security Note

In real life, you would probably want to limit this to creation time and wrap/condition the actual calls to the callback with a debug-level or diagnostic-mode flag of some sort, the same way debug messages are conditioned by level, as well as make the attribute private by means of specific read/write methods and other mechanisms. One way to do this in Moose, for example, is to verify the debugging level (by checking a configuration parameter or environment variable) in a code block at the beginning of your class that modifies the class's meta information before creation, so that the whole callback mechanism simply does not exist unless the correct configuration is set. An example of this is given at the end.

Creating the Test Point

To create a TP, all you have to do is call the global callback with two parameters (a recommended practice, although the parameter list is up to you). The first parameter is the TP name, which allows the testing code to identify which TP is being called; in the test code, a simple dispatch table keyed on the TP name can be used to route each TP to its servicing sub. The second parameter is optional, and we recommend it be a single anonymous hash containing key-value pairs of the things you want to inspect.

Of course, references could also be sent, for example to allow the test code to perform more elaborate actions, but this is not recommended unless you really know what you're doing and there is no other way to accomplish it through the regular inputs of the method in question. The overall idea of the Test-Point is diagnostics (seeing the inner workings of a sub and testing it more granularly), though in some situations it might be useful to force certain things and analyze the behavior. For example, if something is not working as expected, forcing a value may allow the testing code to diagnose the problem and pin-point exactly what failed and why.

As can be seen, the TPs are basically innocuous unless the main callback reference is pointed at some other code that actually does something with the callback. That code will live in your test file, as you can see in the test code example below. In real life, you would probably skip the callback altogether by wrapping it in a conditional of some sort, in combination with some create-time magic, to prevent the use of this mechanism by other code (see the security notes above, and the create-time meta-manipulation example at the end).

package test_class;
use Moose;

has 'foo' => (
    is  => 'rw',
    isa => 'Str',
);

has 'bar' => (
    is  => 'rw',
    isa => 'Str',
);

has 'tp_callback' => (
    is  => 'rw',
    isa => 'CodeRef',
);

sub BUILD {
    my $self = shift;
    # initialize the test callback
    $self->tp_callback(sub {return;});
}

sub asub {
    my $self = shift;
    my $lvar_foo;
    my $lvar_bar;

    # some code that sets bar
    $self->bar('result');

    # you want to test the value of bar at this point
    $self->tp_callback->('test_point_one');

    # some code that sets local vars
    $lvar_foo = 'yuca';
    $lvar_bar = 'pelada';

    # you want to test the value of lvar at this point
    $self->tp_callback->('test_point_two', {
        lvar_foo => $lvar_foo,
        lvar_bar => $lvar_bar,
    });

    return 1;
}

__PACKAGE__->meta->make_immutable;

1;

test_class.t

The test code below services the Test-Points. The general structure of the test file is just like any other, except that the callbacks are handled in an event-driven fashion, meaning that they may be called at any time during the execution of a regular test. You can think of this as a software/hardware interrupt, with the TP subs akin to the interrupt's service routines.

This particular test example has four main sections. The first is just the basic declarations and common use_ok tests of any test file. The second is the declaration of the dispatch table that maps the Test-Point names to their servicing subroutines. The third is the standard tests you would usually perform on this class, and the fourth is the Test-Point service subroutines.

TP names should be unique; they could include, for example, the sub name as part of the name, or be numbered with some special TP scheme such as those used on circuit boards. Also, bear in mind that a TP may be reached by different tests, meaning that more than one test may invoke the callback. This can be addressed by conditioning the callback in the test code, or in the class code itself (e.g. invoke this TP only if $x == $y), as sketched below.
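As a minimal sketch of conditioning in the test code (the %active_tp hash is an illustrative assumption, not part of the example files; $tc and %test_points are the object and dispatch table from the test file shown below), the dispatch wrapper can simply ignore TPs that the current test has not switched on:

# Sketch: service a TP only when the current test has opted in.
my %active_tp;    # TP name => 1 while a test wants that TP serviced

$tc->tp_callback( sub {
    my $tp = shift;
    return unless $active_tp{$tp};   # condition the callback in the test code
    $test_points{$tp}->(@_);
} );

# enable only the TPs relevant to the next test, then run it
%active_tp = ( test_point_two => 1 );
cmp_ok( $tc->asub(), '==', 1, 'Result of asub (only test_point_two serviced)' );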

The final implementation details depend very much on your specific needs, so you must adapt this technique accordingly. The technique described here is intentionally simple and meant to introduce the subject of Test-Points using callbacks. Other options include using multiple callbacks or, to be a bit more functionally purist, passing a code reference as a parameter of the subroutine call, effectively mapping the callback to a specific test call (sketched below). This style is common in functional programming and well described in Mark Jason Dominus' great book "Higher-Order Perl".
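Here is a rough sketch of that more functional variant (the %opt parameter style and the trimmed-down asub are assumptions for illustration, not the example class used in this tutorial):

# Sketch: the callback is passed per call instead of stored on the object.
sub asub {
    my ( $self, %opt ) = @_;
    my $tp = $opt{tp_callback} || sub { return };   # innocuous default

    $self->bar('result');
    $tp->('test_point_one');

    my $lvar_foo = 'yuca';
    $tp->('test_point_two', { lvar_foo => $lvar_foo });

    return 1;
}

# In the test file, the callback is scoped to this one invocation:
$tc->asub( tp_callback => sub {
    my ( $name, $vals ) = @_;
    is( $vals->{lvar_foo}, 'yuca', "lvar_foo at $name" )
        if $name eq 'test_point_two';
} );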

#!/usr/bin/perl
use strict;
use warnings;
use Test::More;

BEGIN { use_ok 'test_class' }

my $tc = test_class->new();

# the dispatch table
my %test_points = (
    test_point_one => \&test_point_one,
    test_point_two => \&test_point_two,
);

# setup the callback dispatch
$tc->tp_callback(
    sub {
        my $tp = shift;
        $test_points{$tp}->(@_);
    }
);

# regular tests here
cmp_ok($tc->asub(), '==', 1, 'Result of asub');

# callback test subs here (or in pm?)
sub test_point_one {
    my $params = shift; # not used in this test point
    cmp_ok($tc->bar, 'eq', 'result', 'Value of attr bar at test_point_one');
}

sub test_point_two {
    my $params = shift;
    cmp_ok($params->{lvar_foo}, 'eq', 'yuca',
        'Value of lvar_foo at test_point_two');
    cmp_ok($params->{lvar_bar}, 'eq', 'pelada',
        'Value of lvar_bar at test_point_two');
}

done_testing();

How to run the examples

Download the examples to the same directory and run with:

prove -v test_class.t

You should see something like the example results below (using a bash shell):

aimass@yclt2:~/languages/perl/MooseTest$ prove -v test_class.t
test_class.t ..
ok 1 - use test_class;
ok 2 - Value of attr bar at test_point_one
ok 3 - Value of lvar_foo at test_point_two
ok 4 - Value of lvar_bar at test_point_two
ok 5 - Result of asub
1..5
ok
All tests successful.
Files=1, Tests=5,  0 wallclock secs ( 0.05 usr  0.00 sys +  0.37 cusr  0.02 csys =  0.44 CPU)
Result: PASS

Avoiding TP abuse by Meta manipulation in Moose

If you are paranoid about possible exploits or inadvertent use of the TP technique (although beware that Moose itself already provides some hooks), you could use more advanced techniques to protect against misuse. That said, the example given here is aimed more at preventing inadvertent use (stopping someone from writing code that depends on your callbacks for anything other than testing) than at actual security.

cond_tp.pm

Note that the class is much the same as the original example above. However, a code block at the top of the class modifies the class's meta information dynamically and prevents a programmer from using the Test-Points unless a specific debug level is set in the environment. Explaining this code in detail is beyond the scope of this tutorial; to understand it completely, please check the POD for Moose and Class::MOP.

package cond_tp;
use Moose;
use namespace::autoclean;

has 'foo' => (
    is  => 'rw',
    isa => 'Str',
);

has 'bar' => (
    is  => 'rw',
    isa => 'Str',
);

# set-up Test-Point depending on debug level
{
    my $debug_level = $ENV{'MYDEBUG_LEVEL'} || 0;
    my $meta = Class::MOP::get_metaclass_by_name(__PACKAGE__);

    # enable TPs at debug level 5 and higher
    if($debug_level > 4){
        $meta->add_attribute(
            tp_enabled => (
                accessor  => 'tp_enabled',
                init_arg  => undef,            # prevent decl via new()
                predicate => 'has_tp_enabled', # inform test code that TPs are enabled
                default   => 1,
                writer    => undef,            # always read-only
            )
        );
        $meta->add_attribute(
            tp_callback => (
                accessor  => 'tp_callback',    # default is rw
                predicate => 'has_tp_callback',# inform test code about callback
                default   => sub {return;},
            )
        );
    }
    else{
        $meta->add_attribute(
            tp_enabled => (
                accessor  => 'tp_enabled',
                init_arg  => undef,
                predicate => 'has_tp_enabled',
                default   => 0,                # test points are disabled
                writer    => undef,
            )
        );
        $meta->add_attribute(
            tp_callback => (
                accessor  => 'tp_callback',
                predicate => 'has_tp_callback',
                default   => sub {return;},
                writer    => undef,            # read-only
            )
        );
    }
}

sub asub {
    my $self = shift;
    my $lvar_foo;
    my $lvar_bar;

    # some code that sets bar
    $self->bar('result');

    # TP conditioned
    $self->tp_callback->('test_point_one') if $self->tp_enabled;

    # some code that sets local vars
    $lvar_foo = 'yuca';
    $lvar_bar = 'pelada';

    # TP conditioned
    $self->tp_callback->('test_point_two', {
        lvar_foo => $lvar_foo,
        lvar_bar => $lvar_bar,
    }) if $self->tp_enabled;

    return 1;
}

__PACKAGE__->meta->make_immutable;

1;

cond_tp.t

The test code is almost exactly the same as the previous one, except for the conditional around setting up the callback.

#!/usr/bin/perl
use strict;
use warnings;
use Test::More;

BEGIN { use_ok 'cond_tp' }

my $tc = cond_tp->new();

# the dispatch table
my %test_points = (
    test_point_one => \&test_point_one,
    test_point_two => \&test_point_two,
);

# setup the callback dispatch only if enabled
if($tc->tp_enabled){
    $tc->tp_callback(
        sub {
            my $tp = shift;
            $test_points{$tp}->(@_);
        }
    );
}

# regular tests here
cmp_ok($tc->asub(), '==', 1, 'Result of asub');

# callback test subs here (or in pm?)
sub test_point_one {
    my $params = shift; # not used in this test point
    cmp_ok($tc->bar, 'eq', 'result', 'Value of attr bar at test_point_one');
}

sub test_point_two {
    my $params = shift;
    cmp_ok($params->{lvar_foo}, 'eq', 'yuca',
        'Value of lvar_foo at test_point_two');
    cmp_ok($params->{lvar_bar}, 'eq', 'pelada',
        'Value of lvar_bar at test_point_two');
}

done_testing();

How to run the examples

Download the examples to the same directory and run with:

prove -v cond_tp.t

Test the TP conditionals by setting MYDEBUG_LEVEL to 5 or above. You should see something like the example results below (using a bash shell):

aimass@yclt2:~/languages/perl/MooseMeta$ export MYDEBUG_LEVEL=5
aimass@yclt2:~/languages/perl/MooseMeta$ prove -v cond_tp.t
cond_tp.t ..
ok 1 - use cond_tp;
ok 2 - Value of attr bar at test_point_one
ok 3 - Value of lvar_foo at test_point_two
ok 4 - Value of lvar_bar at test_point_two
ok 5 - Result of asub
1..5
ok
All tests successful.
Files=1, Tests=5,  1 wallclock secs ( 0.04 usr  0.01 sys +  0.40 cusr  0.01 csys =  0.46 CPU)
Result: PASS

aimass@yclt2:~/languages/perl/MooseMeta$ export MYDEBUG_LEVEL=4
aimass@yclt2:~/languages/perl/MooseMeta$ prove -v cond_tp.t
cond_tp.t ..
ok 1 - use cond_tp;
ok 2 - Result of asub
1..2
ok
All tests successful.
Files=1, Tests=2,  0 wallclock secs ( 0.03 usr  0.01 sys +  0.39 cusr  0.02 csys =  0.45 CPU)
Result: PASS

Re: RFC: Tutorial "Introspecting your Moose code using Test Point Callbacks"
by ELISHEVA (Prior) on Jan 02, 2011 at 07:28 UTC

    I think you are more likely to get feedback if you move the code on your scratchpad to the body of your RFC. Just surround it by <readmore> tags so it doesn't distract attention from your introductory question. The comments here won't make any sense if you delete or modify your scratch pad contents at some point in the future. Posting the draft here will make it easier for people coming after you to learn from the dialog in these replies.

    The metaphor of a circuit board with test points appeals to me, but I'm having trouble envisioning situations where I would actually use this. I think a tutorial would be significantly strengthened by a more in-depth discussion of when this approach is most valuable, especially in light of alternative design, debugging and testing techniques.

    The main difficulty is that one needs to know in advance where one would want to have permanent test points. To make something part of the class itself, it needs to be something one is testing for long term reasons. That rules out testing motivated by "I'm not sure I understand how this Perl function/syntax works" and testing motivated by the hunt for logic errors.

    Once logic errors are fixed, there isn't much point in keeping the code used to find it sitting around. What one needs instead is preventive measures. An in-code comment discussing any tricky points of logic will hopefully keep someone from erasing the fix and regressing the code back to its buggy state. A regression test should also be defined as a double check to ensure that future maintainers do in fact read the comments and don't reintroduce the error.

    A permanent test usually looks at object state or the interaction of object state and the environment. This raises the question: is there a way you can design your object so that test points aren't needed, or at least needed less frequently? I suspect the answer is sometimes yes and sometimes no. A tutorial should discuss the situations where design cannot eliminate the need for test points.

    I would tend to think that the times when one most needs permanent test points is when one needs to test state in real time during a process that interacts with a changing environment. For example, if a user complains that hitting a certain key causes the software to spew garbage, one might want to have them run the software in a debugging mode and see what output gets generated by embedded test points.

    However, if one is using test points as part of a static test suite, I'd suspect the need for test points is related more to the design than an actual need for a test point. Sometimes, if we are working with a legacy design, we may not have any option but to stick with it, so test points might also be helpful there. However, if one can come up with a design that eliminates the need for test points, I think that is good. Such designs tend to be more modular and easier to maintain.

    In my own case, I try to define objects so that any information I need to test the state of the object is publicly accessible at least in read only form. I also try to break down long complex functions into small testable units. I test each of the component steps and not just the entire mega routine. This reduces the need for intermediate test points within a routine, except when I'm trying to track down a logic error. As discussed above, tracking down a logic error doesn't justify a permanent test point.

    I also work hard to build my software from immutable objects wherever possible. Instead of littering each class with setter methods, I'll write a class that gathers input, passes it to a constructor and spits out the immutable object. The input gathering class will need careful tests on state, but the other objects can be checked after creation and then we're done. However, I wouldn't need a test point to check state because my state is normally exposed via the aforementioned readonly methods.

    Exposing object state via a readonly method can't hurt the integrity of the data. The only possible downside is that some developers are fond of hunting down and using undocumented methods. Most likely, if a method exposes private data I don't want it to be part of the public API, because the organization of private data can change over time.
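    (For readers unfamiliar with the idiom, a minimal Moose sketch of this kind of read-only exposure might look like the following; the class and attribute names are made up for illustration.)

    package Scorer;
    use Moose;

    # State a test may want to inspect is public, but read-only:
    # anyone can read last_score, while only the class can set it.
    has 'last_score' => (
        is     => 'ro',
        isa    => 'Num',
        writer => '_set_last_score',
    );

    sub score {
        my ( $self, $text ) = @_;
        my $score = length $text;          # stand-in for the real scoring logic
        $self->_set_last_score($score);
        return $score;
    }

    __PACKAGE__->meta->make_immutable;
    1;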

    Maybe there are better or actually more standard "best practice" ways of doing this?

    I don't know that there are "better" techniques, but there are related techniques. This technique bears some similarity to assertions. Assertions are also permanent additions to the code that let one potentially peek at problem areas deep within the code. There are however two key differences - one that adds options and another that takes them away:

    • An assertion is a hard coded check. Your test point is a fill-in-the-blank check. Depending on your needs you can change its content.
    • An assertion is part of a method's code and so can check automatic ("my") variables private to the method. A testpoint subroutine can only check class/object state.

    You could, of course, modify your technique slightly and enable testing of automatic data. Just define your testpoint routine to accept parameters passed in by the method. However, that means that you would have to anticipate what parameters were being passed in and design separate test point methods for each combination of passed-in parameters. At that point you are getting awfully close to the very problem you are trying to avoid - having to define and sprinkle special-case debug/warning statements all through your code.

    You would still retain one benefit: the ability to cleanly encapsulate your debugging code. However, there are alternatives, especially for temporary debugging code. I often surround lengthy temporary debugging code with

    #DEBUG - BEGIN
    .... code here ...
    #DEBUG - END

    When I am ready to remove it or comment it out, I just grep my source files for #DEBUG.

    Update: Expanded comments on design vs. test points.

      I think you are more likely to get feedback if you move the code on your scratchpad to the body of your RFC....

      Makes perfect sense, although I was trying this time to faithfully follow the instructions here Tutorials. But I will do as you recommend because it seems to make more sense.

      I really appreciate that you have taken the time to write this. The circuit-board analogy is because I come from an Electronic Engineering background, but the inspiration and implementation techniques actually came from stuff I had picked up from Mark Jason Dominus' HOP.

      Anyway, the topic at hand is interesting because it's related to balancing design, testing, assertions (as you well pointed out), diagnostics and debugging. Much like in electronics, you run the tests (whether on a test-bench or with embedded self-tests), but if a board fails, you have to test the circuit at each stage to see what is going on. Newer circuits that use I2C actually perform very granular self-tests in an automated fashion and also auto-diagnose themselves (e.g. most modern Philips appliances do this). So whether this is testing, diagnosis or debugging actually seems to depend on the granularity of the test, and is ultimately in the eye of the beholder.

      A test will tell you that something failed; a diagnosis may dig deeper to help you pin-point the problem. In the Perl testing harnesses, you could probably accomplish this by running more granular tests only when a higher-level test fails (regardless of whether it is an integration or unit test). More commonly, as you point out, this is done by unit testing each low-level component and then testing the aggregates, assuming of course that the code is sufficiently factored to accomplish this, and that is again by design, as you also point out.

      Back to the circuit analogy: you may have a distinct overall function, say the horizontal drive in a TV set (maybe analogous to a class). This in turn might be divided into 3 distinct parts: a) the h-sync circuitry, b) the middle-stage drivers, and c) the horizontal transistor that drives the fly-back primary (a, b and c maybe analogous to methods/functions, or could also be modeled as an association [i.e. simple, aggregation or composition] depending on design needs). The I2C self-test (analogous to integration or unit tests) may indicate a problem with the voltage feeding the middle-stage drivers, but within that part of the circuit you have several other test points (not visible to the I2C) that are used to further diagnose the problem by measuring voltages or by looking at reference wave patterns (analogous to manual debugging). How far/deep the unit tests go is again by design, and design in turn is based on needs, so I don't think there is any definite correct design rule here. Some I2C tests are so specific that they will often immediately tell you which particular component needs to be replaced; other times the test just points out the circuit/section and you have to place your oscilloscope lead on the test-points to figure out which component is failing.

      So how far/deep a test must go to diagnose a failure depends on the granularity of the code, which is a design constraint, in turn based on real-world needs. This may be similar to how much normalization is actually needed, or practical, in an RDBMS model, which many times depends on the real-world needs of that particular system. Furthermore, real-world performance issues may also force you to de-normalize an RDBMS model, or to rethink a complete layer entirely. For example, you may arrive at the conclusion that RDBMSs suck at querying real-world objects, so you incorporate another element, say for example CouchDB, that better handles the de-normalized real-world object queries, completely separating the write and read paths. Now, before I divert too far OT, maybe I should start by explaining why this need arose in the first place, and perhaps this may shed some light on whether the test-point analogy makes any sense, or is of any use for Perl automated testing or not.

      This idea came to me because I was having a hard time debugging a long function that scores the matching of text fields using Wagner-Fischer. The function in question calculates the individual scores of several text fields which are compared to several text fields in several different groups (from an XML source). Then, the best scores from each group are selected and narrowed down to the individual best match of each group. The function itself is a helper of 2 other functions that take these results and select the best match based on the average score. There is no recursion, but a lot of sub-processing of XML data using XPath and iterating through the different groups to calculate the individual scores for each field in each group and feeding that result to the other functions that in turn aggregate the results and help narrow down to the best match. So you see, the code is already factored into 3 functions, but the scoring function, although long, makes little sense to break up into smaller functions or to encapsulate in smaller classes (though with enough time and budget this may not necessarily hold true).

      The reason I implemented this debugging/testing technique was to make sure that we were scoring correctly at every step, and that when we add new logic, the programmer who later touches this code (myself included) does not screw up the not-so-complex but lengthy scoring algorithms. This is because provisions were left in the code for additions or modifications to groups, fields and scoring rules, and these have been changing since we put the product through beta testing with key customers. I agree with you that this code is far from perfect, but this test-point / granular testing (diagnosing or debugging) technique has proved very useful at this stage of development, and it's probably more a question of choosing the right title for it.

      Your comments, and this comment Re: RFC: Tutorial "Introspecting your Moose code using Test Point Callbacks" by zwon, made me re-think whether this is actually a testing or a diagnosing/debugging technique. In my particular case, I think it's both, and that may lead to some interesting discussions and conclusions here.

      On your comments in particular, I think you have a strong point that this could be avoided with better code design, and that probably holds very true. On the other hand, budget and time constraints often don't allow us to design every class and library perfectly up front, so we must all take a more iterative approach, and fine-grained testing like this has proven instrumental in iterating and evolving this particular class.

      Another interesting fact is that many times architectural constraints don't allow for "ideal" modeling in any single paradigm, this case being a particularly good example. This library is a Model class for an application in Catalyst, so a basic design constraint is to leave as much code as possible pre-loaded in a Catalyst "instance" and only create new objects that are specific to an individual HTTP request. This means that even though the overall design pattern is OO with Moose, the "instance" classes are more functional libraries than anything else. Also, we have to account for different deployment techniques, such as mod_perl with pre-forked processes or threads with mod_worker, where the non-mutable data of the interpreter is shared amongst threads.

      In this case, 'ideal' object modeling would represent a huge performance penalty in having to instantiate objects with a lot of code, so the objects in this design are light-weight objects with a per-request lifespan. The instance code, on the other hand, has to make sure we don't have any global data (object attributes) that would create a problem when using the methods (functions) of these long-lived objects, which are more akin to an OS shared object (aka dll). This of course does not excuse the fact that these model "instances" could benefit from better design choices to begin with, and I agree. Which reminds me: when I wrote RFC Mocking an Accessor with Test::MockObject (also inspired by test code of a Catalyst application), chromatic said "If you have to mock an accessor, you're probably mocking too much.", and he was right, because after giving it further thought, I realized that it was better to completely separate my Model classes from those of Catalyst (eliminating much of the need to mock in the first place), and then integrate them back into Catalyst using a very thin layer of "Catalyst Model" classes. Of course, if I had carefully RTFMed "Extending Catalyst", I would have noticed this recommendation clearly spelled out ;-). Then again, the mocking-of-accessors technique proved to be equally useful later on.

      At this point my conclusion is that a change of title and a bit of generalization might better classify this technique, although in the end it may prove not to be very useful after all, who knows. Maybe something like "Using Test-Points for Better Diagnostics", "Adding Diagnostics to your Tests using Test-Points", "Adding Granularity to your Tests with Test-Points", or something along those lines.

        There is no recursion, but a lot of sub-processing of XML data using XPath and iterating through the different groups to calculate the individual scores for each field in each group and feeding that result to the other functions that in turn aggregate the results and help narrow down to the best match.

        You sound like you might be limited in the amount of refactoring you can do, but there is one technique I'd like to share, just in case it might fit and you find some opportunity to do some refactoring.

        Oftentimes, when a function turns big, long and ugly and it doesn't make sense to refactor it, the chief culprit is a large amount of state data shared among the different steps of the function. The classic example of such a function is a giant loop that is pushing and popping a stack in lieu of recursion, but there are other examples as well.

        In these cases I have often found it quite helpful to design and build a functor object. This is an object whose data is all that ugly automatic data shared throughout the function (in your case, it might be something like the current state of your scoring variables). Then I define a set of small focused functions each reading and setting the shared data as needed. This eliminates the need to pass lots of data between methods, but still allows me to break the long ugly function into small conceptual chunks.

        Nearly every time I do this, I watch the algorithm and all its problem areas unfold before my eyes. When each chunk of the algorithm has the breathing room to live in its own subroutine, I often become aware of a set of edge cases and conditions that ought to have been checked for but weren't. Something about code living in its own little home seems to invite a closer look at the algorithm and all of its associated if, ands, and buts. Furthermore, there is no longer a concern about all of those hairy conditions clouding up the overall logic because they are nicely encapsulated in a subroutine.

        Another advantage of this approach is that it becomes much easier to run just a part of the algorithm. If you want to run everything, you call the functor's run method. Since little data is being passed from subroutine to subroutine, this "run" method starts looking like a list of steps. Depending on your granularity needs, you can sometimes break the list into a part A, B, and C, make each of those a subroutine, and then run just a part of the algorithm, do some tests, then run another part. This would be much, much harder to do if A, B, and C were wired together by data passed from one to another via parameters rather than shared from within the object.
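        (A compressed sketch of the functor idea, with made-up names and a trivial stand-in for the real scoring, might look like this.)

        package MatchScorer;   # illustrative functor: the shared state lives in attributes
        use Moose;

        has 'fields' => ( is => 'ro', isa => 'ArrayRef[Str]', required => 1 );
        has 'scores' => ( is => 'rw', isa => 'HashRef',       default  => sub { {} } );
        has 'best'   => ( is => 'rw', isa => 'Maybe[Str]' );

        # run() is just the list of steps; each step reads and writes the shared attributes
        sub run {
            my $self = shift;
            $self->score_fields;   # part A
            $self->pick_best;      # part B
            return $self->best;
        }

        sub score_fields {
            my $self = shift;
            $self->scores->{$_} = length $_ for @{ $self->fields };   # stand-in scoring
        }

        sub pick_best {
            my $self = shift;
            my ($best) = sort { $self->scores->{$b} <=> $self->scores->{$a} }
                         keys %{ $self->scores };
            $self->best($best);
        }

        __PACKAGE__->meta->make_immutable;
        1;

        A test can then call score_fields on its own, inspect scores, and only afterwards call pick_best - the stop-and-start ability described above.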

        Just a thought.

        Update: explaining how a functor object can affect the ability to stop and start a process.

Re: RFC: Tutorial "Introspecting your Moose code using Test Point Callbacks"
by zwon (Abbot) on Jan 02, 2011 at 07:44 UTC

    Interesting, but it doesn't look good to me. The minimum unit of code I test is a function. A function is a piece of code with well-defined behaviour; I can completely rewrite it, but the tests will still be valid. Your tests test a particular implementation of the function; it is possible that somebody will rewrite the function using a different algorithm, and the function will return the same result as before, but the tests will fail, because they rely on a particular implementation.

    You could say that the tests may be rewritten too, but it's not so easy. You expose some interface, and it is possible that some idiot uses it to get an incomplete result and relies on this functionality, so how could you know that you didn't break anything?

    In my opinion it may be useful for debugging, but not for testing.

Re: RFC: Tutorial "Introspecting your Moose code using Test Point Callbacks"
by roboticus (Chancellor) on Jan 02, 2011 at 16:05 UTC


    I agree with the points that ELISHEVA and zwon raised.

    My primary objection is that it makes the object API less well defined. While there are exceptions, typically a method is "atomic". But with callbacks, objects may be altered in mid-operation, causing interactions you won't be able to anticipate and thus can't create tests for beforehand. Also, normally innocuous changes that you'd make could break user code.

    Finally, if you don't consider the placement of your callback locations, algorithm improvements may become extraordinarily difficult, as you'd have to maintain the illusion that callbacks are executed at the appropriate time. For example, suppose you have an iterator class that searches linearly through your list and provides a "next item" callback for each item considered. If the container is sorted, then the user may expect that the "next item" callback will be called in ascending order. In your next version, you want to use a binary search to speed things up. But now the "next item" callback won't be executed in order, and you'll be skipping items. Your callback interface locks you into a linear search, or you'll break someone's application.
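    For concreteness, the kind of lock-in described above might look like this (an illustrative sketch; the SortedList class and its find method are invented for the example, not taken from the tutorial):

    package SortedList;
    use Moose;

    has 'items' => ( is => 'ro', isa => 'ArrayRef[Str]', required => 1 );

    # Linear scan: the optional callback fires once per item, in list order.
    # Switching this to a binary search would silently break any caller that
    # relies on the callback being invoked for every item, in ascending order.
    sub find {
        my ( $self, $target, $on_next ) = @_;
        for my $item ( @{ $self->items } ) {
            $on_next->($item) if $on_next;
            return $item if $item eq $target;
        }
        return;
    }

    __PACKAGE__->meta->make_immutable;
    1;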

    Is there some compelling scenario you've encountered that makes this better than simple static testing? When I was reading it, the best case I could come up with is to provide hooks for logging rather than testing. I'd be interested in seeing how you use it.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

    Update: added "and zwon".

      Thank you for your feedback.

      As I point out to ELISHEVA in a lengthy response above, I am using this technique specifically to test parts of a method, not at all to supply a callback mechanism or pre/post/around hooks of any sort (which, BTW, Moose already provides). The idea is to inspect intermediate results in a single function: much as when you leave a DB breakpoint, a commented warn statement, or a debug-level log statement, it is probably because you anticipate having problems there in the future, or because these intermediate results may help you debug this code if a test fails.

      Rationale: whenever you feel the need to print a debug statement or use the debugger, is it not better to write a sub-test case with this callback technique?

      Of course this may only apply to complex functions that cannot be factorized into smaller chunks, for example long nested conditionals, etc. If you don't use these callbacks they don't really hurt and could even be wrapped with a 'diagnostic' testing mode for example. So you can run your basic tests, and if the fail, you could run a more thorough 'diagnostic' test that uses the callbacks to evaluate intermediate results.