 
PerlMonks  

Testing my tests (mutation testing)

by szabgab (Priest)
on Feb 26, 2017 at 13:52 UTC ( #1182876=perlquestion )

szabgab has asked for the wisdom of the Perl Monks concerning the following question:

Devel::Cover can easily show if a certain function or expression was executed during the test run, but it cannot tell if there was an assertion checking the validity of the result.

One could randomly change the code under test (e.g. replace a + by a - ) and run the tests again. If they still pass, we have a problem. The tests do not check that code properly.

Is there a tool for Perl that would automate this process?

For further clarification, I'd like to change the source code of the module or application under test and leave my tests unchanged.

Update: Use Case

To further elaborate, assume you have a huge code-base with a huge test suite. You pick a module that has 100% coverage and wonder: can I safely refactor this? Will the test suite catch it if I make a mistake? LanX gave an excellent and very simple example of code with a problematic test:
sub foo {
    my ($x, $y) = @_;
    return $x + $y;
}
The test:
is foo(2, 0), 2;
How can I estimate the risk of changing this code? One possible way is to change the code in a way that should break it and see if the tests fail. So I'd like a tool that can introspect the source code of my application and change the code at a random place, e.g. change the + in the above function to -. (No mocking; really changing the code on the disk.)

The tests would still pass.

This is an indication that the tests don't protect me at that point.
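The core of the idea can be sketched in a few lines of plain Perl: keep the sub as source text, mutate the text, compile both versions, and run the same checks against each. This is only a toy illustration of the principle, not one of the CPAN tools:

```perl
use strict;
use warnings;

# The sub under test, kept as source text so we can mutate it.
my $src = 'sub { my ($x, $y) = @_; return $x + $y }';

# The mutation: flip the first '+' to '-'.
(my $mutant_src = $src) =~ s/\+/-/;

my $foo    = eval $src        or die $@;
my $mutant = eval $mutant_src or die $@;

# The weak test from above passes for BOTH versions, so the mutant
# "survives" - which tells us the test would not catch this bug.
my $weak_orig   = $foo->(2, 0)    == 2;   # 2 + 0 == 2: passes
my $weak_mutant = $mutant->(2, 0) == 2;   # 2 - 0 == 2: also passes!

# A test with a nonzero second argument "kills" the mutant.
my $strong_mutant = $mutant->(2, 3) == 5; # 2 - 3 == -1: fails

print "weak test, original: ",  ($weak_orig     ? "pass" : "fail"), "\n";
print "weak test, mutant:   ",  ($weak_mutant   ? "pass" : "fail"), "\n";
print "strong test, mutant: ",  ($strong_mutant ? "pass" : "fail"), "\n";
```

A surviving mutant is exactly the signal asked for above: code that can be broken without any test noticing.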

Answer

Apparently this is called "mutation testing" and there are several modules on CPAN that do this:

See details under this node: Re: Testing my tests

Replies are listed 'Best First'.
Re: Testing my tests
by dagfinnr (Initiate) on Feb 27, 2017 at 07:30 UTC
    Maybe I'm out of my depth here, since I have no practical experience of it, but isn't that what's known as mutation testing? Searching CPAN, I find this: https://metacpan.org/pod/Devel::Mutator

      Yes, exactly, this is mutation testing. It's been on the Devel::Cover TODO list for years (item 7 at https://st.aticpan.org/source/PJCJ/Devel-Cover-1.23/docs/TODO). It doesn't have to be done as a part of Devel::Cover, but the advantage to that approach is that Devel::Cover knows which tests exercise the ops being changed and so can only run those tests. Otherwise the whole thing can get even more expensive than it already is.

The disadvantage to that approach, obviously, is that no one has done it.

        That's it. I was looking for those. Thanks (all of you) for pointing me to the right name and the right modules.
Re: Testing my tests
by stevieb (Canon) on Feb 26, 2017 at 15:59 UTC

If I'm understanding correctly, you want something that can introspect the tests themselves and check whether you're asserting on the actual return value of a sub or not. Correct?

I think that may be pretty involved to do. After some quick thought: you'd need to introspect the test file itself, map each test to the sub in question, and validate whether *all* return paths are asserted against:

# module
sub perform {
    my ($x, $y) = @_;

    if ($x < 10){
        return $x + $y;
    }
    if ($x == 10){
        return $x * $y;
    }
    return $x - $y;
}

    Then:

is perform(5, 5), 10, "...";
is perform(10, 5), 50, "...";
is perform(20, 5), 15, "...";

    So it would almost seem as though you'd need to actually write tests against your tests. If there is an existing solution for these types of things, I'd definitely be interested in knowing about it as well.

Boolean conditions are handled by Devel::Cover; it tells you whether you tested the TRUE and FALSE branches of them.


Right, but I think what he's asking about is whether the return values themselves have been tested against directly. I put in the if statements to show that, to do what he wants, some serious thought would have to go into the introspection: to prove that an assertion has been made, you would also have to confirm which return path the value came out of.

      So it would almost seem as though you'd need to actually write tests against your tests.

      Won't he then need some tests to test the tests he uses to test his tests?

        and then a guard to guard the tests against change.

        Premature optimization is the root of all job security
      No. I would like to introspect and change my module or application and then run the unchanged tests again. (Updated the original post too with this clarification.)

        What about running the tests, then modifying the sub, then running again? Here's a quickly thrown together example (in this example I simply mock out the whole function) to see if this is more along the lines of what you're looking for:

package Package;
{
    sub perform {
        return $_[0] + $_[1];
    }
}

package main;
{
    use Mock::Sub;
    use Test::More;

    tests();

    my $m = Mock::Sub->new;
    my $changed = $m->mock('Package::perform');
    $changed->return_value($_[0] - $_[1]);

    tests();
    done_testing();

    sub tests {
        is Package::perform(5, 5), 10, "5 + 5 = 10";
    }
}

        Output:

ok 1 - 5 + 5 = 10
not ok 2 - 5 + 5 = 10
#   Failed test '5 + 5 = 10'
#   at pack.pl line 24.
#          got: '0'
#     expected: '10'
1..2
# Looks like you failed 1 test of 2.

If this example is more along the lines you're after (modifying code on the fly): my Devel::Examine::Subs is designed to alter code within a file. So if, say, you wanted to modify a single line in a single sub within a package, you could (then revert it back), and I could write a mocked-up example of what it might look like. But perhaps I'm way off here.

Re: Testing my tests
by LanX (Archbishop) on Feb 27, 2017 at 01:15 UTC
    Hi Gabor,

    It would be much easier to answer if you provided an example.

What you say sounds a lot like automated proving, which is impossible in general.

    You may want to try generating random input for your unit tests, but then you need to manually approve if the returned results are correct and should be frozen into your tests.

    For this to work you need to know which and how many arguments you expect. (Including global context and side effects)

    Not sure if this is possible.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

      "You may want to try generating random input for your unit tests..."

We all do this by default, don't we? Further, I don't think tests can be (easily) set up to be tested by other tests, but I digress.

      A sub should be tested thoroughly so that the return is tested explicitly. This should just be part of the default test regimen. If a new return path is created, it should be added, and tested all the same.

Unless there's some form of AI going on to monitor changes to the sub itself, to see whether the return has changed (or a new return path has been created), I don't think it's feasible to do what is being asked here.

      I suppose a test involving PPI could be used to monitor the structure of the sub itself, but that's going pretty deep (and delves into my comment about hacking live-files live-time above).

      Either way, I'd love to see something like this if it's ever presented.

        > We all do this, by default, don't we?

        manually yes, not generated.

Szabgab only hinted at tests that should catch code changes like $a + $b switched to $a - $b,

        so in the following case

sub foo {
    my ($a, $b) = @_;
    return $a + $b;
}

testing foo(*, 0) (with * being any number) will always pass, even if you change + to -.

If several pairs of values were generated by a random generator and the result was approved manually, then the likelihood of a false positive would be minimal.

And new test input could be generated quickly, depending on the manual decision whether the code change was really intended or just a bug.
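The generate-and-approve loop described here could look something like the following sketch (the sub name foo and the input ranges are invented for illustration): run the real sub on random inputs and print ready-made is() lines, which a human then reviews before freezing them into a .t file.

```perl
use strict;
use warnings;

# Toy sub under test (as in the example above).
sub foo { my ($x, $y) = @_; return $x + $y }

# Emit candidate test lines from random inputs. A human must approve
# each expected value before pasting it into the test file - this
# freezes current behaviour, whether it is right or wrong.
for (1 .. 5) {
    my ($x, $y) = (int rand 100, int rand 100);
    my $got = foo($x, $y);
    print qq{is foo($x, $y), $got, "foo($x, $y)";\n};
}
```

Because the inputs vary, a mutation such as + flipped to - is very unlikely to survive all of the frozen cases.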

As I said, to (semi-)automate this one would already need to parse a lot of the function's body.

Anyway, I don't want to elaborate more without further explanation by the OP (the requirements sound paradoxical).

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Je suis Charlie!

        I don't think tests can be (easily) set up to be tested by other tests

        To test a test requires replacing what the test is testing with the test-tester.

For tests that call a function in the app or module being tested, the called function would be replaced by a "stub" function that does 2 main things: check and log the supplied inputs, then return a result. The result could be completely random, randomly chosen from a list, or the next in a sequence (which would require the test-tester to preserve its current state).

For tests, like the above, where the tests are "driving" the testing, it will be necessary to run the test-under-test enough times to ensure full coverage of the test-under-test.

For tests where the app or module under test is driving the testing, the test-tester will be able to perform the test as many times as needed.
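The stub idea above can be sketched with a plain glob assignment (the package Foo and sub lookup are hypothetical names): the stub logs every call and returns the next canned result in sequence, so the test under test can be driven deterministically.

```perl
use strict;
use warnings;

package Foo;
# Stand-in for the real implementation, which we do not want to run
# while testing the test itself.
sub lookup { die "real implementation should not be called" }

package main;

my @logged;                   # check and log the supplied inputs
my @canned = (10, 20, 30);    # results returned in sequence

{
    # Replace the real sub with the logging stub.
    no warnings 'redefine';
    *Foo::lookup = sub {
        push @logged, [@_];       # log this call's arguments
        return shift @canned;     # the stub preserves its own state
    };
}

print Foo::lookup('first'),  "\n";           # first canned value: 10
print Foo::lookup('second'), "\n";           # second canned value: 20
print "calls logged: ", scalar @logged, "\n";
```

CPAN modules such as Mock::Sub (mentioned above) package this pattern up with less ceremony, but the core mechanism is the same glob replacement.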

Re: Testing my tests
by Monk::Thomas (Friar) on Feb 27, 2017 at 15:20 UTC

    You pick a module that has 100% coverage and wonder can I safely refactor this?

    Is this a rhetorical question? Code coverage does not imply 'behaviour coverage'.

    Will the test suite catch if I make a mistake?

If you violate currently specified(1) behaviour: yes. Otherwise: no. In other words: you must assume 'no' unless proven otherwise. If the tests were developed using the TDD methodology then you ~might~ be safe, but I definitely wouldn't rely on that. It's easy to change behaviour even without adding a condition; e.g. squaring a number introduces an 'invisible(2)' branch for positive and negative numbers.

    (1) specified = there is a testcase for this
    (2) invisible for coverage analysis

Re: Testing my tests
by RonW (Parson) on Feb 28, 2017 at 00:39 UTC

For an arbitrary function foo, almost always several tests are needed to test it. The general idea is to supply in-range and out-of-range values for each of the inputs to foo. In the simplest cases, each input has a defined valid range and you make sure that, at minimum, each input gets values above the maximum, below the minimum, just below the max, just above the min, and one or more points distributed between max and min. Obviously, as the number of inputs increases, the combinatorics quickly get impractical, so you have to come up with a reasonable and workable subset of tests.
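For a single bounded input, that recipe looks like this sketch (the 0..100 range and the clamp() sub are invented for illustration):

```perl
use strict;
use warnings;

# Toy sub under test: clamps its input to the range 0 .. 100.
sub clamp {
    my ($n) = @_;
    return 0   if $n < 0;
    return 100 if $n > 100;
    return $n;
}

# Boundary-value cases: below min, at min, just above min,
# an interior point, just below max, at max, above max.
my @cases = (
    [  -1,   0 ],
    [   0,   0 ],
    [   1,   1 ],
    [  50,  50 ],
    [  99,  99 ],
    [ 100, 100 ],
    [ 101, 100 ],
);

for my $case (@cases) {
    my ($in, $want) = @$case;
    my $got = clamp($in);
    printf "clamp(%4d) = %3d  %s\n", $in, $got,
        $got == $want ? "ok" : "NOT OK";
}
```

In a real suite these would be is() assertions in a .t file; the point is the choice of input points, not the reporting.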

In more complex cases, the (valid, in-range) value of one input can affect the valid range of another input. This is dependent on the intended behavior of foo.

    But, even with a well engineered test suite, it's still possible the tests won't protect you from changes to even a very thoroughly tested function. It's still possible to miss corners (and other nexi) of the n-dimensional hyper-box.

    Even assuming your existing test suite is well engineered, you are very likely to find that it won't protect you from code changes, however reasonable they are.

    When looking at the code and planning a change, think about what might go wrong, then create more tests to cover those. Then after a reasonable number of changes have been made, hold a code review. Even if you have only yourself to do the review. Sometimes, after a few days and working on (or looking at) other code, you can go back and be able to see problems you couldn't when you made the changes.

Re: Testing my tests
by Anonymous Monk on Feb 28, 2017 at 00:00 UTC

    "Devel::Cover can easily show if a certain function or expression was executed during the test run, but it cannot tell if there was an assertion checking the validity of the result."

    It doesn't have to, that is not its job. That is the job of the test itself. You seem to be unclear on how this works: you write tests that call your interfaces and they confirm that the return value is what you expect. Code coverage shows you which branches your tests did not cause to be executed. Don't overthink things.

Node Type: perlquestion [id://1182876]
Approved by Athanasius
Front-paged by Corion