Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

the sorry state of Perl unit testing framework

by bulk88 (Priest)
on Apr 07, 2015 at 06:51 UTC ( #1122667=perlmeditation: print w/replies, xml ) Need Help??

Updated: Test::Tiny was benchmarked and analyzed, Test::More alpha release 1.301001_101 with new backend tried

Soon I will be going to QAH 2015 in Berlin. My number one goal is to get parallel testing for TAP::Harness working on Win32 Perl but this post isn't about parallel testing. TAP::Harness stymied me many times trying to touch the codebase. It is militantly OOP, and obfuscated with its own implementation of method dispatch, and is written in a "declarative language" similar to a makefile. It has layers of faux-abstraction, that superficially purports to support pluggability, yet doesn't allow anything but the current implementation to fit the abstraction layers. I belive it is a pedagogical exercise in a ivory tower, which in plain terms, it is a homework assignment for appealing to a professor's ego that shows off all the skills itemized on your syllabus.

Researching the bloated design of TAP::Harness also brings me to investigate the other side of the TAP connection, Test::Simple/Test::More. It is equally inefficient I discovered.

All these tests were done a 2.6 ghz 2 core machine, running 32 bit WinXP. The primary tool I use for benchmarking in this post is called timeit.exe. It is a simples times() like benchmark tool that asks the NT kernel for its counters after the process exits. The resolution of these counters is 15.625 ms but the workloads I use to benchmark take seconds or minutes to complete, so 15.625 ms resolution isn't an issue.

The workload I use is always 1 million tests, the tests are run by GenTAP.pm, which is from http://github.com/bulk88/Win32-APipe. GenTAP.pm and fastprint.t, fasttinyok.t and fastok.t should be portable and run on any Perl platform if you want to try reproducing these benchmarks yourself.

I refrained from using nytprof in this write up since nytprof has overhead, and questioning individual subs, and lines of code, is pointless, if slowness is a conscious systemic design rule, not a couple bad drive-by patches over the years. The output is redirected to a file, since this way there is no overhead of the Win32 console in writing to STDOUT.

Test::More

fastok.t calls Test::More's ok() from version 1.001014, 1 million times in a loop with each test always passing, and a randomly randomized test name.
timeit perl t\t\fastok.t > t.txt Version Number: Windows NT 5.1 (Build 2600) Exit Time: 2:11 am, Saturday, April 4 2015 Elapsed Time: 0:02:00.671 Process Time: 0:01:59.234 System Calls: 3664127 Context Switches: 320686 Page Faults: 948528 Bytes Read: 3339101868 Bytes Written: 100048020 Bytes Other: 73765993

which leads to 0.000119 seconds to do 1 ok() call, or 0.1 milliseconds. This is signifigant, for each 10000 tests, 1 second of overhead. It also means, you can't run more than 10000 tests per second on per core no matter what you do. How often do you run "make test" and wait for it to finish and it feels like filling a gas tank, how slow is travis or whatever CI solution you use?

If your unit testing consists of code generated permutations of parameters to your module, 10,000 tests or 100,000 is very easily reachable on 1 software project/module. Some very popular CPAN modules do this style of code generated permutations of tests.

Now what could be the fastest possible TAP generation? "type file_of_tap.txt" or "cat file_of_tap.txt" is cheating. Under the same conditions, which is the best case scenario for the overhead of test to compare Test::More against, is a simple "print "ok ".(++$counter)." - $testname\n";" in a loop instead of "ok(1, $testname)", which is what fastprint.t does.
timeit perl t\t\fastprint.t > t.txt Version Number: Windows NT 5.1 (Build 2600) Exit Time: 2:19 am, Saturday, April 4 2015 Elapsed Time: 0:00:02.156 Process Time: 0:00:02.109 System Calls: 21413 Context Switches: 6818 Page Faults: 14663 Bytes Read: 270297 Bytes Written: 60606608 Bytes Other: 6971963


0.0000021 seconds per the DIY ok().

0.000119/0.0000021 57x more time. FIFTY SEVEN times more CPU. To summarize, if you use Test::More, you might as well imagine there is an gig ethernet cable and a UDP socket between your TAP emitter and TAP consumer.

Now about memory use of Test::More. I modified fastok.t as such
unshift(@INC, '.'); require 't/t/GenTAP.pm'; require Test::More; #load but dont call anything in Test::More #we want runtime mem overlead, not loadtime, otherwise the require wil +l happen #inside GenTAP if it isn't done here system('pause');#sample memory GenTAP(0, 0, 'ok', 1000000); system('pause');#sample memory

before 3,828KB, after 397,828 KB, peak 397,840 KB.

(397840-3828)/1000000=0.394012 KB per test emitted. 400 bytes per test. What on earth is in those 400 bytes? My test name passed to T::M::ok() is always 42 bytes long. Lets round that to the next 16 byte win32 malloc boundary, 48+12(perl's win32 ithread malloc wrapper in vmem.h)+16(sv head)+8(svpv body)+4(SV * somewhere else)=88 bytes for storing the test name. Where did the other 300 bytes go? Why is Test::More saving the names of passing tests? showing off your skills in LOC per hour for your CV? writing job-for-life unmaintable code? The TAP parser is responsible for maintain TAP state, not the TAP emitter. A TAP emitter should have no memory increase between sucessive calls to ok().

Test::More has no competitors on CPAN except for Test::Tiny which makes no attempt at API compatibility but has a similar ok() sub. So using fasttinyok.t, which calls Test::Tiny's ok() sub 1 million times, I get
timeit perl t\t\fasttinyok.t > t.txt Version Number: Windows NT 5.1 (Build 2600) Exit Time: 7:42 pm, Tuesday, April 7 2015 Elapsed Time: 0:00:05.218 Process Time: 0:00:05.140 System Calls: 49612 Context Switches: 24005 Page Faults: 17396 Bytes Read: 146216 Bytes Written: 57859498 Bytes Other: 17639334
Test::Tiny's ok() is (5.140/2.109=2.437) 2.4x slower than my ideal DIY ok() implementation. which compared to Test::More's 57x slower, 2.4x is a rounding error. Test::Tiny is a real working CPAN module remember.

About Test::Tiny's memory usage, using the same breakpoint positions, before 3,008KB, after 3,028KB, 3028-3008=20 KB. 20000/1000000=0.02 bytes per test, which means unmeasurable small. Basically no increase in memory usage, unlike the 100s of MBs seen with Test::More.

Even a drop in replacement for Test::More that is 10x slower than the ideal DIY ok() implementation above, or in other words 4x slower than Test::Tiny, is still 5x times faster than Test::More. Just about anything is faster than a sloth pulling a wagon. Something needs to be done about Test::More, the entire perl community relies on it, and it is unworkably slow. Either a drop in replacement, or replacing all of the internals of Test::More with a simplied architecture.

I was told to try a alpha release (1.301001101) of Test::More which included a new backend which hoped to improve its performance. I will therefore benchmark it.
timeit perl t\t\fastok.t > t.txt Version Number: Windows NT 5.1 (Build 2600) Exit Time: 11:37 pm, Tuesday, April 14 2015 Elapsed Time: 0:02:10.859 Process Time: 0:02:09.375 System Calls: 2399091 Context Switches: 238722 Page Faults: 543275 Bytes Read: 3031284 Bytes Written: 103643867 Bytes Other: 87114348
The results are bad. 10 seconds more old stable 1.001014 Test::More, or 9% slower.

TAP::Harness

Onto the TAP consumer, TAP::Harness. For the next example, remember fastprint.t takes 2 seconds of CPU to print its 1 million tests. I dont think fastprint.t's process time is included by timeit.exe tool but with the numbers shown, 2 seconds is a rounding error if it is included.

C:\sources\Win32-APipe>timeit C:\perl521\bin\perl.exe "-MExtUtils::Com +mand::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0 +, 'blib\li b', 'blib\arch')" t\t\fastprint.t t\t\fastprint.t .. ok All tests successful. Files=1, Tests=1000000, 66 wallclock secs (65.86 usr + 0.13 sys = 65. +98 CPU) Result: PASS Version Number: Windows NT 5.1 (Build 2600) Exit Time: 3:00 am, Saturday, April 4 2015 Elapsed Time: 0:01:06.406 Process Time: 0:01:06.203 System Calls: 483839 Context Switches: 182749 Page Faults: 78950 Bytes Read: 62566241 Bytes Written: 53961404 Bytes Other: 11595189 C:\sources\Win32-APipe>

(60+6)/1000000 = 0.000066 seconds for TAP::Harness to process 1 TAP test. It is better than Test::More, with TAP::Harness taking, for parsing 1 test, 55% of the time it takes, for Test::More to emit 1 test.

Now about the memory usage of TAP::Harness. Checking the process memory with Task Manager with a breakpoint right before "test_harness" sub is called, shows 5,868 KB, Windows OS shows the process peaked at 106,368 KB, and a breakpoint right after test_harness sub, shows 96,636 KB. There are 2 memory problems here that need to be broken down.

Problem 1, TAP::Harness uses about 100 bytes of memory for each *passing* test ((106368000-5868000)/1000000). The internal state of tests results isn't a sparse array or linked list of failed tests, or a vec() or C bitfield, heck, it isn't even a @array with undef/uninit slices for successful tests, which would be 4 bytes per test on a 32bit OS. It is 100 bytes per passing test. What is 100 bytes? 100/4 is 25 pointers/slice members. For reference, each scalar you create, on 32 bit OSes, is 4 slice members. And I will guess it uses a blessed hash with 1 or 2 hash keys for each test without even looking at TAP::Harness's implementation.

Problem 2, in the breakpoint, after test_harness() was executed, memory dropped from 106,368 KB to 96,636 KB. Only about 10MB. What is inside the 96,636-5,868=90,768KB?

Here is the console log with the breakpoints (the "Press any key to continue . . ." lines) to show where memory was sampled.

C:\sources\Win32-APipe>C:\perl521\bin\perl.exe "-MExtUtils::Command::M +M" "-MTest ::Harness" "-e" "undef *Test::Harness::Switches; system 'pause'; test_ +harness(0, 'blib\lib', 'blib\arch'); system 'pause';" t\t\fastprint.t Press any key to continue . . . t\t\fastprint.t .. ok All tests successful. Files=1, Tests=1000000, 65 wallclock secs (65.05 usr + 0.11 sys = 65. +16 CPU) Result: PASS Press any key to continue . . . C:\sources\Win32-APipe>
Notice
All tests successful. Files=1, Tests=1000000, 65 wallclock secs (65.05 usr + 0.11 sys = 65. +16 CPU) Result: PASS

was already printed, so what do those 90 MB contain? Why is TAP::Harness holding onto state after printing the final line? surely this can't all be malloc fragmentation preventing a release of memory? Or was TAP::Harness written to leak memory since "the process will exit soon, dont waste CPU freeing anything", perhaps a legitimate reason? I doubt that was the intention of the person who designed TAP::Harness's OOP API.

In combination, TAP::Harness+Test::More take 0.185 ms per ok() call, of overhead. is() which is more common than ok(), will probably take even more than 0.185ms, so 0.185 ms per test, is the current best case scenario using the existing unit testing framework.

Solutions

Test::More

Rewrite Test::Simple and Test::More back into standalone modules like they were 20 years ago, and remove their usage of Test::Builder would be best, but that requires Test::More's authors and maintainers to agree that the code is deeply flawed and agree to replacing it.

Summarizing some code in Test::Builder, do we really need to ever implement this, anywhere?
sub in_todo { my($todo, $caller); $todo = defined((caller++$caller)[0])?(caller$caller)[0]->can('is_ +todo')?(caller$caller)[0]->is_todo?1:undef:undef:0 until defined $tod +o; $todo; }

Since I expect signifigant protests from the peoples whoses CVs depend on protecting their precious snowflakes (see this incredulous post https://github.com/Test-More/test-more/issues/252 from 2012, leaking memory is by design and will never change by the then author), a drop in replacement for Test::More under a different namespace and patching dists away from T::M/T::B, and removing T::M/T::B from Perl 5 core is probably the easiest way forward.

TAP::Harness

Rewriting TAP::Harness from scratch is probably the only solution, since a couple 3rd party modules are crazy enough to integrate themselves, like TAP::Harness::Archive. The typical "make test" has no use for TAP::Harness OOP bloat, with the only 2 options being TEST_VERBOSE on or off, and parallel or not.

I have done nytprof-ing of TAP::Harness, but nothing is fixable there without admitting all the design rules are a list of what not to do.

A simple design for a new harness would be, a TAP source class (usually a process class) that returns a stream class. The stream class returns a string name() (filename of .t, or a http:// URL of TAP or a disk path of TAP), and returns multi-KB blocks and eventually undef as EOS indicator, just 2 methods. For passing tests store nothing (undef), or store a "sparse" range of passing tests unblessed objects, store failed tests and unknown lines and diag in a linked list for dumping/summing at the end. Rewrite the parser in XS to quickly search for newline and "not ok" and "ok" token, maybe use BM algo. Even for a PP version, index and substr for a 1 pass parser through the block. For passed tests, store nothing. If a TAP stream has all passing tests, and reaches the end of stream, all the passing tests are represented by 1 hash with 2 keys (start range, end range). This is a long shot, but ideally, pipes shouldn't even be used between a TAP consumer and TAP emmitter. Future Test::More can communicate through shared memory through XS to future TAP::Harness, with the IPC buffer maybe looking like a stream, or the TAP "parsing" (no TAP is generated) is "done" (there is no TAP, just an array of test records) on the client side in Test::More, which also gives the benefit of "out of sequence" TAP being impossible due to perl thread safety in Test::More::XS.

Why is TAP::Harness's design flawed? I saw all of this being done with stepping and nytprof inside TAP::Harness.

Things not do include, no method accessors, callbacks, method dispatchers in PP, declarative syntax, no pluggable tokenizers, no roll your own RTTI, not RTTI at all, just 3 classes, and absolutly no "Harness::Data::Integer" class since Perl isnt Javascript, and will not JIT your Integer class into a scalar, always use hash keys, do not use classes and constructors if you can use an integer bitfields or integer bools/!!0, this is Perl, not C++, not Java. Do not build bool methods that should be bitfields, that are aggregations of other bool methods, since you wind up exponential method calls, and there is no caching since caching is evil since it cant be plugged later on according the OOP dogma, so the result is to parse a TAP line over and over. Do not use "class factory" classes. There is no reason to have closured, dynamically generated, anonymous classes. If you need 2 classes, both named "Record" since you are too incompetent to prefix "Customer::" or "Inventory::" to the word "Record", you should name classes after your pets, I hope your house doesn't have 2 Rustys. Perl has "packages", do not invent our own. Do not nest hashes inside hashes. Perl hashes aren't C structs where all the "."s in "pointer->a.b.c.d" optimized away. Do not write classes, where the ctor, does nothing except blesses an empty hash, and then each method of the class checks if the object was "really ctored" and conditionally calls the real ctor, in some crazy attempt to optimize for bad callers that unnecessarily create objects, never call a single method on them, then dtor them. Do not ask an object if it can() do anything, that means your objects are lying as to their abstract base class. If you bought a car on ebay, and the seller mails you 4 tires with a $1K shipping charge, do you timidly do nothing and buy another car online? Do not implement has_pending() with return !!scalar(shift->pending()); where pending() is return map{ref($_)->new($_)} @{shift->{queue}}. Do not implement meaningless sort calls, such as in $new->{$_} = $self->{$_} foreach sort @clone;. Do not collect error/fail state diagnostics, build error message strings, when there is no error, and you will never use that error message or state snapshot again.

For archival reasons, since Win32-APipe is a temporary name, and will be deleted soon. GenTAP.pm.
use strict; use warnings; use constant UVMAX => 2**32; #%Config doesn't have this, this will do use Time::HiRes qw( sleep ); #void GenTap($time_per_sleep, $sleep_count, $emitter, $test_count) # time_per_sleep - # seconds to sleep when a sleep is triggered, floating point okay and + will sleep # fractions of a second (uses Time::HiRes) # sleep_count - # integer count of number of times to sleep, the sleeps will be evenl +y # distributed between tests # emitter - # ok - use ok() # print - use perl core print() # block - use perl core print with atleast 4KB of multiline output p +er print # test_count - # number of tests to run, will be interleaved with sleeps if applica +ble sub GenTAP { die "usage: GenTAP" if @_ != 4; my ($time_per_sleep, $sleep_count, $emitter, $test_count) = @_; die "invalid test count" if $test_count >= UVMAX; my($testnum, $tests_before_sleep, $tests_left_before_sleep, $buffe +r, $ok) = (1, UVMAX, UVMAX); #emmit a plan if($emitter eq 'ok') { require Test::More; Test::More::plan(tests => $test_count); $ok = \&Test::More::ok; } elsif($emitter eq 'tinyok') { require Test::Tiny; Test::Tiny->import(tests => $test_count); $ok = \&Test::Tiny::ok; $emitter = 'ok'; # less branchin in main loop } else { my $plan = "1..$test_count\n"; if($emitter eq 'print') { print $plan; } elsif ($emitter eq 'block') { $buffer = $plan; } else { die "unknown emitter"; } } $tests_left_before_sleep = $tests_before_sleep = $test_count / $sl +eep_count if $sleep_count; while ($testnum <= $test_count) { # UC 1 random character in each test name for a touch of crazy # and test output that isn't identical each time, UCing a space produc +e no change # which is intentional, since sometimes the line should not be "typo-e +d" my $testname = "All work and no play makes Jack a dull boy"; my $replace = int(rand(length("All work and no play makes Jack + a dull boy"))); substr($testname, $replace, 1, uc(substr($testname, $replace, +1))); if ($emitter eq 'ok') { &$ok(1, $testname); } else { my $testline = 'ok '.$testnum.' '.$testname."\n"; if ($emitter eq 'print') { print $testline; #} elsif ($emitter eq 'block') { } else { #checked above $buffer .= $testline; if(length($buffer) > 4096) { print $buffer; $buffer = ''; } } } sleep($time_per_sleep), ($tests_left_before_sleep = $tests_bef +ore_sleep) if --$tests_left_before_sleep == 0; $testnum++; } #flush the buffer in block mode print $buffer if $emitter eq 'block' && length($buffer); } 1;
fastprint.t
unshift(@INC, '.'); require 't/t/GenTAP.pm'; GenTAP(0, 0, 'print', 1000000);
fastok.t
unshift(@INC, '.'); require 't/t/GenTAP.pm'; GenTAP(0, 0, 'ok', 1000000);
fasttinyok.t
unshift(@INC, '.'); require 't/t/GenTAP.pm'; GenTAP(0, 0, 'tinyok', 1000000);

Replies are listed 'Best First'.
Re: the sorry state of Perl unit testing framework
by TGI (Parson) on Apr 07, 2015 at 18:55 UTC

    I am much less worried about speed and more concerned with what a pain it is to write test modules with the current infrastructure due to its monolithic design.

    In my naive world view, tests should emit TAP, test runners should manage test processes and collect the TAP, and test frameworks should tell the runners what to run and process the TAP and produce reports.

    To that end I have been working on a TAP::Producer class for a while. Feel free to check it out and see if it serves your needs https://github.com/daotoad/TAP-Producer


    TGI says moo

Re: the sorry state of Perl unit testing framework
by BrowserUk (Pope) on Apr 08, 2015 at 20:29 UTC

    Concise, clear and utterly logical appraisal of the problem.* ++

    Unfortunately, as I found out to my cost a few years ago when I meditated on the same subject: Testing methodology, best practices and a pig in a hut., when you dare to question the status of their Holy Cows ("precious snowflakes"**), they'll either:

    • dissemble, evade, prevaricate, confound, confuse, distract, obscure, subtly misrepresent and willfully misunderstand until you either give up; or (in my case) respond with frustration; and then they'll take the "moral high ground".

      Basically, they'll do anything -- except discuss with logic -- to shut you up.

    • Or, as seems to be the case here; simply ignore you.

      Bury their righteous heads in the sand and hope you'll go away.

    The only alternative I can see for the problems you describe is to write a complete and distinct alternative to the steaming pile of O'WoE ("pedagogical exercise in a ivory tower"**) that is Test::*.

    To that end; I humbly offer Assert.pm which might form the basis for the replacement for Test::Simple/More that would remove a dozen or so layers of that "faux-abstraction"** that make those modules the poster child for the art of obfuscation-through-over-structuralization. (If you decide to; and need help?)

    't'd be nice to think that Perl's pragmatism would prevail here; but I wouldn't hold your breath.

    **Shame you're on the "wrong OS" to be taken seriously!

    *Love your turn of phrase!


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
Re: the sorry state of Perl unit testing framework
by dmitri (Priest) on Apr 08, 2015 at 18:45 UTC
    An excellent rant, bulk88. I did not get the ivory tower/CV references: I guess you're throwing stones at someone?

    I will pick up the "militantly OOP" expression -- haven't heard that one before. :)

Re: the sorry state of Perl unit testing framework
by wollmers (Scribe) on Apr 09, 2015 at 22:42 UTC

    Sometimes it is better to start from scratch, without the limits of an hysterically (no typo;-) grown API, thus under a new name.

    Test::Tiny looks nice. It looks more like the minimal coding I like. Fight the bloat.

    cu @ QAH

    Helmut Wollmersdorfer

Re: the sorry state of TAP::Harness
by Anonymous Monk on Apr 07, 2015 at 07:20 UTC

    Yeah, please rename this thing

Re: the sorry state of Perl unit testing framework
by Anonymous Monk on Apr 09, 2015 at 21:32 UTC
    The maintainer's 2012 response at https://github.com/Test-More/test-more/issues/252 says:

    "Test::Builder (what Test::More is on top of) retains test history for the Test::Builder->details method, so that's expected if not always desired behavior. There is currently no way to turn this off.

    Test::Builder 1.5 will not retain complete history by default, only statistics, and memory usage will remain flat."

    (Italics mine.)

    According to Test::More in serious need of review, be afraid!, Test::Builder is about to be deprecated as part of a wholesale rewrite of Test-*.

    Does the new setup suffer the same problems?

Re: the sorry state of Perl unit testing framework (drop-in replacement)
by Dallaylaen (Friar) on Dec 30, 2016 at 21:21 UTC

    Hello, bulk88!

    If you're still interested in this topic, you might find the following exercise called Test::Refute interesting. Right now it seems to be crystallizing into something very similar to what you're proposing: a cleanroom Test::More counterpart, easily extensible, not keeping passed tests, and also much faster than the original.

    It's based on powerful $contract->refute( $condition, $message ); primitive. The $contract is an object storing information about (un)met conditions. Or outputting it right away as TAP, if needed. The $message is a human-readable name of the test. The $condition is false if everything is fine and the reason for failure otherwise.

    One can think of it as of a program exit code that is always 0 on success, but different for different failure modes.

    So an example "is" implementation looks like follows (real one is a bit longer though, undef handling, quoting etc.):

    sub { return $_[0] eq $_[1] ? "" : "$_[0]!=$_[1]" };

    Suddenly this is a pure function! We may need to provide a prototyped exportable wrapper for convenience, but that's left to my builder class. But no builder is required for making test logic.

    I'm hoping to go beta around mid-January.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://1122667]
Approved by Corion
Front-paged by BrowserUk
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (9)
As of 2017-12-15 09:32 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What programming language do you hate the most?




















    Results (425 votes). Check out past polls.

    Notices?