PerlMonks  

Where should (or could) a distribution override HARNESS_OPTIONS?

by davido (Archbishop)
on Nov 22, 2012 at 18:32 UTC
davido has asked for the wisdom of the Perl Monks concerning the following question:

I think we're increasingly seeing people run distribution test suites in parallel, either by setting the environment variable HARNESS_OPTIONS=j<n> or by running prove -j9. chromatic discusses this in his blog post Parallelism in Test Suites. In that post, and in the discussion that follows, module authors are encouraged to keep parallelism in mind, so that wherever tests can run in parallel, they are written to allow it. One example given is two test scripts that each depend on the same clean slate; when run in parallel, they stomp on each other's sandbox. In most such cases, the fix is to correct those shared assumptions or to give each test script its own independent resources.
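
For the clean-slate case, a minimal sketch of "independent resources for each test script" might look like this. It assumes the shared sandbox is just a scratch directory, and the helper name private_sandbox is hypothetical. File::Temp's tempdir() picks a unique directory name, so scripts running in parallel never share it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: each test script gets its own scratch directory
# instead of sharing a fixed path such as t/sandbox. tempdir() picks a
# unique name, and CLEANUP => 1 removes the directory when the script exits.
sub private_sandbox {
    return tempdir('sandbox_XXXXXX', TMPDIR => 1, CLEANUP => 1);
}

# The script can now assume a clean slate without stomping on anyone else.
my $dir  = private_sandbox();
my $file = File::Spec->catfile($dir, 'state.txt');
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} "clean slate\n";
close $fh or die "Cannot close $file: $!";
```
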

In a followup to the same post, aristotle makes the following statement:

Making a test suite not break under parallelism doesn’t necessitate making it run in parallel. However I believe something like say 95% of tests on CPAN will already run fine in parallel with no further ado, and of the rest, easily the majority will be very simple to fix.

In the quasi-infinitesimal remainder of cases, sure, if the effort is not worth it, just forcibly serialise the tests and move on.

I expect a push to test parallelism to require little housekeeping effort all told. There just needs to be a reliable pressure that steers the CPAN towards it.

This node is about the quasi-infinitesimal remainder of cases. syphilis maintains Inline::C, and I maintain Inline::CPP. Both modules depend on a C/C++ compiler to do much of the heavy lifting. I don't know about all popular C/C++ compilers, but I have found that gcc doesn't seem to support parallel compiling. If two test scripts cause the C/C++ compiler to be invoked at the same time, we get a test failure. I don't think our chances are strong for getting that fixed. So the question arises: how do we, as aristotle suggests, "just forcibly serialise the tests and move on"? syphilis and I have been discussing this and could use some enlightenment.

One thought is for each test script to rewrite $ENV{HARNESS_OPTIONS}, stripping out the j<n> flag and then cleaning up any stray ':' separators left behind when multiple flags were set. But that won't work: by the time a test script runs, it is already running in parallel with the others, so it's too late to affect the harness.
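
For reference, the string cleanup itself is simple, even though doing it inside a test script comes too late. This sketch assumes the format documented for Test::Harness: a colon-separated list such as j9:c, where j or j<n> requests parallel jobs. The helper name strip_parallel_flag is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: remove any j or j<n> entry from a HARNESS_OPTIONS
# value and rejoin the rest, so no empty fields or stray ':' remain.
sub strip_parallel_flag {
    my ($opts) = @_;
    return '' unless defined $opts;
    return join ':',
        grep { length($_) && $_ !~ /^j\d*$/ }
        split /:/, $opts;
}
```

Splitting, filtering, and rejoining avoids having to special-case a j flag at the start, middle, or end of the list.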

My original thought was that there might be a way for Makefile.PL to make make test override any HARNESS_OPTIONS setting, but I'm at a loss as to how to accomplish this.

Another approach might be for Makefile.PL to detect the HARNESS_OPTIONS flags and complain loudly before dying. At least then someone installing the module will know why he's getting a failure (otherwise, the failures can be pretty opaque).
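
A sketch of that check, assuming it runs near the top of Makefile.PL before WriteMakefile() is called. The helper name wants_parallel is hypothetical. It uses warn so the sketch runs anywhere; following the suggestion literally would mean calling die instead.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a HARNESS_OPTIONS value requests parallel
# jobs, i.e. contains a j or j<n> entry in its colon-separated list.
sub wants_parallel {
    my ($opts) = @_;
    return 0 unless defined $opts;
    return (grep { /^j\d*$/ } split /:/, $opts) ? 1 : 0;
}

if (wants_parallel($ENV{HARNESS_OPTIONS})) {
    # A Makefile.PL that must refuse to continue would die here instead.
    warn "HARNESS_OPTIONS='$ENV{HARNESS_OPTIONS}' requests parallel testing,\n"
       . "but this distribution's tests must run serially.\n"
       . "Remove the j flag before running 'make test'.\n";
}
```
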

Does anyone know how best to deal with this sort of situation?


Other resources: Test::Harness, Controlling Test Parallelism with Prove.


Dave

Re: Where should (or could) a distribution override HARNESS_OPTIONS?
by afoken (Parson) on Nov 22, 2012 at 20:05 UTC
    [...] but I have found that gcc doesn't seem to support parallel compiling. If two test scripts cause the C/C++ compiler to be invoked at the same time, we get a test failure. I don't think our chances are strong for getting that fixed.

    That sounds really strange. When I run make -C /usr/src/linux -j5, several instances of gcc run in parallel without trouble. So, what is different in Inline::C and Inline::CPP? (I really don't know. The Inline::* modules are still on my TO DO list.) Trivial question: does each test have its own set of configuration files, gcc input files, gcc output files, and gcc temp files?

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Thanks for pointing this out.

      You could very easily be right that gcc can run in parallel. That means we've got another problem. I can't speak for Inline::C's test suite, but I can say that no Inline::CPP test is intentionally reliant on any other test. And to answer your question, Inline::CPP generates within an _Inline directory a new subdirectory for each build's files. Nevertheless, there must be some resource that is getting clobbered, and I should be investigating that instead of looking for a way to prevent parallel testing. More research is needed in that area, it seems.

      Let's put the Inline::CPP example aside for a moment then, and let the original question stand without any strong example of a module that fits that infinitesimally small category. :)


      Dave

        Inline::CPP generates within an _Inline directory a new subdirectory for each build's files.

        What is a "build" in this context? That is, do we have one subdirectory per invocation of a test script, one subdirectory per test script, or one per Inline::CPP version? How are the names of the subdirectories calculated (what are the input parameters for the function that returns the subdirectory name)?

        The idea behind the last question: two test scripts both use a common module that uses Inline::CPP, and the subdirectory name depends only on that module. Then two gccs fight over the same subdirectory.
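
        To make that scenario concrete, here are two hypothetical naming schemes. Neither is claimed to be what Inline::CPP actually does; the thread doesn't say how its subdirectory names are derived. Scheme A keys only on the module, so two test scripts that load the same module share a directory. Scheme B also folds in the test script name and process id.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical scheme A: the build directory depends only on the module.
# Two test scripts loading the same module get the same directory, so two
# concurrent compilers may write the same files.
sub build_dir_by_module {
    my ($module) = @_;
    (my $name = $module) =~ s/::/_/g;
    return File::Spec->catdir('_Inline', 'build', $name);
}

# Hypothetical scheme B: also include the test script name and process id,
# so concurrently running test scripts never share a build directory.
sub build_dir_per_script {
    my ($module, $script, $pid) = @_;
    (my $name = $module) =~ s/::/_/g;
    my $tag = basename($script, '.t');
    return File::Spec->catdir('_Inline', 'build', "${name}_${tag}_$pid");
}
```
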

        (I should really install some Inline::* modules. But it's 6:30 am and it will be a long day at $work. No time for fun ...)

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Where should (or could) a distribution override HARNESS_OPTIONS? (flock)
by tye (Cardinal) on Nov 22, 2012 at 22:31 UTC

    I agree with afoken in being surprised at the thought of gcc not working properly just because it was run multiple times in parallel. I could see that causing a problem if both instances were trying to write to the same file, say, a.out.

    I would be surprised if a test script can impact the behavior of the invoking test harness without the test harness having explicitly set up a method for passing control information in that direction.

    But you don't have to prevent the harness from trying to run tests in parallel. You just have to prevent the tests from doing conflicting things at the same time.

    The simplest way I see of doing that is having a sentinel file (pre-created as part of the distribution) for each unsharable resource. The parts of each test that need exclusive access to that resource can 'flock' the file for the duration of their use of that resource.
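
    A minimal sketch of the sentinel-file idea, using Perl's built-in flock with LOCK_EX from Fcntl. The helper with_exclusive_lock and the file name compiler.lock are hypothetical; in a real distribution the sentinel would be shipped in t/ rather than created in a temporary directory as it is here. flock is advisory, so it only serializes code that also takes the lock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: run $code while holding an exclusive lock on
# $sentinel. Other scripts using the same sentinel block in flock()
# until the lock is released, so only one uses the resource at a time.
sub with_exclusive_lock {
    my ($sentinel, $code) = @_;
    open my $lock_fh, '>>', $sentinel
        or die "Cannot open lock file $sentinel: $!";
    flock($lock_fh, LOCK_EX) or die "Cannot lock $sentinel: $!";
    my $result;
    my $ok  = eval { $result = $code->(); 1 };
    my $err = $@;
    close $lock_fh;    # closing the handle releases the lock
    die $err unless $ok;
    return $result;
}

# Demo with a throwaway sentinel; a test script would pass something
# like the hypothetical t/compiler.lock and put only the compile step
# inside the callback.
my $dir      = tempdir(CLEANUP => 1);
my $sentinel = File::Spec->catfile($dir, 'compiler.lock');
my $answer   = with_exclusive_lock($sentinel, sub { 6 * 7 });
```

    Keeping only the conflicting step inside the lock means the rest of each test still runs in parallel.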

    - tye        

Re: Where should (or could) a distribution override HARNESS_OPTIONS?
by tinita (Parson) on Dec 09, 2012 at 22:14 UTC
    I don't know how to influence make test, but if it's possible, how about adding a field to META.yml saying whether or not the distribution supports parallel testing?
