http://www.perlmonks.org?node_id=945797

fullermd has asked for the wisdom of the Perl Monks concerning the following question:

So, a while back, I wanted to pull stats from my cable modem, so I wrote Device::CableModem::Zoom5341. I want it to keep working in the future, so I wrote tests for it too. And even better, CPAN has that whole infrastructure to run the tests across all sorts of platforms and tell me what goes on. So far, so good.

What's not so good, though, is when tests fail. Well, when they fail because I broke something, that's good, because I find out about it. But when they fail because of external factors, it's just noise to me, and unhelpful to other people looking at it.

That's happening in this case. On the plus side, I know why. Even better, I know how to fix it so it never happens again. But, I'd kinda rather not, so I'm hoping somebody has an idea for a third option.

The quick overview is that the module works by grabbing an HTML page via HTTP from the modem and scraping it. The test causing the problem is one that fakes an HTTP fetch to exercise that part of the process.
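
Boiled down, the fetch half is just an ordinary LWP request against the modem's web interface; the address and page path below are only placeholders for illustration, not necessarily what the module really uses:

    use strict;
    use warnings;
    use LWP::UserAgent;

    # Placeholder address/path; the real module knows the modem's actual page.
    my $ua   = LWP::UserAgent->new(timeout => 10);
    my $resp = $ua->get('http://192.168.100.1/status.html');
    die "Fetch failed: " . $resp->status_line unless $resp->is_success;
    my $html = $resp->decoded_content;
    # ... scraping $html for the signal/power stats happens after this ...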

Naturally, there's a test that actually runs the whole stack, all the way down to an actual cable modem. But I know people might want to run the test suite without having a device on their network, or without wanting to hit it even if it is there, so that test isn't enabled by default. It only runs if you set an env variable (this is encapsulated in t/02-realfetch.t in the dist).
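
The gate amounts to something like this skip_all at the top of the test (the variable name here is only illustrative; the dist's actual check may differ):

    use strict;
    use warnings;
    use Test::More;

    # Illustrative variable name, not necessarily the one the dist uses.
    plan skip_all => 'Set CM_TEST_LIVE to run tests against a real modem'
        unless $ENV{CM_TEST_LIVE};

    plan tests => 1;
    ok(1, 'live-modem tests would run here');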

However, I have another test that uses HTTP::Daemon (from LWP) to serve a (not actually HTML) file locally to test the fetching routine. See t/02-fetch.t (on CPAN). And in rare cases, this is failing for some CPAN testers. My guess is that it's because they're running in an environment that can't actually talk to itself via 127.0.0.1. Which is unusual perhaps, but still a perfectly valid system setup.
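
Stripped of the test plumbing and extra instrumentation, the shape of the thing is roughly this (a simplified sketch, not the actual t/02-fetch.t): bind HTTP::Daemon to 127.0.0.1, fork a child to answer with a canned body, and run the fetch from the parent.

    use strict;
    use warnings;
    use HTTP::Daemon;
    use HTTP::Response;
    use LWP::UserAgent;

    my $d = HTTP::Daemon->new(LocalAddr => '127.0.0.1')
        or die "Can't bind to 127.0.0.1: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: answer one request with a canned body, then exit.
        if (my $conn = $d->accept) {
            my $req = $conn->get_request;
            $conn->send_response(
                HTTP::Response->new(200, 'OK', undef, "canned data\n")
            ) if $req;
            $conn->close;
        }
        exit 0;
    }

    # Parent: fetch the page the child is serving and check what came back.
    my $ua  = LWP::UserAgent->new(timeout => 5);
    my $res = $ua->get($d->url);
    print $res->is_success ? $res->decoded_content : $res->status_line, "\n";

    waitpid($pid, 0);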

There were several failures with the 1.00 release. A main purpose of 1.01 was to add some extra instrumentation to the test to see where it fell down. There was only one failure with 1.01. From the "Timed out" message on the child side and the lack of any real data coming back, it sounds like it just can't talk to itself.

OTOH, it sounds like the earlier test (when it's supposed to get a 404) did succeed, which means it could talk to itself for a moment. And it should have failed earlier and skipped the tests if it couldn't bind() to 127.0.0.1 initially. So I'm not actually certain what's happening here. It's possible the alarm() killed it before it got a chance to make/answer the second request, but it would have to be a very loaded system for it to take 3 seconds to make two requests. Or maybe it just sent the signal way early. There's probably a reasonable chance I just did something stupid, so if anybody can point it out to me, I'd be grateful.
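
For reference, the alarm() mechanism I mean is the usual SIGALRM dance; a simplified sketch (not the exact code from the test) looks like this:

    use strict;
    use warnings;

    # Simplified sketch of an alarm() guard; the 3 seconds matches the
    # timeout mentioned above, and "Timed out" is what shows up when the
    # handler fires before the work finishes.
    local $SIG{ALRM} = sub { die "Timed out\n" };

    my $ok = eval {
        alarm 3;
        # ... make the 404 request, then the real fetch ...
        sleep 5;    # stand-in for work that takes too long
        alarm 0;
        1;
    };
    alarm 0;    # make sure no stray alarm stays pending
    print $ok ? "finished in time\n" : "died: $@";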

Now, I could avoid all the noise easily enough just by adding a conditional like the one on the realfetch test, so this test isn't run by default. I'd rather not do that though; any test not run by default won't be run very often, even by me as the developer. And this is the only test that checks the requests actually run right all the way down to the network, and that the results get stored correctly. And going the other way (an env var to disable this test) is pointless, since nobody is going to set that either...

So, the failure above may or may not actually be a case of "can't see myself at 127.0.0.1". If it's not, I could use some help figuring out exactly what it is. But either way, the "can't see" case is sure to exist, and if it's due to firewalling or similar network configuration rather than something like "address not assigned", it may not be caught by HTTP::Daemon->new failing.

However it breaks down, I'm seeking advice on how I can robustly keep running this sort of test if at all possible, but cleanly skip things if something in the system environment makes it impossible. I know that it's impossible to really ensure it both ways, but surely it can be done better than what I have now.

Re: Localhost network interaction in tests
by Corion (Patriarch) on Jan 01, 2012 at 17:40 UTC

    I wrote an HTTP server module for testing HTTP clients. My advice is to just avoid the HTTP transfer and use file:// URIs to fetch a canned copy of a known-good page from disk instead.

    If you really, really want to embark on a full end-to-end test with an HTTP server that serves a canned copy of the modem pages, have a look at Test::HTTP::LocalServer, which provides a set of canned responses and can easily be extended to serve canned files as well. But still, it is far easier to substitute the base URI for accessing the modem to be a file:// URI instead.
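
    The nice thing about the file:// route is that there is no server to keep alive at all. Assuming the module let you point its base URI somewhere else (it may not expose such a knob today), the fetching side of a test shrinks to something like this; the canned-page path is hypothetical:

        use strict;
        use warnings;
        use LWP::UserAgent;
        use URI::file;

        # Hypothetical canned copy of a modem status page shipped with the dist.
        my $uri = URI::file->new_abs('t/data/modem-status.html');

        my $ua  = LWP::UserAgent->new;
        my $res = $ua->get($uri);
        die "file:// fetch failed: " . $res->status_line unless $res->is_success;
        print $res->decoded_content;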

      [...] Test::HTTP::LocalServer [...] substitute the base URI for accessing the modem to be a file:// URI instead.

      T:HTTP::LocalServer seems to pretty much mirror what I'm doing, just less hardcoded to a single purpose.

      I don't like the idea of going file:/// though, since that means I'd have to add an extra magic hook in the code to say "here, use this fake URL instead", which would defeat a lot of the purpose of running through the code path in the first place. It would still run the LWP bits, but the URL construction would be bypassed. It also wouldn't give it the chance to double-check that it properly errors on a 404, as the test currently does.

Re: Localhost network interaction in tests
by rkrieger (Friar) on Jan 02, 2012 at 14:12 UTC

    Your test assumes a common, though not guaranteed, environment and would fail in several modes. Things that come to mind are:

    • restricted localhost traffic
    • non-root user trying to bind port 80
    • IPv6-only systems (?)

    I'd say none of these really are your code's fault. Asking the testing user for environment details that should work for them seems pretty fair to me, even if you have (understandable) reservations. This reminds me of the Astro::SpaceTrack module I use for work (e.g. its query test). I'm sure there are other ways of doing this, though.

    Alternatively, stick with localhost as a default and simply skip the tests if basic HTTP traffic fails. If you want to cut down the noise in failure reports, have things fail with a recognizable error/warning message that's easy for you to filter out.
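
    A rough, untested sketch of such a probe: bind to 127.0.0.1, make one throwaway request with a short timeout, and skip everything if it doesn't come back as expected.

        use strict;
        use warnings;
        use Test::More;
        use HTTP::Daemon;
        use LWP::UserAgent;

        my $d = HTTP::Daemon->new(LocalAddr => '127.0.0.1');
        plan skip_all => "Can't bind to 127.0.0.1: $!" unless $d;

        my $pid = fork();
        plan skip_all => "fork() not usable: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: answer one probe request with a 404 and exit.
            if (my $conn = $d->accept) {
                $conn->send_error(404) if $conn->get_request;
                $conn->close;
            }
            exit 0;
        }

        my $ua  = LWP::UserAgent->new(timeout => 3);
        my $res = $ua->get($d->url . 'probe');
        waitpid($pid, 0);

        plan skip_all => 'Loopback HTTP traffic appears blocked on this system'
            unless $res->code == 404;

        plan tests => 1;
        ok(1, 'loopback HTTP works; the real fetch tests would follow');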