like when tye tested out his new ticker, it worked great for about a minute, but he, as a god, could've kept the rest of the pm out of the test
Interesting choice of examples.
by simply debugging stuff on their private nodes
We do that all the time! It doesn't solve the whole problem.
The fact is that the new chatterbox xml ticker was developed for several weeks as a separate node and was tested in a number of different ways. Eventually, you have to move beyond unit testing, integration testing, and system testing and go to alpha/beta testing. As this thread highlights, we don't have a set of alpha- or beta-test users. But the biggest problem with the chatterbox xml ticker was that it used too much CPU, and this probably would not have been noticed if it were only used by a small set of 'lab rats'.
So in this particular case, your suggestion wouldn't have done any good -- the problem would still not have been found until it was put onto the live site. Sure, you can develop performance testing scenarios. And after that problem I did write and use some performance unit tests for that particular feature. But I have no plans for developing a performance system test with simulated visitors using a separate group of three machines that I expect pair.com to provide for free. Sorry, "we" don't have the budget for that.
I've done quite a bit of work to extend the options for testing PM changes. I've set up a test environment for system testing of changes to the *.pm files that PM uses. We are currently testing several new features (for example, new perl monks user search -- this link will break and then probably go away when the testing is finished and we roll it into production).
ar0n is working on setting up a backup database. Sounds easy, and it should be, but I won't go into the many problems we've already run into trying to get this simple tool working. If we got paid to do this full time, we'd probably have worked this all out in a day. But we aren't and we don't, so we didn't.
I have plans for changing how "bad" HTML is stripped that will address several current problems (it won't be stripped any more, it will be escaped and appear as text). For this I'll need to have it take effect for only some users so that I can properly test it. I'll be making myself a 'lab rat' by adding the 'lab rat' idea for just this specific feature. I'm doing this because it is worth the effort in this particular case.
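To make the escape-instead-of-strip idea concrete, here is a rough sketch only. It is written in Python for illustration rather than in the Perl that PM actually runs, and every name in it (`ALLOWED_TAGS`, `LAB_RATS`, `render`) is made up for the example, not taken from the PM code. It assumes a small whitelist of tags and a per-user switch: 'lab rats' see disallowed tags escaped as text, everyone else still gets the old stripping behavior.

```python
import re
from html import escape

# Hypothetical whitelist and tester list; neither comes from the real site.
ALLOWED_TAGS = {"b", "i", "p", "code"}
LAB_RATS = {"tye"}

# Matches an opening or closing tag and captures the tag name.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^<>]*>")

def render(text, user):
    """Pass allowed tags through; escape or strip the rest per user."""
    def handle(match):
        tag = match.group(1).lower()
        if tag in ALLOWED_TAGS:
            # Simplification: allowed tags pass through verbatim,
            # attributes included.
            return match.group(0)
        if user in LAB_RATS:
            # New behavior: the bad tag shows up as literal text.
            return escape(match.group(0))
        # Old behavior: the bad tag silently disappears.
        return ""
    return TAG_RE.sub(handle, text)

print(render("<b>hi</b><blink>x</blink>", "tye"))
print(render("<b>hi</b><blink>x</blink>", "someone_else"))
```

Keying the switch on the user means the new behavior can be exercised on the live site without changing what anyone else sees.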
Making the 'lab rat' idea (creating a group of beta-test users) work in a more general way is quite a bit of work and, in my opinion, isn't nearly worth the effort at this point, dws. It may be implemented slowly, in pieces when motivated by certain changes.
Yes, we need more testing options. We know this, and that is why we've been working on creating them. We need a good super search that doesn't make the site totally unstable. We've been working on that as well. We've even made some progress on several fronts there. We've tried several things that didn't pan out. We've started working on several other things that aren't finished yet.
I feel bad because I recently made a very small change that had no effect on most of the site (to improve the appearance of some parts while improving support for some less-common browsers). It also broke the appearance of several parts of the site that I went about fixing. I didn't expect this to take too long or I wouldn't have made the changes in the way that I did. I was quite surprised when this small change tickled a rather unfortunate bug in Netscape 4.something which caused severe disruption in a few areas of PM (if you were using NS4.x). I ended up spending quite a long time resolving issues related to this and often wondered if it wouldn't be less work to back out what I'd done so far than to continue forward.
If I'd had any idea that the change would lead to such problems, or had realized the magnitude of my mistake sooner in the process, then I would have done things much differently. I try to do the level of testing, isolation, announcement, etc. that seems appropriate for the type and size of change being made. I try to be conservative. But there are a lot of things that are still very difficult to test. We're working to reduce them, but not to the exclusion of all other activities.
As I recall, the last patch you submitted, crazyinsomniac, contained a syntax error and this was very soon after you'd scolded some of us about how trivial it is to test for those. In that case, I actually did that work for you and tested your patch before I applied it.
So it is easy to look from the outside and wonder why we don't do a better job. We will continue to make mistakes. A lot of the work we are doing now is the kind of background work that is needed so that we can provide real improvements with an appropriate amount of risk, so you probably haven't noticed much of the "improvements". I'm sorry, I wish I had more time to create more summaries of what is going on behind the scenes. I fight the fires that look the most important, look the easiest to put out, look like the most fun, etc. And public announcements are just another fire (along with my real job, my family, etc.).
Finally, thanks for the ideas and suggestions. I read everything in Perl Monks Discussion. I'm sorry but, if an idea comes up that I can't use (or that we've already tried), then I often won't take the time to explain why I can't use it. For one thing, it often isn't easy (for me, anyway) to explain. But I don't want people to think I'm ignoring the input. I appreciate it.
- tye (but my friends call me "Tye")