When the Best Solution Isn't
by sauoq (Abbot) on Sep 23, 2002 at 01:00 UTC
I started to write this as a reply to Random array sorting, a question asked by kidd. I realized I was addressing a larger issue and that it didn't really belong there even though the example I use is related. I believe this is my first meditation.
A couple of years ago, when I was at a different company, one of my co-workers asked pretty much the same question kidd did. He needed to randomize the lines in a file, wanted to do it with Perl, and was looking for the best (most efficient) way. My reply was the same as merlyn's to kidd: "it's a FAQ." The lead toolsmith, however, suggested another answer so simple, elegant, and efficient that I was blown away. His not only benchmarked much better, it would also have won if we had been playing golf.
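For reference, the FAQ answer (perlfaq4, "How do I shuffle an array randomly?") comes down to a Fisher-Yates shuffle. What follows is a minimal sketch of that technique, with made-up sample lines standing in for the contents of a file:

```perl
use strict;
use warnings;

# Fisher-Yates shuffle: permute the array in place, given a reference to it.
# Every ordering is equally likely, assuming rand() is uniform.
sub fisher_yates_shuffle {
    my ($array) = @_;
    for (my $i = $#$array; $i > 0; $i--) {
        my $j = int rand($i + 1);             # pick an index 0 <= $j <= $i
        @$array[$i, $j] = @$array[$j, $i];    # swap the two slots
    }
    return;
}

# Hypothetical sample data; in practice these would be read from the file.
my @lines = ("alpha\n", "beta\n", "gamma\n", "delta\n");
fisher_yates_shuffle(\@lines);
print @lines;
```

If List::Util is available, its `shuffle` function does the same job without the hand-rolled loop.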
To put things in perspective, our toolsmith was no slouch. If you work on a unix platform, the chances are that you have run code written by him. He has spent time maintaining some significant GNU projects and has contributed substantially to many others.
The solution that he proposed for randomizing an array is quite beautiful. You may want to use it. Here's a word of caution: DON'T. It is, unfortunately, incorrect.
Even when I first saw this, something about it felt wrong. It was too simple. I pointed out that the behavior of qsort(), the C function which acts as the basis for Perl's sort, is undefined when the comparison function doesn't produce an ordering. That, however, didn't seem to be much of an argument as A) it appeared to work and B) we wanted the ordering to be undefined. As long as it didn't add or remove elements, it should work, right? I gave up. He was right. It was beautiful. I would never be the developer he was. As far as I know, his solution went into production code and may well remain there today.
Sometime after I left that company, the question came up again, and I looked a little closer at that most elegant of solutions. I had never really tested it for correctness, so this time I did. Here's what I found:
Notice how, of the six possible orderings, two of them came up twice as many times as any of the four others. Not very random, is it? The problem lies in the way a quicksort works. Determining the details is left as an exercise.
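A check along these lines is easy to reproduce. The sketch below assumes a comparator that ignores its arguments and returns -1, 0, or 1 at random. That is a stand-in for the general idea, not necessarily the toolsmith's exact code, and `sort_shuffle` and `tally_orderings` are names made up for the example. The exact counts depend on the sort implementation your perl was built with, so none are shown here.

```perl
use strict;
use warnings;

# Hypothetical "shuffle by sorting": the comparator ignores its arguments
# and answers -1, 0, or 1 at random. Shown only to be measured, not used.
sub sort_shuffle {
    return sort { int(rand 3) - 1 } @_;
}

# Call $shuffler on the same list $trials times and count how often each
# resulting ordering appears. Returns a hash of "ordering" => count.
sub tally_orderings {
    my ($shuffler, $trials, @items) = @_;
    my %count;
    for (1 .. $trials) {
        my $key = join ' ', $shuffler->(@items);
        $count{$key}++;
    }
    return %count;
}

my $trials = 60_000;
my %count  = tally_orderings(\&sort_shuffle, $trials, qw(a b c));

# A fair shuffle would give each of the 6 orderings about $trials / 6 hits.
printf "%-6s %6d\n", $_, $count{$_} for sort keys %count;
```

Passing a reference to a known-good shuffle as the first argument, for instance `\&List::Util::shuffle` after `use List::Util 'shuffle';`, gives a baseline to compare the counts against.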
This story has several morals. Among them are "no one is infallible," "it isn't a solution until it has been thoroughly tested," and "elegance is worthless without correctness." Another lesson I learned was that, since Perl hides so much power under the surface, sometimes a simple solution isn't simple at all.
I'm wondering if any other monks have stories of great solutions that turned out to be subtly wrong. I offer this thread up as a place to collect them so that they may serve as reminders to ourselves and warnings to others.
-sauoq "My two cents aren't worth a dime.";