RE (tilly) 3 (tis true): Randomize an array
by tilly (Archbishop) on Sep 08, 2000 at 06:01 UTC
OK, I just had to go searching for some old documentation, and you can see it laid out there. Clearly, what is called qsort really need not be quicksort, for many possible reasons.
Now for more details. You are right that the average running time of qsort is n*log(n). However, qsort actually has a worst case of n^2, and with a naive qsort that worst case is hit on an already sorted list. Perl currently uses a qsort with pivots chosen carefully to move the worst case onto something unlikely.
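You can see that worst case concretely by counting comparisons. Here is a quick sketch (Python rather than Perl, purely for illustration) of a naive quicksort that always takes the first element as its pivot, run on sorted versus shuffled input:

```python
import random

def naive_qsort(xs, counter):
    # Naive quicksort: always use the first element as the pivot.
    # counter[0] tallies one comparison per element partitioned.
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    counter[0] += len(rest)
    left = naive_qsort([x for x in rest if x < pivot], counter)
    right = naive_qsort([x for x in rest if x >= pivot], counter)
    return left + [pivot] + right

n = 300
random.seed(1)
shuffled = random.sample(range(n), n)
for name, data in (("sorted", list(range(n))), ("shuffled", shuffled)):
    counter = [0]
    assert naive_qsort(data, counter) == sorted(data)
    print(name, counter[0], "comparisons")
```

On the sorted list every pivot is the minimum, so each level of recursion peels off one element and you pay n*(n-1)/2 comparisons, i.e. the n^2 worst case; on the shuffled list you get the n*log(n) average behavior.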
You are also right that no sorting algorithm can possibly beat an average case of O(n*log(n)). Why not? For the simple reason that there are n! possible permutations you have to deal with, and in m comparisons you can distinguish at most 2^m of them. So the number of comparisons you need will be at least log_2(n!). Up to a constant factor that is log(n!), which is
log(n!) = log(1*2*...*n) = log(1) + log(2) + ... + log(n)
which is approximately the integral from 1 to n of log(x), which in turn is n*log(n) - n + 1 plus error terms from the approximation. (Carry the derivation further and you get Stirling's Approximation.)
Right now all that concerns us is that n*log(n) term out front. You cannot get rid of that.
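As a quick numerical sanity check on that approximation (a sketch using nothing beyond the standard library), compare the exact log_2(n!) against the integral estimate for n = 1000:

```python
import math

n = 1000
# Exact: log2(n!) = log2(1) + log2(2) + ... + log2(n)
log2_factorial = sum(math.log2(k) for k in range(2, n + 1))
# Integral approximation: (integral from 1 to n of ln(x) dx) / ln(2)
#                       = (n*ln(n) - n + 1) / ln(2)
integral_approx = (n * math.log(n) - n + 1) / math.log(2)
# The leading term alone, with no correction terms:
leading_term = n * math.log2(n)
print(log2_factorial, integral_approx, leading_term)
```

For n = 1000 the integral estimate lands within a fraction of a percent of the true log_2(n!), and both sit below the bare n*log_2(n) leading term, which is the part you cannot get rid of.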
Now that said, there are many sorting algorithms out there. qsort is simple and has excellent average performance, but there are others that have guaranteed n*log(n) performance and are order n on already sorted datasets. Incidentally, there is right now an interesting discussion of this on p5p, including a brand-new discovery of a memory leak...
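A natural mergesort is one example of such an algorithm (this is an illustrative sketch, not any particular library's implementation): it first splits the input into maximal already-sorted runs, so a sorted list costs only n-1 comparisons and zero merges, while the worst case stays n*log(n).

```python
def find_runs(xs, counter):
    # Split xs into maximal nondecreasing runs,
    # paying one comparison per adjacent pair.
    runs, cur = [], [xs[0]]
    for x in xs[1:]:
        counter[0] += 1
        if x >= cur[-1]:
            cur.append(x)
        else:
            runs.append(cur)
            cur = [x]
    runs.append(cur)
    return runs

def merge(a, b, counter):
    # Standard two-way merge, counting comparisons.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        counter[0] += 1
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:]); out.extend(b[j:])
    return out

def natural_mergesort(xs, counter):
    # Guaranteed n*log(n) worst case; a single run-detection
    # pass (n-1 comparisons) when the input is already sorted.
    if not xs:
        return []
    runs = find_runs(xs, counter)
    while len(runs) > 1:
        runs = [merge(runs[i], runs[i + 1], counter)
                if i + 1 < len(runs) else runs[i]
                for i in range(0, len(runs), 2)]
    return runs[0]

n = 1000
counter = [0]
assert natural_mergesort(list(range(n)), counter) == list(range(n))
print("sorted input:", counter[0], "comparisons")  # n - 1
```

A sorted input is one big run, so the merging loop never fires; random input produces roughly n/2 runs and pays the usual logarithmic number of merge passes.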