
### RE: Re: Randomize an array

 on Sep 08, 2000 at 05:01 UTC ( #31525=note )

in reply to Re: Randomize an array

That doesn't make much sense, Tye. Qsort is, after all, an implementation of Quicksort. Quicksort is a pretty smart algorithm for sorting an array by loading the whole thing into memory. (In fact, it has been proven that no comparison-based sort can beat its average case: you can't do better than O(n log n).)

We break from this post to give a brief description of quicksort for those who don't know.
Quicksort is a recursive algorithm that picks a pivot element out of the array and then compares all the other elements to it. This yields the pivot plus two arrays: one holding all the items "greater than" the pivot, the other all the items "less than" it. Quicksort is then used to sort those two arrays. Since it's done recursively, it ends up with the pivots, and everything around them, in order. The code might look something like this:

```
sub Qsort    # WRONG! See update below!
{
    # Picking the pivot is usually more involved; after all, you don't
    # want an extreme, you want both arrays to be roughly the same size.
    my $pivot = shift;
    return $pivot unless @_;
    my( @lt, @gt );
    push( ( $_ gt $pivot ? @gt : @lt ), $_ ) for @_;
    return( Qsort( @lt ), $pivot, Qsort( @gt ) );
}
```
(Disclaimer: I didn't test that and I haven't looked any of this up, so I could be completely wrong and that code might not work)

So you see, the amount of time `@a = sort { (-1,1)[rand 2] } @a` will take is finite and well defined, regardless of the function used to decide which side of the pivot each element lands on. So the only reason it would dump core is that it ran out of memory; it should never hang (unless, as I said, it ran out of memory... those sneaky recursive algorithms). The only thing that varies in a quicksort is how well the pivot is chosen. If you pick an extreme (where everything ends up on one side of the pivot) then Qsort will take much longer than if you pick the middle, where everything is balanced. So the point of all this is that I doubt the implementation has gotten better, but rather the hardware (128 MB of RAM sure beats 2 or 4 MB).
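As a quick sanity check of the construct under discussion (my own sketch; on a reasonably modern perl, whose sort is documented not to crash on an inconsistent comparator, it merely produces a poorly shuffled rearrangement):

```perl
use strict;
use warnings;

# The construct from the thread: a comparator that ignores its
# arguments and returns -1 or 1 at random.
my @a = ( 1 .. 20 );
my @shuffled = sort { (-1, 1)[ rand 2 ] } @a;

# Whatever order comes out, the result must still be a permutation
# of the input -- sort only rearranges elements.
my $ok = join( ',', sort { $a <=> $b } @shuffled ) eq join( ',', @a );
print $ok ? "still a permutation\n" : "elements lost!\n";
```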

Update:
So I did a little research on Quicksort and realized that it usually operates on the array in place, which mine clearly doesn't, and that it does even less work than mine did. How about:

```
sub Qsort
{
    my( $arrayRef, $lowIndex, $highIndex ) = @_;
    return if $highIndex <= $lowIndex;
    my $pivotIndex = partition( $arrayRef, $lowIndex, $highIndex );
    Qsort( $arrayRef, $lowIndex, $pivotIndex - 1 );
    Qsort( $arrayRef, $pivotIndex + 1, $highIndex );
}

# And then, of course, I need to provide &partition
sub partition
{
    my( $arrayRef, $lowIndex, $highIndex ) = @_;
    my $pivot = ( $highIndex + $lowIndex ) >> 1;    # middle index
    my $pivotElement = $$arrayRef[$pivot];

    while( $highIndex > $lowIndex )
    {
        # $_[1] and $_[2] are the original bounds, used as guards.
        ++$lowIndex  while $$arrayRef[$lowIndex] <= $pivotElement
                           and $_[2] > $lowIndex;
        --$highIndex while $pivotElement <= $$arrayRef[$highIndex]
                           and $highIndex > $_[1];
        Swap( $arrayRef, $lowIndex, $highIndex ) if $highIndex > $lowIndex;
    }

    # Move the pivot element into its final position.
    if( $highIndex > $pivot )
    {
        Swap( $arrayRef, $pivot, $highIndex );
        $pivot = $highIndex;
    }
    elsif( $pivot > $lowIndex )
    {
        Swap( $arrayRef, $pivot, $lowIndex );
        $pivot = $lowIndex;
    }

    return $pivot;
}

sub Swap { @{$_[0]}[ $_[1], $_[2] ] = @{$_[0]}[ $_[2], $_[1] ] }
# Warning, untested code...
```
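For comparison, here is a compact, conventional in-place quicksort (Lomuto partition, last element as pivot). This is my own self-contained sketch, not the post's code, but it shows the same recursive structure in runnable form:

```perl
use strict;
use warnings;

# Conventional in-place quicksort: partition around the last element,
# leaving the pivot at its final index, then recurse on both sides.
sub quicksort {
    my ( $array, $lo, $hi ) = @_;
    return if $hi <= $lo;
    my $pivot = $array->[$hi];
    my $i     = $lo - 1;
    for my $j ( $lo .. $hi - 1 ) {
        if ( $array->[$j] <= $pivot ) {
            ++$i;
            @{$array}[ $i, $j ] = @{$array}[ $j, $i ];    # swap
        }
    }
    @{$array}[ $i + 1, $hi ] = @{$array}[ $hi, $i + 1 ];  # place pivot
    quicksort( $array, $lo, $i );
    quicksort( $array, $i + 2, $hi );
}

my @data = ( 33, 2, 52, 1, 2, 99, 17 );
quicksort( \@data, 0, $#data );
print "@data\n";    # 1 2 2 17 33 52 99
```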
For more about sorting, check out these great documents I found: Here and here.
Not to forget Mastering Algorithms with Perl from O'Reilly, which has a whole chapter devoted to sorting algorithms.

BTW: After at least an hour of working on this post, I have realized that a true quicksort would, in fact, go crazy given { (-1,1)[rand 2] }, and I am now somewhat mystified as to why it converges at all. I think it's time to go eat something; I will return to this later.

Update:
So I slept on it, and I am now re-convinced that Qsort will be finite regardless of the random nature of its partition. So I go back to the original point of this post, Tye: Qsort should only fail if it runs out of memory for all those recursive calls, and it should end in a finite amount of time not to exceed some constant times N squared (N being the number of elements to be sorted).

My apologies to anyone that I have confused with this post. Maybe it will inspire you to learn more about sort.
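For what it's worth, the standard way to randomize an array without abusing sort at all is the Fisher-Yates shuffle: one pass, O(N), every permutation equally likely. A sketch (this is the well-known algorithm, not anything from the posts above; newer perls also provide List::Util's shuffle):

```perl
use strict;
use warnings;

# Fisher-Yates: walk from the last index down, swapping each element
# with a uniformly chosen element at or below it.
sub fisher_yates_shuffle {
    my $array = shift;
    for ( my $i = $#$array; $i > 0; --$i ) {
        my $j = int rand( $i + 1 );
        @{$array}[ $i, $j ] = @{$array}[ $j, $i ];
    }
}

my @deck = ( 1 .. 52 );
fisher_yates_shuffle( \@deck );
print "@deck\n";    # the same 52 values, in random order
```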

Replies are listed 'Best First'.
RE (tilly) 3 (tis true): Randomize an array
by tilly (Archbishop) on Sep 08, 2000 at 06:01 UTC
OK, I just had to go searching for some old documentation and you can see it laid out there.

Clearly, what is called qsort really need not be quicksort, for many possible reasons.

Now for more details. You are right that the average time for qsort is n*log(n). However, qsort actually has a worst case of n^2; in fact, the worst case of a naive qsort is hit on an already sorted list. Perl currently uses a qsort with pivots chosen carefully to move the worst case to something unlikely.

You are also right that no sorting algorithm can possibly beat an average case of O(n*log(n)). Why not? For the simple reason that there are n! possible permutations you have to deal with, and in m comparisons you can only distinguish at most 2^m of them. So the number of comparisons you need will be at least log_2(n!). Well, up to a constant factor that is log(n!), which is

```
log(n!) = log(1*2*...*n)
        = log(1) + log(2) + ... + log(n)
```
which is approximately the integral from 1 to n of log(x). Which in turn is n*log(n)-n+1 plus error terms from the approximation. (After a full derivation you get Stirling's Approximation.)

Right now all that concerns us is that n*log(n) term out front. You cannot get rid of that.
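The counting argument above is easy to check numerically. A small sketch (my own, just evaluating the formulas): m comparisons distinguish at most 2**m orderings, so sorting n items needs at least ceil(log2(n!)) comparisons, which grows like n*log2(n).

```perl
use strict;
use warnings;
use POSIX qw( ceil );

# log2(n!) = log2(1) + log2(2) + ... + log2(n)
sub log2_factorial {
    my $n   = shift;
    my $sum = 0;
    $sum += log($_) / log(2) for 1 .. $n;
    return $sum;
}

for my $n ( 5, 10, 100 ) {
    my $lower = ceil( log2_factorial($n) );    # minimum comparisons
    my $nlogn = $n * log($n) / log(2);
    printf "n=%3d  at least %4d comparisons  (n*log2(n) = %7.1f)\n",
        $n, $lower, $nlogn;
}
```

For example, 5 elements have 120 orderings, so at least ceil(log2(120)) = 7 comparisons are required no matter how clever the sort.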

Now that said there are many sorting algorithms out there. qsort is simple and has excellent average performance, but there are others that have guaranteed performance and are order n on sorted datasets. Incidentally there is right now an interesting discussion of this on p5p, including a brand new discovery of a memory leak...

Well..... There is the Radix Sort, which is O(N)... 'course it also requires more information about what is being sorted, and places constraints on that data set, and, well, it's just not as general, so it's not considered on par with Qsort or MergeSort.
Can we really call that O(N)? The constant depends upon the number of passes. As you increase the number of items, eventually you have to have long strings which requires lots of passes, indeed with a fixed number of symbols in your alphabet the number of things you can represent rises exponentially in the length you allow...which means it is truly n*log(n) again. :-)

Indeed my explanation about Stirling's formula is still relevant, re-read it and you can see that the fundamental issue is that a set of decisions with fixed branching can only account for an exponential number of possibilities. So the number of branches needed to sort n things grows like log(n!) which is order n*log(n). However the big win is that with radix sort you can get a far better branch factor than 2. (At least initially.)

But unfortunately the radix sort cannot be made to work with arbitrary sort functions since it does not (at least not directly) work off of binary comparisons...
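To make the trade-off concrete, here is a sketch of the LSD radix sort being discussed (my own illustration, assuming non-negative integers sorted one decimal digit at a time): each pass is O(N), and the number of passes is fixed by the key width, which is exactly the constraint on the data set mentioned above.

```perl
use strict;
use warnings;

# LSD radix sort on non-negative integers, one decimal digit per pass.
sub radix_sort {
    my @items = @_;
    my $max   = 0;
    $max < $_ and $max = $_ for @items;
    for ( my $place = 1; $max / $place >= 1; $place *= 10 ) {
        my @buckets = map { [] } 0 .. 9;
        push @{ $buckets[ int( $_ / $place ) % 10 ] }, $_ for @items;
        @items = map { @$_ } @buckets;    # stable: preserves prior passes
    }
    return @items;
}

my @sorted = radix_sort( 170, 45, 75, 90, 802, 24, 2, 66 );
print "@sorted\n";    # 2 24 45 66 75 90 170 802
```

Note that no element is ever compared to another, which is why this cannot be adapted to an arbitrary comparison function.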
