|Perl: the Markov chain saw|
Re: Randomize lines with limited memory (Roll your own...)by BrowserUk (Pope)
|on Nov 02, 2003 at 02:30 UTC||Need Help??|
...or use mine:)
This is one of those cases where rolling your own has benefits. The FAQ, Cookbook and the List::Util version of the Fischer-Yates shuffle all use the copy semantics. This means that you need over double the space required to store the data, in order to shuffle it.
My version does an in-place shuffle, the benefits of which really show up when you start shuffling huge arrays. The following results are a comparision between my pure-perl, inplace shuffle and the List::Util XS version.
The results show that my in-place version consumes just 8k extra ram to perform the shuffle, and takes about 15% less time to do it than the XS version. The XS version only takes around 15 seconds to actually perform the shuffle, but the copy semantics mean it loses this performance advantage by the need to allocate double the space, ending up considerably slower.
It wouldn't be that hard (if you are an an accomplished XS programmer) to re-cast the List::Util version to detect that it was being given an array reference and was being called in a void context and switch to an in-place algorithm. Some crude tests seem to show that this would not only halve the memory usage, but as a result, would cut the overall shuffle time to less than a third.
The benchmark (You'll need to use an external tool to measure the memory usage).
Examine what is said, not who speaks."Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail