Re^5: About List::Util's pure Perl shuffle()
by BrowserUk (Pope) on Jul 12, 2007 at 16:08 UTC
More and more interesting. But then, since all the other subs are (@), why not simply the following?
blazar++. Just checking if anyone is still paying attention :)
Since bukNew() doesn't need to copy the input list--because it shuffles, and therefore mutates, a list of indices generated internally, rather than mutating a copy of the input--it seems silly to pass arrays in by value, forcing their duplication. Most uses of shuffle() are applied to pre-existing arrays. That's why most versions make a copy of them, or create a list of aliases to them--simply to stop the shuffle from modifying the external arrays. So, instead of passing the array itself, we can avoid some copying by passing a reference to it. And when we want to shuffle a list, instead of passing it flat, we wrap it in an anonymous array.
Inspired by blokhead's version, I went back to basics and tried benchmarking a simple version that took a reference rather than a list: bukNew(), with quite surprising results.
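For anyone following along without the code to hand, here is a minimal sketch of what a reference-taking shuffle of this kind can look like. It is not the bukNew() that was benchmarked: shuffle_ref is a made-up name, and the body is just a plain Fisher-Yates shuffle over an internally generated list of indices, which is the approach described above.

```perl
use strict;
use warnings;

# Hypothetical reference-taking shuffle (NOT the bukNew() from the benchmark).
# It shuffles a private list of indices and returns the elements in that
# order via a slice, so the caller's array is never modified or copied up front.
sub shuffle_ref {
    my ($aref) = @_;
    my @idx = 0 .. $#$aref;
    for ( my $i = $#idx; $i > 0; --$i ) {
        my $j = int rand( $i + 1 );          # 0 <= $j <= $i
        @idx[ $i, $j ] = @idx[ $j, $i ];     # swap two indices
    }
    return @{$aref}[@idx];
}

my @array = map { "item$_" } 1 .. 10;

# Pre-existing array: pass a reference instead of the array itself.
my @shuffled = shuffle_ref( \@array );

# Bare list: wrap it in an anonymous array.
my @from_list = shuffle_ref( [ 'a' .. 'e' ] );

print "@shuffled\n@from_list\n";
```

The only cost at the call site is a backslash for arrays and a pair of square brackets for lists.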
Of course, the same lesson can now be retrofitted to those other routines that don't need to replicate the input list for their operation. In particular, blokhead's version!
And here are the headline results of doing that (as blokhead_ref):
Notice that the length of the strings is not a factor for blokhead(_ref) or bukNew!
The significant thing is that blokhead_ref moves ahead of blokhead very quickly as the number of elements increases. Avoiding copying the array pays dividends very quickly.
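If you want to see the effect of the copy on your own machine, something along these lines should do. It is only a sketch, not the benchmark used for the figures in this thread: by_value and by_ref are made-up names, both use the same index shuffle, and the only difference is that by_value copies its arguments into a lexical array first. The element count and string length are arbitrary choices.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Shared core: return the indices 0 .. $n-1 in Fisher-Yates shuffled order.
sub shuffled_indices {
    my ($n) = @_;
    my @idx = 0 .. $n - 1;
    for ( my $i = $#idx; $i > 0; --$i ) {
        my $j = int rand( $i + 1 );
        @idx[ $i, $j ] = @idx[ $j, $i ];
    }
    return @idx;
}

# Hypothetical list-taking version: copies every element before shuffling.
sub by_value {
    my @copy = @_;
    return @copy[ shuffled_indices( scalar @copy ) ];
}

# Hypothetical reference-taking version: no up-front copy of the input.
sub by_ref {
    my ($aref) = @_;
    return @{$aref}[ shuffled_indices( scalar @$aref ) ];
}

# Assumed test data: 10_000 strings of 100 characters each.
my @data = map { 'x' x 100 } 1 .. 10_000;

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese( -1, {
    by_value => sub { my @r = by_value(@data) },
    by_ref   => sub { my @r = by_ref( \@data ) },
} );
```

Vary the number of elements and the string length to see how each one affects the gap.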
However some random tests with various string and list lengths seem to show that it could be considerably slower than buk(), by even as much as 40% or so. Thus... are you sure your benchmark is not flawed?
If you look back over the thread, you'll see that there have been various instances where different people have seen different results from benchmarking apparently the same code.
Some of the differences are explained by whether the data being shuffled consists of all integer data (as in your original benchmark: 0..1000 etc.), or string data, as used by most people after ikegami noted the difference it makes. When an SV points to an IV, copying that SV is a faster operation than when it points to a PV. With the latter, a second memory allocation and a memcpy operation have to be performed to copy the string data pointed at by the PV. If the SV contains an integer (and probably a float?), and has never been used in a string context, then the number will never have been ascii-ized and the PV will be null. That makes for significantly less work to copy it.
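You can watch a scalar change its internal type with the core B module. The following is a small sketch, not part of the benchmark; sv_class is a made-up helper that just reports the B class of the scalar it is handed, which reflects the body type perl has allocated for it. Devel::Peek's Dump() shows the same thing in more detail.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: report which internal SV type a scalar currently has.
# $_[0] is an alias to the caller's variable, so we inspect the original SV.
sub sv_class { return ref B::svref_2object( \$_[0] ) }

my $n = 42;             # integer, never used as a string
my $s = 'forty-two';    # string

print 'fresh integer:                 ', sv_class($n), "\n";
print 'string:                        ', sv_class($s), "\n";

my $as_text = "$n";     # use the integer in string context once

print 'integer after stringification: ', sv_class($n), "\n";
```

Once the integer has been used as a string it carries a string buffer around with it, so benchmark data built from 0..1000 can behave differently depending on whether anything has stringified it beforehand.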
That doesn't explain all the anomalies seen above though, I think. Anyway, it's quite possible my benchmark is flawed--it certainly wouldn't be the first time :)--so here is the code. Tell me what you think?
The benchmark code:
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.