 
PerlMonks  

Be aware of splice

by Abigail-II (Bishop)
on May 03, 2004 at 09:34 UTC (#349969=perlmeditation)

I'm writing this because of "Is this a fair shuffle?". In that thread, saintmike asked whether a certain algorithm using splice was a fair shuffle. BrowserUk replied that the algorithm was fair (which it is, assuming your random generator is fair), and that it was an implementation of the Fisher-Yates shuffle, which it isn't. Fisher-Yates is an algorithm that works in linear time; the presented algorithm takes quadratic time (expected). For reference, the presented algorithm:
    my @a = (1..10);
    my @b;
    push @b, splice @a, rand @a, 1 while @a;
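For contrast, here is a minimal sketch of a Fisher-Yates shuffle. It was not part of the original thread, and the sub name fisher_yates_shuffle is chosen purely for illustration. It performs one swap per element and never shifts the rest of the array, which is why it runs in linear time.

```perl
use strict;
use warnings;

# Fisher-Yates shuffle, in place: one swap per element, so linear time.
# The name fisher_yates_shuffle is an illustrative choice, not a core function.
sub fisher_yates_shuffle {
    my ($array) = @_;                        # array reference
    for (my $i = $#$array; $i > 0; $i--) {
        my $j = int rand($i + 1);            # uniform index in 0 .. $i
        @$array[$i, $j] = @$array[$j, $i];   # swap; no other element moves
    }
    return $array;
}

my @a = (1 .. 10);
fisher_yates_shuffle(\@a);
print "@a\n";
```

In everyday code, shuffle from the core List::Util module does the same job.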
The discussion led to the question of why the presented algorithm is quadratic, and BrowserUk writes:
I'm still a little bemused by why swapping pointers on the linked list, rather than swapping the contents of the elements the linked list points at, becomes quadratic, but the (newer) numbers demonstrate your point. I will have to sit down with a pen and paper and the source code of splice to understand why the costs grow that way.
BrowserUk is focusing on the wrong thing. Copying the target element isn't the issue; the splicing is the bottleneck. There's no "linked list". (BrowserUk is probably referring to the fact that the internal array Perl uses stores pointers to SVs. Whether the elements are pointers, and whether it is the pointers or the contents of the SVs that get swapped, isn't important either: it's the splicing.)

Consider the following array:

+------+------+------+-------+------+------+------+-------+------+------+------+
|      |      |      |       |      |      |      |       |      |      |      |
|   0  |   1  |   2  | . . . |  k-1 |   k  |  k+1 | . . . |  n-2 |  n-1 |   n  |
|      |      |      |       |      |      |      |       |      |      |      |
+------+------+------+-------+------+------+------+-------+------+------+------+
To be able to do quick indexing, Perl stores the elements in consecutive memory locations, just as is done in C. It doesn't store the values directly, just pointers to SVs, but that's not the point. Here's the array again, with the memory locations of the elements (or rather, of the pointers to the elements) shown below it, assuming each element takes 4 bytes.
 
   +--- Array
   |
   v
+------+------+------+-------+------+------+------+-------+------+------+------+
|      |      |      |       |      |      |      |       |      |      |      |
|   0  |   1  |   2  | . . . |  k-1 |   k  |  k+1 | . . . |  n-2 |  n-1 |   n  |
|      |      |      |       |      |      |      |       |      |      |      |
+------+------+------+-------+------+------+------+-------+------+------+------+
 M      M+4    M+8            M+4k-4 M+4k   M+4k+4         M+4n-8 M+4n-4 M+4n
We've also introduced the Array pointer, which is the part of the AV structure that points to the beginning of the array (element with index 0).

Now assume that we want to splice off element k. Our resulting array will look like either of:

   +--- Array
   |
   v
+------+------+------+-------+------+------+-------+------+------+------+
|      |      |      |       |      |      |       |      |      |      |
|   0  |   1  |   2  | . . . |  k-1 |  k+1 | . . . |  n-2 |  n-1 |   n  |
|      |      |      |       |      |      |       |      |      |      |
+------+------+------+-------+------+------+-------+------+------+------+
 M      M+4    M+8            M+4k-4 M+4k          M+4n-12 M+4n-8 M+4n-4
or
           +--- Array
           |
           v
       +------+------+------+-------+------+------+-------+------+------+------+
       |      |      |      |       |      |      |       |      |      |      |
       |   0  |   1  |   2  | . . . |  k-1 |  k+1 | . . . |  n-2 |  n-1 |   n  |
       |      |      |      |       |      |      |       |      |      |      |
       +------+------+------+-------+------+------+-------+------+------+------+
 M      M+4    M+8            M+4k-4 M+4k   M+4k+4         M+4n-8 M+4n-4 M+4n   
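As you can see, this requires either that the n - k elements after position k (indices k+1 through n in the picture) be moved 4 bytes 'down', as in the first picture, or that the k elements before position k be moved 4 bytes 'up', as in the second. Even if the shorter side is always the one that moves, for about half of the possible values of k that side still holds at least n/4 elements. In fact, if you do the math, you'll see that the expected number of moves due to a single splice at a random position is about n/4.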
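The n/4 figure can be checked with a small sketch. It assumes that the position k is uniformly random and that the shorter side is always the one shifted, so the cost of one splice is min(k, n - k). The sub name average_moves is made up for this example.

```perl
use strict;
use warnings;
use List::Util qw(min sum);

# Average number of slots moved when one of the n+1 elements (indices 0 .. n)
# is removed, assuming the shorter side is always the one that gets shifted.
sub average_moves {
    my ($n) = @_;
    my @cost = map { min($_, $n - $_) } 0 .. $n;   # k elements below, n-k above
    return sum(@cost) / scalar @cost;
}

for my $n (10, 100, 1000) {
    printf "n = %4d: average moves = %.2f, n/4 = %.2f\n",
        $n, average_moves($n), $n / 4;
}
```

The average approaches n/4 as n grows, so a single splice costs time proportional to the array length.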

That's why splicing a single element out of an array takes linear time (expected), and that's why the splicing shuffle algorithm takes quadratic time.
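To see the quadratic total, the sketch below runs the splice shuffle and tallies how many slots a shortest-side shift would move at each step. This is a cost model under that assumption, not a measurement of what perl does internally, and splice_shuffle_moves is a name invented for the example.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Run the splice shuffle on n elements and count the slots that a
# shortest-side shift would move. A model of the cost, not a profile.
sub splice_shuffle_moves {
    my ($n) = @_;
    my @a = (1 .. $n);
    my @b;
    my $moves = 0;
    while (@a) {
        my $k = int rand @a;            # position to remove
        $moves += min($k, $#a - $k);    # elements before vs. after position k
        push @b, splice @a, $k, 1;
    }
    return ($moves, \@b);
}

for my $n (1000, 2000, 4000) {
    my ($moves) = splice_shuffle_moves($n);
    printf "n = %5d: %9d moves, moves / n**2 = %.3f\n",
        $n, $moves, $moves / $n**2;
}
```

Summing about s/4 moves over array sizes s from n down to 1 gives roughly n**2/8, so doubling n should roughly quadruple the count while the ratio in the last column stays near 0.125.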

Abigail

Re: Be aware of splice
by tilly (Archbishop) on May 03, 2004 at 18:23 UTC
    This example illustrates an additional point about algorithmic efficiency.

    BrowserUk had initially thought that the algorithm wasn't quadratic because it didn't look quadratic in runs of 10, 100, and 1000 elements. But it did look quadratic when he tested 10_000 and 100_000.

    The reason is that the quadratic piece consists entirely of copying pointers in C, which is very fast. You do a lot of it, so it doesn't scale, but for small sample sizes the eventual performance bottleneck is invisible. And then once the performance bottleneck becomes visible, it doesn't take long until it dominates.
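A rough way to watch this happen is to time the splice shuffle at several sizes and look at the cost per element. The sketch below uses the core Time::HiRes module; the sub name time_splice_shuffle and the chosen sizes are illustrative, and the numbers will vary by machine and perl build.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Time the splice shuffle from the original post on n elements.
# Returns the elapsed wall-clock time in seconds.
sub time_splice_shuffle {
    my ($n) = @_;
    my @a = (1 .. $n);
    my @b;
    my $start = time();
    push @b, splice @a, rand @a, 1 while @a;
    return time() - $start;
}

for my $n (1_000, 10_000, 100_000) {
    my $t = time_splice_shuffle($n);
    printf "%7d elements: %.4f s total, %.3f us per element\n",
        $n, $t, 1e6 * $t / $n;
}
```

If the algorithm were linear, the per-element figure would stay flat. With a quadratic term it should start to climb once the pointer copying outweighs the fixed per-call overhead.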

    Which is a basic fact about scalability. When you are dealing with small samples, you care much more about the efficiency of your individual operations, which is relatively easy to benchmark. When you deal with large samples, you care about the scaling pattern. A very cheap operation that sits at a spot which scales poorly will eventually be a bottleneck. Understanding the latter kind of problem is fairly hard for most people, and problems of this nature regularly surprise people who extrapolated from small tests without accounting for them.

    (Important note: scaling bottlenecks don't just happen in code. If you spend most of your life in meetings, well you've just seen what a scaling bottleneck in an organization of humans looks like!)
