You're right. I just glanced at av.c, and it isn't a power of 2; it grows by 20%. Perhaps I hadn't read it carefully enough originally? (In which case someone might want to look at my unshift speedup, since that does go by powers of 2.) Or perhaps I was mixing it up with the fact that many things I have looked at (hash resizing, some malloc arrangements) like working with powers of 2.
However, growing by 20% each time is still geometric growth, and therefore still gives a constant amortized cost per new element added; it is just a higher constant. Let's calculate it in theory. Suppose that we have been adding one element at a time with push, and we have just had to recopy the array, so that our size is now N. How much total recopying have we done (ignoring rounding, since we only ever move an integer number of elements)? Well, N elements got moved this time. Each move leaves us with 6/5ths as much space, so 5/6ths of our elements got moved the previous time we had to move, 5/6ths of 5/6ths of them the time before that, and so on. The total is a geometric series with r = 5/6, and the well-known trick for summing one is to multiply and divide by 1-r:
  N + r*N + r*r*N + r*r*r*N + ...
= (N + r*N + r*r*N + r*r*r*N + ...) * (1-r)/(1-r)
= (N + (r*N - r*N) + (r*r*N - r*r*N) + (r*r*r*N - r*r*r*N) + ...) / (1-r)
= N/(1-r)
In our case r is 5/6, so 1-r is 1/6 and 1/(1-r) is 6.
Therefore in the worst case, just after a resize, the recopying we have done averages out to 6 copies per element in the array. In the best case, just before the next resize, it averages 5 copies per element, since by then the array has grown by 20% without any further copying. So growing an array incrementally, each element gets recopied 5-6 times on average.
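If you want to watch that limit appear, here is a quick numeric check of the series (a sketch of my own; the 100-term cutoff and the report-every-20-terms interval are arbitrary choices). With r = 5/6 the partial sums of 1 + r + r*r + ... should creep up toward 1/(1-r) = 6:
my $r = 5/6;
my ($sum, $term) = (0, 1);
for my $k (1 .. 100) {
    $sum  += $term;     # add the next term of the series
    $term *= $r;        # each term is r times the previous one
    printf("after %3d terms: %.5f\n", $k, $sum) if $k % 20 == 0;
}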
Now how do theory and practice work out? Well, consider the following silly program, which calculates the exact number of recopies incurred while building an array up to any size, taking the rounding into account:
my $max_size = 0;   # slots currently allocated
my $size     = 0;   # elements stored so far
my $recopies = 0;   # running total of element copies
my $last_pow = 1;   # next power of 2 to report at
while (++$size) {
    if ($size >= $max_size) {
        use integer;    # integer division, as the C code would do
        my $old_size = $size - 1;
        $max_size = $size + $old_size/5;    # grow by 20%
        $recopies += $old_size;             # everything present gets moved
    }
    if (not $size%2 and ($size>>1) == $last_pow) {
        my $avg = $recopies/$size;
        printf("% 8d: % 9d recopies. Avg %1.5f per element\n",
            $size, $recopies, $avg);
        $last_pow = $last_pow + $last_pow;
    }
}
and the output from this is:
2: 1 recopies. Avg 0.50000 per element
4: 6 recopies. Avg 1.50000 per element
8: 28 recopies. Avg 3.50000 per element
16: 81 recopies. Avg 5.06250 per element
32: 195 recopies. Avg 6.09375 per element
64: 390 recopies. Avg 6.09375 per element
128: 783 recopies. Avg 6.11719 per element
256: 1332 recopies. Avg 5.20312 per element
512: 2726 recopies. Avg 5.32422 per element
1024: 5605 recopies. Avg 5.47363 per element
2048: 11566 recopies. Avg 5.64746 per element
4096: 23916 recopies. Avg 5.83887 per element
8192: 41278 recopies. Avg 5.03882 per element
16384: 85513 recopies. Avg 5.21930 per element
32768: 177228 recopies. Avg 5.40857 per element
65536: 367397 recopies. Avg 5.60603 per element
131072: 761722 recopies. Avg 5.81148 per element
262144: 1316176 recopies. Avg 5.02081 per element
524288: 2729094 recopies. Avg 5.20533 per element
1048576: 5658908 recopies. Avg 5.39676 per element
2097152: 11734158 recopies. Avg 5.59528 per element
4194304: 24331784 recopies. Avg 5.80115 per element
8388608: 42045207 recopies. Avg 5.01218 per element
So you see, as long as Perl can continue allocating more and more space as it wants, the amount of recopying work needed scales linearly with the size of the array.
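For comparison, here is the same simulation with doubling instead of 20% growth, which I believe is roughly what the powers-of-2 schemes I mentioned at the top amount to (this variant is my own sketch, not what av.c does). The theory says the amortized cost drops to 1/(1 - 1/2) = 2 copies per element, and the averages it prints do indeed climb toward 2; the tradeoff is that doubling can leave up to half of the allocated space sitting unused:
my $max_size = 0;
my $size     = 0;
my $recopies = 0;
my $last_pow = 1;
while (++$size) {
    if ($size >= $max_size) {
        my $old_size = $size - 1;
        $max_size = 2 * $size;      # double instead of adding 20%
        $recopies += $old_size;
    }
    if (not $size%2 and ($size>>1) == $last_pow) {
        printf("% 8d: % 9d recopies. Avg %1.5f per element\n",
            $size, $recopies, $recopies/$size);
        $last_pow = $last_pow + $last_pow;
    }
}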