in thread Slowness when inserting into pre-extended array
However, growing by 20% each time is still a geometric arrangement, and therefore still gives a constant amortized cost per new element added. It is just a higher amortized cost. Let's calculate it in theory. Suppose that we have been adding one element at a time with push, and we have just had to recopy the array. Suppose that our size is now N. How many element copies have we needed in total (ignoring rounding, since we only ever move a whole number of elements)? Well, N elements got moved this time. Each move leaves us with 6/5 as much space, so 5/6 of our elements got moved the previous time we had to move. And 5/6 of 5/6 of them got moved the round before that, and so on. This is just a geometric series with r = 5/6. The well-known trick for summing those is to multiply and divide by 1-r:
```
N + r*N + r*r*N + r*r*r*N + ...
  = (N + r*N + r*r*N + r*r*r*N + ...)*(1-r)/(1-r)
  = (N + (r*N-r*N) + (r*r*N-r*r*N) + (r*r*r*N-r*r*r*N) + ...)/(1-r)
  = N/(1-r)
```

In our case r is 5/6, so 1-r is 1/6 and 1/(1-r) is 6.
Therefore, in the worst case (right after a recopy), the total recopying averages out to 6 copies per element in the array. In the best case, just before the next recopy, the array has grown to 6N/5 elements with the same 6N copies behind it, which works out to 5 per element. So when growing an array incrementally, each element gets recopied 5-6 times on average.
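As a quick sanity check on that arithmetic, here is a minimal sketch that sums the series term by term and compares it with the closed form N/(1-r). The value of `$n` and the `1e-9` cutoff are arbitrary choices of mine for illustration, and it ignores integer rounding entirely, just as the derivation does.

```perl
use strict;
use warnings;

# Sum N + r*N + r*r*N + ... term by term and compare with N/(1-r).
# $n and the 1e-9 cutoff are illustrative assumptions, not part of the
# derivation itself.
my $r = 5/6;
my $n = 1_000_000;

my $total = 0;
my $term  = $n;
while ($term > 1e-9) {
    $total += $term;   # elements moved in this round of recopying
    $term  *= $r;      # the round before moved 5/6 as many
}

my $worst = $total / $n;          # right after a recopy: N elements
my $best  = $total / ($n / $r);   # just before the next one: 6N/5 elements

printf("closed form N/(1-r): %.1f\n", $n / (1 - $r));
printf("summed series:       %.1f\n", $total);
printf("copies per element:  %.3f worst, %.3f best\n", $worst, $best);
```

The two per-element figures should come out very close to 6 and 5, matching the bounds worked out by hand.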
Now how do theory and practice work out? Well, consider the following silly program to calculate the exact number of element recopies after building up an array to any size, taking into account rounding etc.:
```perl
use strict;
use warnings;

my $max_size = 0;   # current allocated capacity
my $size     = 0;   # current number of elements
my $recopies = 0;   # total element copies so far
my $last_pow = 1;   # half of the next power of 2 to report on

# Runs until interrupted; each pass simulates one push.
while (++$size) {
  if ($size >= $max_size) {
    use integer;    # integer division, to model the rounding
    my $old_size = $size - 1;
    $max_size = $size + $old_size/5;   # grow by 20% of the old size
    $recopies += $old_size;            # every existing element gets moved
  }
  # Report whenever $size reaches the next power of 2.
  if (not $size%2 and ($size>>1) == $last_pow) {
    my $avg = $recopies/$size;
    printf("% 8d: % 9d recopies. Avg %1.5f per element\n", $size, $recopies, $avg);
    $last_pow = $last_pow + $last_pow;
  }
}
```

and the output from this is:

```
       2:         1 recopies. Avg 0.50000 per element
       4:         6 recopies. Avg 1.50000 per element
       8:        28 recopies. Avg 3.50000 per element
      16:        81 recopies. Avg 5.06250 per element
      32:       195 recopies. Avg 6.09375 per element
      64:       390 recopies. Avg 6.09375 per element
     128:       783 recopies. Avg 6.11719 per element
     256:      1332 recopies. Avg 5.20312 per element
     512:      2726 recopies. Avg 5.32422 per element
    1024:      5605 recopies. Avg 5.47363 per element
    2048:     11566 recopies. Avg 5.64746 per element
    4096:     23916 recopies. Avg 5.83887 per element
    8192:     41278 recopies. Avg 5.03882 per element
   16384:     85513 recopies. Avg 5.21930 per element
   32768:    177228 recopies. Avg 5.40857 per element
   65536:    367397 recopies. Avg 5.60603 per element
  131072:    761722 recopies. Avg 5.81148 per element
  262144:   1316176 recopies. Avg 5.02081 per element
  524288:   2729094 recopies. Avg 5.20533 per element
 1048576:   5658908 recopies. Avg 5.39676 per element
 2097152:  11734158 recopies. Avg 5.59528 per element
 4194304:  24331784 recopies. Avg 5.80115 per element
 8388608:  42045207 recopies. Avg 5.01218 per element
```

So you see, as long as Perl can continue allocating more and more space as it wants, the amount of recopying work needed scales linearly with the size of the array.