http://www.perlmonks.org?node_id=275931


in reply to Re: Re: Slowness when inserting into pre-extended array
in thread Slowness when inserting into pre-extended array

This doesn't seem to have anything at all to do with hashes, as the only difference between the slow and fast versions of the program is that the slow version pushes onto an array while the fast version doesn't. Given perl's behaviour with array extension, I'd pretty much guarantee that the problem is the array push, with a lesser possibility being a process memory freelist fragmentation from all the hash allocations, but I'd doubt that one.
  • Comment on Re: Re: Re: Slowness when inserting into pre-extended array

Replies are listed 'Best First'.
Re: Re: Re: Re: Slowness when inserting into pre-extended array
by tilly (Archbishop) on Jul 19, 2003 at 20:50 UTC
    Unless Perl's implementation of push has changed since I looked at it last, I strongly doubt that the implementation of push would cause performance problems. The power-of-two allocation mechanism that it uses means that, until you run out of memory, the amortized reallocation cost due to running out of space is constant per element added.

    Furthermore when I wrote a test script (Perl 5.8.0 running on Linux) that did what he did except that I made each line the same, I saw absolutely no performance degradation until my machine began paging data to disk. Even then it was a barely measurable slowdown until I got bored with watching it run. This strongly suggests that we are not looking at a poor internal Perl algorithm issue. (My past experience tells me that Linux pages very efficiently until it runs out of RAM and falls over. The Windows NT line - XP included - start showing dramatic slowdowns well before you would think they should be in trouble, but it is very hard to get them to fall over.)

      Perl doesn't use a power-of-two size extension mechanism. When perl increases the size of an array's cache, the formula is generally:
      new_top_index + (allocated_elements / 5)
      For a push, top index is just the allocated size plus one if we've triggered an extension, so when pushes blow past the end of an array they make it 20% bigger. There's a lot more memory copying going on in that case than with a power-of-two extension system. (For folks following along at home, this formula isn't always used, but it is for the case in question)

      Given that the system in question is a Windows box, it wouldn't at all surprise me to find that this is the problem, with the big speed hits coming as the array is resized and ever-larger chunks of memory have to be allocated from the system and copied into. The individual pushes are probably running at full speed, but with ever-increasing performance hits for reallocation you'd not necessarily notice that.

        You're right, I just glanced at av.c and it isn't a power of 2. It is 20%. Perhaps I hadn't read it carefully enough originally? (In which case someone might look at my unshift speedup since that does go by powers of 2.) Or perhaps I am mixing it up with the fact that many things I have looked at (hash resizing, some malloc arrangements) like working with powers of 2?

        However 20% per time is still a geometric arrangement and therefore is a constant amortized cost per new element added. It is just a higher amortized cost. Let's calculate it in theory. Suppose that we have been adding one element at a time with push, and we have just had to recopy the array. Suppose that our size is now N. How many recopyings have we needed (ignoring the rounding because we only move an integer number)? Well N elements got moved this time. When we moved we wind up with 6/5'ths as much space, so 5/6'ths of our elements got moved on the previous time we had to move. And 5/6'ths of 5/6'ths of them got moved the round before that. etc. This is just a geometric series with r=5/6. The well-known trick for calculating those is to multiply and divide by 1-r:

        N + r*N + r*r*N + r*r*r*N = (N + r*N + r*r*N + r*r*r*N)*(1-r)/(1-r) = (N + (r*N-r*N) + (r*r*N-r*r*N) + (r*r*r*N-r*r*r*N) + ...)/(1-r) = N/(1-r)
        In our case r is 5/6, so 1-r is 1/6 and 1/(1-r) is 6.

        Therefore, worst case, the amount of recopying that we would have to do averages out to 6 times per element in the array. Best case, just before we have to recopy is 5 times. So growing an array incrementally we have to recopy each element 5-6 times.

        Now how do theory and practice work out? Well consider the following silly program to calculate the exact number of resizes after building up an array to any size, taking into account rounding etc:

        my $max_size = 0; my $size = 0; my $recopies = 0; my $last_pow = 1; while (++$size) { if ($size >= $max_size) { use integer; my $old_size = $size - 1; $max_size = $size + $old_size/5; $recopies += $old_size; } if (not $size%2 and ($size>>1) == $last_pow) { my $avg = $recopies/$size; printf("% 8d: % 9d recopies. Avg %1.5f per element\n", $size, $recopies, $avg); $last_pow = $last_pow + $last_pow; } }
        and the output from this is:
        2: 1 recopies. Avg 0.50000 per element 4: 6 recopies. Avg 1.50000 per element 8: 28 recopies. Avg 3.50000 per element 16: 81 recopies. Avg 5.06250 per element 32: 195 recopies. Avg 6.09375 per element 64: 390 recopies. Avg 6.09375 per element 128: 783 recopies. Avg 6.11719 per element 256: 1332 recopies. Avg 5.20312 per element 512: 2726 recopies. Avg 5.32422 per element 1024: 5605 recopies. Avg 5.47363 per element 2048: 11566 recopies. Avg 5.64746 per element 4096: 23916 recopies. Avg 5.83887 per element 8192: 41278 recopies. Avg 5.03882 per element 16384: 85513 recopies. Avg 5.21930 per element 32768: 177228 recopies. Avg 5.40857 per element 65536: 367397 recopies. Avg 5.60603 per element 131072: 761722 recopies. Avg 5.81148 per element 262144: 1316176 recopies. Avg 5.02081 per element 524288: 2729094 recopies. Avg 5.20533 per element 1048576: 5658908 recopies. Avg 5.39676 per element 2097152: 11734158 recopies. Avg 5.59528 per element 4194304: 24331784 recopies. Avg 5.80115 per element 8388608: 42045207 recopies. Avg 5.01218 per element
        So you see, as long as Perl can continue allocating more and more space as it wants, the amount of recopying work needed scales linearly with the size of the array.

        That's the second time in this thread that the suggestion has been made that "it's a Windows box" has been offered as an explaination for this problem. It may well be true, but as a Windows user, I'd like to understand the basis for the conclusion?

        Assuming perl is using the same algorithm to request memory from the OS in whatever chunks perl requests it, is linux/unix/VMS/* that much more efficient at making this available to perl?

        Are there any stats available to demonstrate this?


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller