http://www.perlmonks.org?node_id=42912

There is a brand new patch that speeds up repeated calls to unshift by a very large margin. This came out of the PerlMonks conversation at japhy looks at av.c (not av.h), and my original interest in this came out of a chatter conversation some time ago.

What I find particularly amusing is that this patch is based on making unshift bigger, slower, and wasteful of memory. Optimizations are certainly not always obvious! The way it works is that when you call unshift and it needs to extend your array, it adds to the request the size of the current array. The reason is that extending the array at the front involves moving the whole thing, and this is expensive so you want to do it as seldom as possible. In big-O notation with this change

push @foo, 1 for 1..$n;
takes O(n) instead of O(n*n). :-)

UPDATE
The patch has been accepted.

It is in the current development snapshot.

:-)

Replies are listed 'Best First'.
Re: Speeding up unshift
by japhy (Canon) on Nov 22, 2000 at 19:12 UTC
    I am very frustrated in that your patch looks simpler than my attempt, and yet I'm not sure I understand, probably because the AvMAX() and AvFILLp() don't seem to be documented well enough for me to figure out what they're doing.
    if (num) { i = AvFILLp(av); /* Create extra elements */ /**/ slide = i > 0 ? i : 0; /**/ num += slide; av_extend(av, i + num); AvFILLp(av) += num; ary = AvARRAY(av); Move(ary, ary + num, i + 1, SV*); do { ary[--num] = &PL_sv_undef; } while (num); /* Make extra elements into a buffer */ /**/ AvMAX(av) -= slide; /**/ AvFILLp(av) -= slide; SvPVX(av) = (char*)(AvARRAY(av) + slide); } }
    Why do you add num to AvFILLp() (which is the original num with slide added to it) only to then SUBTRACT slide? And what is AvMAX()?!

    And I would personally suggest using the heuristic for slide that I had in my code -- namely:
    slide = (i < num && i > 0) ? i : num;
    Anyway. Sigh. I'm not annoyed at you, but at my inability to patch something that shouldn't have been such a well of annoyance.

    japhy -- Perl and Regex Hacker
      Why do you add num to AvFILLp() (which is the original num with slide added to it) only to then SUBTRACT slide?

      Because I am trying for obvious correctness in the patch first. I wanted to make the code be exactly like it would have been under the old (ie tested) logic if more elements had been asked to be added to the beginnning of the array. So the patch was, "Increase the request, drop the excess elements." Conceptually simple, obviously correct. That said AvFILLp(av) probably shouldn't be adjusted twice in a row. OTOH it is possible that smart compilers will figure that out, and compared to the cost of the copy...

      And what is AvMAX()?!

      AvMAX() is the maximum number of elements the array can have before it needs to be reallocated. Every time you adjust the location of the start of the array you need to adjust it. Personally I think that this is an unfortunate and confusing design decision.

      And I would personally suggest using the heuristic for slide that I had in my code...

      I considered it, then thought about it. If you try to build up a very large array with one huge unshift, your heuristic forces that array to take a lot more space than it needs. The patch will optimize many small calls to unshift, not a few big ones. If there are many then deciding to add a lot of extra space the second time rather than the first is not a big deal. Think of it as just being lazy on deciding that you need a lot of buffering. :-)

      BTW I am still thinking about it. Even though in terms of what finally happens adjusting AvFILL twice is clearly stupid, in terms of conceptual units for someone looking at the code, it simplifies what is going on. I don't think that the change is worth making. (In case you are interested to get the accounting straight I just glanced at what shift did to verify what was going on, then copied in reverse the accounting logic straight from the earlier logic where you had space to work with.)