Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options

Speeding up unshift

by tilly (Archbishop)
on Nov 22, 2000 at 17:16 UTC ( #42912=perlmeditation: print w/replies, xml ) Need Help??

There is a brand new patch that speeds up repeated calls to unshift by a very large margin. This came out of the PerlMonks conversation at japhy looks at av.c (not av.h), and my original interest in this came out of a chatter conversation some time ago.

What I find particularly amusing is that this patch is based on making unshift bigger, slower, and wasteful of memory. Optimizations are certainly not always obvious! The way it works is that when you call unshift and it needs to extend your array, it adds to the request the size of the current array. The reason is that extending the array at the front involves moving the whole thing, and this is expensive so you want to do it as seldom as possible. In big-O notation with this change

push @foo, 1 for 1..$n;
takes O(n) instead of O(n*n). :-)

The patch has been accepted.

It is in the current development snapshot.


Replies are listed 'Best First'.
Re: Speeding up unshift
by japhy (Canon) on Nov 22, 2000 at 19:12 UTC
    I am very frustrated in that your patch looks simpler than my attempt, and yet I'm not sure I understand, probably because the AvMAX() and AvFILLp() don't seem to be documented well enough for me to figure out what they're doing.
    if (num) { i = AvFILLp(av); /* Create extra elements */ /**/ slide = i > 0 ? i : 0; /**/ num += slide; av_extend(av, i + num); AvFILLp(av) += num; ary = AvARRAY(av); Move(ary, ary + num, i + 1, SV*); do { ary[--num] = &PL_sv_undef; } while (num); /* Make extra elements into a buffer */ /**/ AvMAX(av) -= slide; /**/ AvFILLp(av) -= slide; SvPVX(av) = (char*)(AvARRAY(av) + slide); } }
    Why do you add num to AvFILLp() (which is the original num with slide added to it) only to then SUBTRACT slide? And what is AvMAX()?!

    And I would personally suggest using the heuristic for slide that I had in my code -- namely:
    slide = (i < num && i > 0) ? i : num;
    Anyway. Sigh. I'm not annoyed at you, but at my inability to patch something that shouldn't have been such a well of annoyance.

    japhy -- Perl and Regex Hacker
      Why do you add num to AvFILLp() (which is the original num with slide added to it) only to then SUBTRACT slide?

      Because I am trying for obvious correctness in the patch first. I wanted to make the code be exactly like it would have been under the old (ie tested) logic if more elements had been asked to be added to the beginnning of the array. So the patch was, "Increase the request, drop the excess elements." Conceptually simple, obviously correct. That said AvFILLp(av) probably shouldn't be adjusted twice in a row. OTOH it is possible that smart compilers will figure that out, and compared to the cost of the copy...

      And what is AvMAX()?!

      AvMAX() is the maximum number of elements the array can have before it needs to be reallocated. Every time you adjust the location of the start of the array you need to adjust it. Personally I think that this is an unfortunate and confusing design decision.

      And I would personally suggest using the heuristic for slide that I had in my code...

      I considered it, then thought about it. If you try to build up a very large array with one huge unshift, your heuristic forces that array to take a lot more space than it needs. The patch will optimize many small calls to unshift, not a few big ones. If there are many then deciding to add a lot of extra space the second time rather than the first is not a big deal. Think of it as just being lazy on deciding that you need a lot of buffering. :-)

      BTW I am still thinking about it. Even though in terms of what finally happens adjusting AvFILL twice is clearly stupid, in terms of conceptual units for someone looking at the code, it simplifies what is going on. I don't think that the change is worth making. (In case you are interested to get the accounting straight I just glanced at what shift did to verify what was going on, then copied in reverse the accounting logic straight from the earlier logic where you had space to work with.)

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://42912]
Approved by root
[shmem]: Lady_Aleena: geany supports ctags.
[Discipulus]: a good compromise can be my ($need, $opts_ref) = @_ a scalar and an hash reference
[Discipulus]: see you monks!
[Lady_Aleena]: shmem, let me get this sub rewritten, then I will look into how to use ctags in geany. Deal? 8)
[shmem]: Discipulus: yeah, that might eventually prepare the path for OO ;-)
[Lady_Aleena]: See you, Discipulus.
[shmem]: Lady_Aleena: that's up to you. I only wanted to show you a path that might be more comforting than command line grep ;-)
[Lady_Aleena]: shmem, I don't think any of my modules could be converted to OO. They are too procedural.
[Lady_Aleena]: Now I will see if my perl-fu is as bad as I think it is.

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (9)
As of 2017-04-27 12:31 GMT
Find Nodes?
    Voting Booth?
    I'm a fool:

    Results (506 votes). Check out past polls.