Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??
or: How I Learned to Stop Worrying and Love Perl's Built-in List Operators

I come from a CS background. I've had algorithmic efficiency beat into me till it was coming out of my ears. After so many classes on data-structures and algorithm analysis I've become wary of built-in, higher-level operators like Perl's push, pop, shift and unshift list manipulation functions. Oh I've used them plenty, but there has always been this nagging little voice in the back of my head saying What if unshift takes longer to run with a larger list? and does it take longer to find the size of a bigger list?.

Questions like these led me on a spiritual quest through the bowels of Perl's source code.... The details of my journey follow.

Now before I dig too deep, I want to introduce the uninitiated to big-O notation. Big-O notation if a method of describing the efficiency of an algorithm as it relates in a very general way to the quantity of data the algorithm is operating on. Urri Guttman and Larry Rossler's article A Fresh Look at Efficient Perl Sorting (in addition to being a great article in its own rite) has a fantastic overview of big-O notation much better than what I have cobbled together here. Some common big-O analyses are as follows (in order from best to worst performance-wise).

  • O(1) - algorithm takes a constant amount of time to run no matter what the dataset size. A good example of this is a hash table lookup.
  • O(log(n)) - algorithm performance is proportional to the logarithm of the dataset size. A binary search is O(log(n)).
  • O(n) - algorithm runtime scales linearly with dataset size. If it took 10 seconds with 200 elements, it will take 20 seconds with 400 elements. Scanning a list for an element with a particular value is O(n).
  • O(n*log(n)) - the best sorting algorithms (quicksort, mergesort) take this long to run.
  • O(n^2) - algorithm performance is proportional to the dataset size squared. Basic sorting algorithms (selection sort, insertion sort, bubble sort) take this long to run.
  • O(2^n) - only the slowest algorithms take this long to run. Traveling salesman problem, etc.

A straightforward list implementation that every introductory CS student learns uses an array and an offset to the last in-use array element to represent a list. This implementation works well for push, pop, determining number of array elements, insertion, fetching, etc. However, performance of shift and unshift with this implementation is horrible: O(n)! This is because there is no room at the beginning of the array and all the elements have to be shifted by one to insert/remove an element from the front of the list.

There are many other ways to implement lists (some better, some worse, others with different performance tradeoffs), but without knowing which implementation Perl used (or at least seeing some notes in perlfunc or some other piece of perl documentation about the performance of these list manipulation functions) I was having trouble sleeping at night.

Thanks to some source-code browsing and the Perl Internals chapter of Advanced Perl Programming I am now happy and content to use all of Perl's built-in list operators without any loss of sleep.

Perl implements lists with an array and first/last element offsets. The array is allocated larger than needed with the offsets originally pointing in the middle of the array so there is room to grow in both directions (unshifts and pushes/inserts) before a re-allocation of the underlying array is necessary. The consequence of this implementation is that all of perl's primitive list operators (insertion, fetching, determining array size, push, pop, shift, unshift, etc.) perform in O(1) time.

There are some worst-case scenarios of push, unshift and insertion where performance is O(n). These occur when the array upon which the list is implemented has to be reallocated because the operation would access the array beyond its preallocated size. This reallocation overhead is a common factor in all list implementations that handle all the primitive operators in O(1) time.

One consequence of perl's list implementation is that queues implemented using perl lists end up "creeping forward" through the preallocated array space leading to reallocations even though the queue itself may never contain many elements. In comparison, a stack implemented with a perl list will only require reallocations as the list grows larger. However, perl is smartly coded because the use of lists as queues was anticipated. Consequently, these queue-type reallocations have a negligible impact on performance. In benchmarked tests, queue access of a list (using repeated push/shift operations) is nearly as fast as stack access to a list (using repeated push/pop operations).

The preallocated array is balanced toward the 0 end of the array (meaning there is more free space at the far end of the list than there is before the list's 0 element). This is done purposely to make pushes more efficient than unshifts (since perl programmers use push much more than they use unshift). As a consequence, a given number of unshifts will result in more reallocations than the same number of pushes. My benchmarking of a front-loaded queue (implemented using unshift and pop) shows it to be nearly %20 slower than a back-loaded queue (implemented using push and shift); I assume for this very reason.

In reply to Shift, Pop, Unshift and Push with Impunity! by lhoward

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others imbibing at the Monastery: (4)
    As of 2014-07-31 00:11 GMT
    Find Nodes?
      Voting Booth?

      My favorite superfluous repetitious redundant duplicative phrase is:

      Results (241 votes), past polls