I come from a CS background. I've had *algorithmic
efficiency* beat into me till it was coming out of my
ears. After so many classes on data-structures and algorithm
analysis I've become wary of built-in, higher-level
operators like Perl's **push**, **pop**, **shift**
and **unshift** list manipulation functions. Oh I've used them
plenty, but there has always been this nagging little voice in the back
of my head saying *What if unshift takes longer to
run with a larger list?* and

Questions like these led me on a spiritual quest through the bowels of Perl's source code.... The details of my journey follow.

Now before I dig too deep, I want to introduce the
uninitiated to big-O notation.
Big-O notation if a method of describing the
efficiency of an algorithm as it relates in a *very* general way
to the quantity of data the algorithm is operating on.
Urri Guttman and Larry Rossler's article A
Fresh Look at Efficient Perl Sorting (in addition to being a
great article in its own rite) has a fantastic overview of big-O notation
much better than what I have cobbled together here.
Some common big-O
analyses are as follows (in order from best
to worst performance-wise).

- O(1) - algorithm takes a constant amount of time to run no matter what the dataset size. A good example of this is a hash table lookup.
- O(log(n)) - algorithm performance is proportional to the logarithm of the dataset size. A binary search is O(log(n)).
- O(n) - algorithm runtime scales linearly with dataset size. If it took 10 seconds with 200 elements, it will take 20 seconds with 400 elements. Scanning a list for an element with a particular value is O(n).
- O(n*log(n)) - the best sorting algorithms (quicksort, mergesort) take this long to run.
- O(n^2) - algorithm performance is proportional to the dataset size squared. Basic sorting algorithms (selection sort, insertion sort, bubble sort) take this long to run.
- O(2^n) - only the slowest algorithms take this long to run. Traveling salesman problem, etc.

A straightforward list implementation that every introductory CS
student learns uses an array and an offset to the last in-use array
element to represent a list. This implementation works well for push, pop,
determining number of array elements, insertion, fetching, etc.
However, performance of shift and unshift with this implementation is
horrible: **O(n)**! This is because there is no room at the beginning of
the array and all the elements have to be shifted by one to insert/remove
an element from the front of the list.

There are many other ways to implement lists (some better, some worse, others with different performance tradeoffs), but without knowing which implementation Perl used (or at least seeing some notes in perlfunc or some other piece of perl documentation about the performance of these list manipulation functions) I was having trouble sleeping at night.

Thanks to some source-code browsing and the *Perl Internals*
chapter of **Advanced Perl Programming** I am now happy and content to use
all of Perl's built-in list operators without any loss of sleep.

Perl implements lists with an array and first/last element offsets. The array is allocated larger than needed with the offsets originally pointing in the middle of the array so there is room to grow in both directions (unshifts and pushes/inserts) before a re-allocation of the underlying array is necessary. The consequence of this implementation is that all of perl's primitive list operators (insertion, fetching, determining array size, push, pop, shift, unshift, etc.) perform in O(1) time.

There are some worst-case scenarios of push, unshift and insertion where performance is O(n). These occur when the array upon which the list is implemented has to be reallocated because the operation would access the array beyond its preallocated size. This reallocation overhead is a common factor in all list implementations that handle all the primitive operators in O(1) time.

One consequence of perl's list implementation is that queues implemented using perl
lists end up "creeping forward" through the preallocated array space leading to
reallocations even though the queue itself may never contain many elements.
In comparison, a stack implemented with a perl list will
only require reallocations as the list grows larger.
However, perl is smartly coded because the use of lists
as queues was anticipated. Consequently, these *queue-type reallocations*
have a negligible impact on performance. In benchmarked tests, queue access
of a list (using repeated push/shift operations) is nearly as fast as stack
access to a list (using repeated push/pop operations).

The preallocated array is balanced toward the 0 end of the array (meaning there is more free space at the far end of the list than there is before the list's 0 element). This is done purposely to make pushes more efficient than unshifts (since perl programmers use push much more than they use unshift). As a consequence, a given number of unshifts will result in more reallocations than the same number of pushes. My benchmarking of a front-loaded queue (implemented using unshift and pop) shows it to be nearly %20 slower than a back-loaded queue (implemented using push and shift); I assume for this very reason.

Back to
Meditations

Comment onShift, Pop, Unshift and Push with Impunity!