perlmeditation
dws
Unless you've read a book on the Analysis of Algorithms, or have managed to pick up the
basics along the way, the "O(N)" (AKA "Big O") notation that sometimes gets tossed around
when comparing algorithms might seem opaque.
But the ideas that the notation expresses can keep you out of trouble,
so it's worth picking up the basics. What follows is enough of an informal treatment
to keep you comfortable the next time O(N) notation gets tossed around.
Analyzing algorithms rigorously is a separate discussion.
<p>
O(N) notation is used to express the worst-case <i>order of growth</i> of an algorithm.
That is, how the algorithm's worst-case performance changes as the size of the data set it operates on increases.
<readmore>
When being informal, people often conflate general-case with worst-case behavior, though the
more formal folk will throw in some additional symbols to distinguish the two.
Order of growth characterizes the shape of the growth curve, not its slope. That is, order
of growth is one level removed from notions of efficiency. More on this later.
Growth can be in terms of speed or space, but unless people say space, they almost always
mean speed.
<p>
<b>Common Orders of Growth</b>
<p>
<b>O(1)</b> is the no-growth curve. An O(1) algorithm's performance is conceptually independent of the size
of the data set on which it operates.
Array element access is O(1), if you ignore implementation details
like virtual memory and page faults.
Ignoring the data set entirely and returning <code>undef</code> is also O(1), though this is rarely useful.
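<p>
As a minimal sketch (Python used here purely for illustration), both of the following run in constant time no matter how big the input is:
<p>
```python
def first_element(items):
    # O(1): indexing doesn't depend on the array's length.
    return items[0]

def always_undef(items):
    # O(1): ignore the data set entirely (Python's None standing in for undef).
    return None
```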
<p>
<b>O(N)</b> says that the algorithm's performance is directly proportional to the size of the data
set being processed. Scanning an array or linked list takes O(N) time.
Searching an array for a value is still O(N), even though statistically you only have to scan half the array, on average, to find it.
Because computer scientists are only interested in the shape of the growth curve at this level,
when you see O(2N) or O(10 + 5N), someone is blending implementation details into the conceptual ones.
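<p>
A linear scan is the canonical O(N) operation; a rough sketch (Python for illustration):
<p>
```python
def linear_search(items, target):
    # O(N): in the worst case (target absent, or in the last slot),
    # every element gets examined once.
    for i, item in enumerate(items):
        if item == target:
            return i
    return None
```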
<p>
Depending on the algorithm used, searching a hash is O(1) in the general case but O(N) in the worst case (when every key lands in the same bucket). Insertion is also O(N) in the worst case, but considerably more efficient in the general case.
<p>
<b>O(N+M)</b> is just a way of saying that two data sets are involved, and that their combined size
determines performance.
<p>
<b>O(N<sup>2</sup>)</b> says that the algorithm's performance is proportional to the square of the
data set size. This happens when the algorithm processes each element of a set, and that processing
requires another pass through the set. The infamous Bubble Sort is O(N<sup>2</sup>).
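<p>
For the curious, a bare-bones Bubble Sort (Python for illustration) makes the nested passes visible:
<p>
```python
def bubble_sort(items):
    # O(N^2): the outer loop makes a pass per element, and each pass
    # rescans (most of) the list.
    items = list(items)  # sort a copy
    n = len(items)
    for i in range(n):
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items
```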
<p>
<b>O(N•M)</b> indicates that two data sets are involved, and the processing of each element of one involves
processing the second set. If the two set sizes are roughly equivalent, some people get sloppy and
say O(N<sup>2</sup>) instead. While technically incorrect, O(N<sup>2</sup>) still conveys useful information.
<p>
"I've got this list of regular expressions, and I need to apply all of them to this chunk of text"
is potentially O(N•M), depending on the regexes.
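<p>
Sketched out (Python for illustration; the per-match cost of each regex is waved away here), the nested structure is plain:
<p>
```python
import re

def match_all(patterns, chunks):
    # O(N*M) pattern applications: each of N patterns is tried
    # against each of M chunks of text.
    return [(p, c) for p in patterns for c in chunks if re.search(p, c)]
```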
<p>
<b>O(N<sup>3</sup>)</b> and beyond are what you would expect. Lots of inner loops.
<p>
<b>O(2<sup>N</sup>)</b> means you have an algorithm with exponential time (or space, if someone says space) behavior.
In the 2 case, time or space doubles for each new element in the data set. There's also O(10<sup>N</sup>), etc.
In practice, you don't need to worry about scalability with exponential algorithms, since you can't scale
very far unless you have a very big hardware budget.
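<p>
Enumerating every subset of a set is the textbook O(2<sup>N</sup>) case; a sketch (Python for illustration):
<p>
```python
def all_subsets(items):
    # O(2^N) time and space: each new element doubles the number of subsets.
    subsets = [[]]
    for item in items:
        subsets += [s + [item] for s in subsets]
    return subsets
```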
<p>
<b>O(log N)</b> and <b>O(N log N)</b> might seem a bit scary, but they're really not.
These generally mean that the algorithm deals with a data set that is iteratively partitioned, like
a balanced binary tree. (Degenerate, unbalanced binary trees are O(N<sup>2</sup>) to build and O(N) to probe in the worst case.)
Generally, but not always, log N implies log<sub>2</sub>N,
which means, roughly, the number of times you can partition a set in half, then partition
the halves, and so on, while still having non-empty sets. Think powers of 2, but worked backwards.
<blockquote>
2<sup><b>10</b></sup> = 1024<br/>
log<sub>2</sub>1024 = <b>10</b>
</blockquote>
The key thing to note is that log<sub>2</sub>N grows slowly. Doubling N has a relatively small effect.
Logarithmic curves flatten out nicely.
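<p>
"Powers of 2, worked backwards" can be sketched directly (Python for illustration):
<p>
```python
def halvings(n):
    # Roughly log2(n): how many times n can be cut in half
    # before only one element remains.
    count = 0
    while n > 1:
        n //= 2
        count += 1
    return count
```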
<p>
It takes O(log N) time to probe a balanced binary tree, but building the tree is more expensive.
If you're going to be probing a data set a lot, it pays to take the hit on construction to get fast
probe time.
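<p>
Binary search over a sorted array shows the same O(log N) probe behavior as a balanced tree; a sketch (Python for illustration), assuming the construction cost (sorting) has already been paid:
<p>
```python
def binary_search(sorted_items, target):
    # O(log N): each probe discards half of the remaining interval.
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```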
<p>
Quite often, when an algorithm's growth rate is characterized by some mix of orders, the dominant order is shown, and the rest are dropped. O(N<sup>2</sup>) might really mean O(N<sup>2</sup> + N).
<p>
<b>Scalability and Efficiency</b>
<p>
An O(1) algorithm scales better than an O(log N) algorithm,<br/>
which scales better than an O(N) algorithm,<br/>
which scales better than an O(N log N) algorithm,<br/>
which scales better than an O(N<sup>2</sup>) algorithm,<br/>
which scales better than an O(2<sup>N</sup>) algorithm.
<p>
This pops right out when you look at a graph of their growth curves.
<p>
But scalability isn't efficiency.
A well-coded O(N<sup>2</sup>) algorithm can outperform a sloppily-coded O(N log N) algorithm,
but only for a while. At some point their performance curves will cross. Hopefully, this
happens before your code goes into production with a customer who is using a lot more data than you
tested with.
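<p>
Plugging in made-up constants (a cheap step for the hypothetical well-coded O(N<sup>2</sup>) algorithm, an expensive one for the sloppy O(N log N) one; Python for illustration) shows the crossover:
<p>
```python
import math

def tight_quadratic_cost(n):
    # Hypothetical well-coded O(N^2) algorithm: 1 unit of work per step.
    return n * n

def sloppy_nlogn_cost(n):
    # Hypothetical sloppily-coded O(N log N) algorithm: 50 units per step.
    return 50 * n * math.log2(n)

# The quadratic code wins on small inputs, then loses past the crossover.
assert tight_quadratic_cost(100) < sloppy_nlogn_cost(100)
assert tight_quadratic_cost(1000) > sloppy_nlogn_cost(1000)
```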
<p>
Once you have a characterization of your algorithm, which might involve a mixture of the orders shown
above, you're in a position to start plugging in numbers to predict how the algorithm will scale in
your environment, on your hardware, with your data. Keep in mind that algorithms might have a characteristic,
constant startup overhead, and a per-step overhead. So you need to move from
<p>
O(N log N)<br/>
to<br/>
k<sub>1</sub> + k<sub>2</sub>(N log N)
<p>
and then determine values for the constants by taking some samples and doing some math.
This is left as an exercise for the motivated reader.
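<p>
That exercise might be sketched like this (Python for illustration), fitting k<sub>1</sub> and k<sub>2</sub> by least squares over x = N log<sub>2</sub>N:
<p>
```python
import math

def fit_constants(samples):
    # samples: list of (n, seconds) measurements.
    # Fits t ~ k1 + k2 * (n * log2(n)) with a least-squares line.
    xs = [n * math.log2(n) for n, _ in samples]
    ts = [t for _, t in samples]
    mean_x = sum(xs) / len(xs)
    mean_t = sum(ts) / len(ts)
    k2 = (sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
          / sum((x - mean_x) ** 2 for x in xs))
    k1 = mean_t - k2 * mean_x
    return k1, k2
```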
<p>
<b>Common Pitfalls</b>
<p>
By far, the most common pitfall when dealing with algorithmic complexity is the naive belief (or blind hope) that
the algorithm that was used successfully on a small project is going to scale to deal with 10x or 100x the data.
<p>
The inverse of this problem is not leaving "good enough" alone.
In a given situation, an O(N<sup>2</sup>) algorithm will often work just fine.
Giving in to the temptation to switch to a more complex O(N log N) algorithm
can needlessly complicate your code. And there's an opportunity cost: the time you spent switching to
a "better" algorithm might better have been applied elsewhere.
<p>
A more subtle pitfall is when you've hung on to incomplete knowledge, inadvertently limiting your available choices.
If you ask for a show of hands for who thinks Quicksort (which is O(N log N) on average) is the fastest sort,
you'll see a lot of arms go up.
Refer these people to Knuth's [isbn://0201896850|The Art of Computer Programming, Volume 3: Sorting and Searching],
where they'll find that Radix sort takes O(N) time (but requires O(N) space, and places constraints on keys).
Spending some time with a good Algorithms book can expand your options.
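<p>
Counting sort, the building block of radix sort, shows how comparisons can be sidestepped when keys are constrained (here: small non-negative integers); a sketch (Python for illustration):
<p>
```python
def counting_sort(items, key_range):
    # O(N + K): one pass to tally the N keys, one pass over the K counters.
    # Trades O(K) extra space for avoiding comparisons entirely.
    counts = [0] * key_range
    for x in items:
        counts[x] += 1
    out = []
    for value, count in enumerate(counts):
        out.extend([value] * count)
    return out
```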
<p>
This concludes the informal introduction to "Big O" notation as it is used in casual conversation.
There's a lot more rigor (and more symbols) if you want the full formal treatment.
<p>
<b>See Also</b>
<p>
[isbn://1565923987|Mastering Algorithms with Perl] covers O(N) notation briefly on pages 17-20, with
some neat graphs, including one that shows how the characteristic order curves differ.
<p>
<hr>
Thanks to [tye], [BrowserUK], [zdog], and [Abigail-II] for reviewing a draft of this post.
The mistakes and sloppiness that remain are mine. And thanks in advance to anyone who can point
out errors or plug in gaps.
<p>