Unless you've read a book on the analysis of algorithms, or have managed to pick up the
basics along the way, the "O(N)" (a.k.a. "Big O") notation that sometimes gets tossed around
when comparing algorithms might seem opaque.
But the ideas that the notation expresses can keep you out of trouble,
so it's worth picking up the basics. Here are enough of them to get
you comfortable when O(N) notation starts getting tossed around informally.
Analyzing algorithms is a separate discussion.
O(N) notation is used to express the worst-case order of growth of an algorithm.
That is, how the algorithm's worst-case performance changes as the size of the data set it operates on increases.
When being informal, people often conflate general-case with worst-case behavior, though the
more formal folk will throw in some additional symbols to distinguish the two.
Order of growth characterizes the shape of the growth curve, not its slope. That is, order
of growth is one level removed from notions of efficiency. More on this later.
Growth can be in terms of speed or space, but unless people say space, they almost always
mean speed.
Common Orders of Growth
O(1) is the no-growth curve. An O(1) algorithm's performance is conceptually independent of the size
of the data set on which it operates.
Array element access is O(1), if you ignore implementation details
like virtual memory and page faults.
Ignoring the data set entirely and returning undef is also O(1), though this is rarely useful.
O(N) says that the algorithm's performance is directly proportional to the size of the data
set being processed. Scanning an array or linked list takes O(N) time.
Searching an unsorted array is still O(N), even though statistically you only have to scan half the array to find a value.
Because computer scientists are only interested in the shape of the growth curve at this level,
when you see O(2N) or O(10 + 5N), someone is blending implementation details into the conceptual ones.
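As a minimal sketch of the linear case (the data and function name here are made up for illustration), a search over an unsorted list has to be prepared to touch every element:

```perl
use strict;
use warnings;

# Linear search: the worst case examines all N elements, so O(N).
sub find_index {
    my ($target, @list) = @_;
    for my $i (0 .. $#list) {
        return $i if $list[$i] == $target;
    }
    return -1;    # not found: we scanned the whole list
}

my @data = (4, 8, 15, 16, 23, 42);
print find_index(42, @data), "\n";    # last element: the O(N) worst case
```

Whether the loop body is one statement or ten only changes the constant factor, not the shape of the curve.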
Depending on the algorithm used, searching a hash is O(N) in the worst case. Insertion is also O(N) in the worst case, but considerably more efficient in the general case.
O(N+M) is just a way of saying that two data sets are involved, and that their combined size
determines performance.
O(N^2) says that the algorithm's performance is proportional to the square of the
data set size. This happens when the algorithm processes each element of a set, and that processing
requires another pass through the set. The infamous Bubble Sort is O(N^2).
O(N*M) indicates that two data sets are involved, and the processing of each element of one involves
processing the second set. If the two set sizes are roughly equivalent, some people get sloppy and
say O(N^2) instead. While technically incorrect, O(N^2) still conveys useful information.
"I've got this list of regular expressions, and I need to apply all of them to this chunk of text"
is potentially O(N*M), depending on the regexes.
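A sketch of that regex situation (the patterns and text are invented for illustration): each of the M patterns may have to scan all N characters of the text, hence O(N*M).

```perl
use strict;
use warnings;

# M patterns, each potentially scanning the whole N-character text:
# worst case O(N*M). Patterns and text are made up for the example.
my @patterns = (qr/\bfoo\b/, qr/\d{4}/, qr/^Error:/m);
my $text     = "foo bar\nError: 2003 called\n";

my @hits = grep { $text =~ $_ } @patterns;
print scalar(@hits), " of ", scalar(@patterns), " patterns matched\n";
```

(The "depending on the regexes" caveat matters: a pathological pattern can cost far more than one linear scan.)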
O(N^3) and beyond are what you would expect. Lots of inner loops.
O(2^N) means you have an algorithm with exponential time (or space, if someone says space) behavior.
In the 2 case, time or space doubles for each new element in the data set. There's also O(10^N), etc.
In practice, you don't need to worry about scalability with exponential algorithms, since you can't scale
very far unless you have a very big hardware budget.
O(log N) and O(N log N) might seem a bit scary, but they're really not.
These generally mean that the algorithm deals with a data set that is iteratively partitioned, like
a balanced binary tree. (Unbalanced binary trees are O(N^2) to build, and O(N) to probe.)
Generally, but not always, log N implies log2 N,
which means, roughly, the number of times you can partition a set in half, then partition
the halves, and so on, while still having non-empty sets. Think powers of 2, but worked backwards.
2^10 = 1024
log2 1024 = 10
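That "partition in half until you hit 1" picture translates directly into code; a tiny sketch:

```perl
use strict;
use warnings;

# log2(N), roughly: how many times you can halve N before reaching 1.
sub log2_by_halving {
    my $n     = shift;
    my $steps = 0;
    while ($n > 1) {
        $n = int($n / 2);
        $steps++;
    }
    return $steps;
}

print log2_by_halving(1024), "\n";    # 10, matching 2**10 == 1024
```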
The key thing to note is that log2 N grows slowly. Doubling N has a relatively small effect.
Logarithmic curves flatten out nicely.
It takes O(log N) time to probe a balanced binary tree, but building the tree is more expensive.
If you're going to be probing a data set a lot, it pays to take the hit on construction to get fast
probe time.
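Binary search over a sorted array is the classic example of paying a construction cost (the sort) to get O(log N) probes; a self-contained sketch:

```perl
use strict;
use warnings;

# Binary search on a sorted array: each probe halves the remaining
# range, so a lookup is O(log N). Sorting up front is the
# "construction hit" that buys the fast probe time.
sub bsearch {
    my ($target, $aref) = @_;
    my ($lo, $hi) = (0, $#$aref);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        return $mid if $aref->[$mid] == $target;
        if ($aref->[$mid] < $target) { $lo = $mid + 1 }
        else                         { $hi = $mid - 1 }
    }
    return -1;    # not present
}

# O(N log N) construction cost, paid once...
my @sorted = sort { $a <=> $b } (23, 4, 42, 8, 16, 15);
# ...then each probe is O(log N).
print bsearch(23, \@sorted), "\n";
```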
Quite often, when an algorithm's growth rate is characterized by some mix of orders, the dominant order is shown, and the rest are dropped. O(N^2) might really mean O(N^2 + N).
Scalability and Efficiency
An O(1) algorithm scales better than an O(log N) algorithm,
which scales better than an O(N) algorithm,
which scales better than an O(N log N) algorithm,
which scales better than an O(N^2) algorithm,
which scales better than an O(2^N) algorithm.
This pops right out when you look at a graph of their growth curves.
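If you don't have a graph handy, tabulating the curves for a few values of N shows the same thing; a quick sketch:

```perl
use strict;
use warnings;

# Tabulate the common growth curves for a few data set sizes.
# The ranking in the text pops right out of the numbers.
printf "%8s %10s %12s %16s\n", 'N', 'log N', 'N log N', 'N**2';
for my $n (10, 100, 1000, 10_000) {
    my $log = log($n) / log(2);    # log base 2
    printf "%8d %10.1f %12.0f %16d\n", $n, $log, $n * $log, $n**2;
}
```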
But scalability isn't efficiency.
A well-coded O(N^2) algorithm can outperform a sloppily-coded O(N log N) algorithm,
but only for a while. At some point their performance curves will cross. Hopefully, this
happens before your code goes into production with a customer who is using a lot more data than you
tested with.
Once you have a characterization of your algorithm, which might involve a mixture of the orders shown
above, you're in a position to start plugging in numbers to predict how the algorithm will scale in
your environment, on your hardware, with your data. Keep in mind that algorithms might have a characteristic,
constant startup overhead, and a perstep overhead. So you need to move from
O(N log N)
to
k1 + k2 * (N log N)
and then determine values for the constants by taking some samples and doing some math.
This is left as an exercise for the motivated reader.
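As a starting sketch for that exercise (the timing samples below are made up for illustration): two measurements give two equations in the two unknowns k1 and k2, which you can solve directly.

```perl
use strict;
use warnings;

# Model: t = k1 + k2 * N * log2(N), with constant startup cost k1
# and per-step cost k2. Two (hypothetical) timing samples pin down
# both constants; real code would average over many runs.
sub nlogn { my $n = shift; return $n * log($n) / log(2) }

my ($n1, $t1) = (1_000,  0.137);    # made-up measurement, seconds
my ($n2, $t2) = (10_000, 1.457);    # made-up measurement, seconds

my $k2 = ($t2 - $t1) / (nlogn($n2) - nlogn($n1));
my $k1 = $t1 - $k2 * nlogn($n1);

printf "k1 = %.4f s, k2 = %.2e s/step\n", $k1, $k2;
printf "predicted t(100_000) = %.2f s\n", $k1 + $k2 * nlogn(100_000);
```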
Common Pitfalls
By far, the most common pitfall when dealing with algorithmic complexity is the naive belief (or blind hope) that
the algorithm that was used successfully on a small project is going to scale to deal with 10x or 100x the data.
The inverse of this problem is not leaving "good enough" alone.
In a given situation, an O(N^2) algorithm will often work just fine.
Giving in to the temptation to switch to a more complex O(N log N) algorithm
can needlessly complicate your code. And there's an opportunity cost: the time you spent switching to
a "better" algorithm might better have been applied elsewhere.
A more subtle pitfall is when you've hung on to incomplete knowledge, inadvertently limiting your available choices.
If you ask for a show of hands for who thinks Quicksort (which is O(N log N)) is the fastest sort,
you'll see a lot of arms go up.
Refer these people to Knuth's Art of Computer Programming, volume 3: Sorting and Searching,
where they'll find that Radix sort takes O(N) time (but requires O(N) space, and places constraints on keys).
Spending some time with a good Algorithms book can expand your options.
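For the curious, an LSD radix sort is short enough to sketch. This is a simplified illustration for non-negative integers using decimal digits as the radix, not production code; it shows both the O(N) time claim (a constant number of passes for bounded keys) and the O(N) extra space for buckets:

```perl
use strict;
use warnings;

# LSD radix sort: one bucket pass per digit, so O(d*N) where d is the
# digit count -- effectively O(N) for bounded keys. Needs O(N) extra
# space for the buckets, and only works on non-negative integer keys:
# the constraints mentioned in the text.
sub radix_sort {
    my @nums = @_;
    my $max  = 0;
    $max < $_ and $max = $_ for @nums;
    for (my $place = 1; $max / $place >= 1; $place *= 10) {
        my @buckets = map { [] } 0 .. 9;
        push @{ $buckets[ int($_ / $place) % 10 ] }, $_ for @nums;
        @nums = map { @$_ } @buckets;    # stable regather
    }
    return @nums;
}

print join(',', radix_sort(170, 45, 75, 90, 802, 24, 2, 66)), "\n";
# prints 2,24,45,66,75,90,170,802
```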
This concludes the informal introduction to "Big O" notation as it's used in casual conversation.
There's a lot more rigor (and more symbols) if you want the full formal treatment.
See Also
Mastering Algorithms with Perl has a brief coverage of O(N) notation on pages 17-20, with
some neat graphs, including one that shows how the characteristic order curves differ.
Thanks to tye, BrowserUK, zdog, and AbigailII for reviewing a draft of this post.
The mistakes and sloppiness that remain are mine. And thanks in advance to anyone who can point
out errors or plug in gaps.
Re: An informal introduction to O(N) notation by theorbtwo (Prior) on Jan 18, 2003 at 05:00 UTC 
One thing that you didn't actually give is a definition for "order of growth", which would explain why O(1)==O(2), and why O(2N+3) == O(N).
I'd try to give a definition, but I'd most likely get it wrong... I almost quoted Knuth here, but then I realized that I didn't understand what he meant. (For those that care, see pg 107 of vol 1, 3rd ed, of TAoCP.)
Update: Everything below this point. (I kept reading that section, and didn't want to "waste" another node.)
In particular, note that O(n) gives an /upper/ bound on the order of growth of a particular algo, and is by no means exact. There's also Ω(N), which gives the lower bound.
Additionally, saying that one algo is O(1) whereas another is O(2^N) does mean that, for a large enough dataset, the first algo will be faster than the second. It does /not/ mean that it will be for any reasonable dataset. For example, if the first algo takes 1 year to complete, but the second takes 2^N nanoseconds, it will take... well, a huge dataset before the first becomes faster.
Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).
One thing that you didn't actually give is a definition for "order of growth", which would explain why O(1)==O(2), and why O(2N+3) == O(N).
From a casual perspective, "order of growth" can be thought of as the shape of the worst-case growth curve, regardless of the slope of that shape. Even if a line indicating linear (i.e., O(N)) growth has a high slope, eventually, when the data set is big enough, it will be overtaken by an O(N^2) curve. That's one of the pitfalls. People get seduced by the behavior of their code on small data sets, then get blindsided when they try a much larger data set and performance sucks.
Actually, a year is less than 2^55 nanoseconds. And I wouldn't
call a set with 55 elements "huge". Beware of the power of
exponentiation. Your computer needs to speed up by a factor
of 1000 just to be able to increase your dataset by no more than
10 elements and still run in the same time....
Abigail
Sigh. That's what I get for not checking my arithmetic. You're right, of course. I stand corrected on this example, but if you change the example to be O(1) at 1 year vs. O(N^2) at 1ns*N^2, the size of the dataset for the second to become slower becomes a lot larger. (Specifically, around 178 million items.) Also, there's the consideration of O() notation being the worst-case scenario. For example, even though bubblesort is O(N^2), for nearly-sorted input it can be quite efficient.
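The arithmetic for that crossover is easy to check; a one-liner's worth of sketch:

```perl
use strict;
use warnings;

# Crossover between a constant one-year algorithm and a 1ns * N**2
# one: solve 1e-9 s * N**2 = one year for N.
my $year_s  = 365.25 * 24 * 3600;    # seconds in a year
my $year_ns = $year_s * 1e9;         # nanoseconds in a year
printf "crossover at N = %.0f items\n", sqrt($year_ns);
# roughly 178 million, as stated above
```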
Re: An informal introduction to O(N) notation by BUU (Prior) on Jan 18, 2003 at 05:26 UTC 
O(N^2) says that the algorithm's performance is proportional to the square of the data set size. This happens when the algorithm processes each element of a set, and that processing requires another pass through the set. The infamous Bubble Sort is O(N^2).
Shouldn't bubble sort be O(N^N), as you might have to process the entire thing once for each element, if it was sorted backwards for example?
So, that's one pass for each element in an n-sized list... and if each pass is O(n), then n*n = ?
Re: An informal introduction to O(N) notation by jryan (Vicar) on Jan 18, 2003 at 10:04 UTC 
It should be noted that many algorithms' O() time can be reduced. For instance, many O(n^2) algorithms can be reduced to O(n log(n)) by finding a way to cut the inner loop short. A famous example is bubblesort:
# O(n^2)
for my $i (0..$#a-1) {
    for (0..$#a-1-$i) {
        ($a[$_], $a[$_+1]) = ($a[$_+1], $a[$_])
            if ($a[$_+1] < $a[$_]);
    }
}
# O(n log(n))
for my $i (0..$#a-1) {
    # looping over already-sorted items is not needed
    for (0..$#a-1-$i) {
        ($a[$_], $a[$_+1]) = ($a[$_+1], $a[$_])
            if ($a[$_+1] < $a[$_]);
    }
}
O(n) algorithms that involve a search via a loop can sometimes be
reduced from O(n) to O(log(n)) by breaking out of the loop once a
"result" is found:
# O(n)
my $truestatus;
foreach my $item (@list) {
    $truestatus = 1 if (some_condition);
}
# O(log(n))
my $truestatus;
foreach my $item (@list) {
    # if some_condition is going to be valid for at least one $item in
    # @list, then the limit of the average runtime will approach
    # O(log n).
    if (some_condition) {
        $truestatus = 1;
        last;
    }
}
Of course, the secret is to know where to apply these optimizations, which is often harder than it seems :)
Your bubble sort examples for O(n) and O(n log n) are the same (the first needs to be unoptimized). Also, I believe the optimized bubble sort is O(n^2/2), not O(n log n).
For your second suggestion, loops that can break out early are still O(n) in the worst case, as no guarantee can be made that all other elements will not be considered before the target element.
Sorry, but that's nonsense. If you say that searching for an element in an unsorted list becomes O(log n) if you stop searching after you've found the element, you actually say that every element in a list is located among its first log(n) elements (where n is the length of the list), and everybody knows that the elements of a list reside within the whole list; that's their definition. So your assumption would be true if there existed some "a" and some "n" for which log_a n = n, but there is no such a.
Btw.: O(n) does not mean that you don't stop searching a list after you've found the thing you're looking for; of course you do that, but you must expect that you have to search the whole list if the element you're looking for is the last element, or not in the list at all.
But you _can_ make list search O(log n) by keeping the list sorted and then doing binary search.

http://fruiture.de
my $tmp;
for my $i (0..$#a-1) {
    # looping over already-sorted items is not needed
    for (0..$#a-1-$i) {
        if ($a[$_+1] < $a[$_]) {
            $tmp = $a[$_];
            $a[$_] = $a[$_+1];
            $a[$_+1] = $tmp;
        }
    }
}
Is slightly faster than your version. :) It's funny, actually, but Knuth discusses this exact issue as a justification for why he used MIX instead of a high-level programming language. People have a tendency to write code in as few characters as possible. (Perl gives an opportunity to take this way beyond reasonable, so we can ignore that stuff.) However, very often the constructs that take the least space for the human result in the most code being executed. To take this further: for a while I was working on implementing a bunch of the AoP algorithms in perl. Some of them are not feasible, as perl doesn't allow low-level enough operations. I believe that you will find one of the linear search algorithms (you'd never think this would be tricky) can't be implemented in perl more efficiently than the others (even though in MIX and assembler it is).
Also, IIRC, the following minor change makes Bubblesort even faster:
my $tmp;
my $swapped;
for my $i (0..$#a-1) {
    $swapped = 0;
    # looping over already-sorted items is not needed
    for (0..$#a-1-$i) {
        if ($a[$_+1] < $a[$_]) {
            $tmp = $a[$_];
            $a[$_] = $a[$_+1];
            $a[$_+1] = $tmp;
            $swapped = 1;
        }
    }
    last unless $swapped;
}
As it allows it to bail once the list is sorted. (Actually, you could probably be clever and not use one of the extra variables, although only if undef isn't legal in the data being sorted.)
update fixed typo in the second version. thanks jryan. :)
 demerphq
my friends call me, usually because I'm late....
Re: An informal introduction to O(N) notation by fruiture (Curate) on Jan 18, 2003 at 12:09 UTC 
I think it's useful to have a look at a more formal way of saying what the Big O means, after having read this informal orientation. In the end, it's not that hard to understand.
You can express the complexity of an algorithm as a complete function that includes constants (startup cost) and constant factors (for example, if you iterate over a list thrice), based on the assumption that all primitive actions (+, -, *, /, assignment, ...) cost 1 "step" of time. Let's call this function f(n,m,...), where n,m,... are all the variable factors that influence the problem. How can we now find a function g(n,m,...) so that f(n,m,...) = O(g(n,m,...))? Simple:
f(n,m,...) = O(g(n,m,...))
means that there exists a constant "c", for which
c * g(n,m,...) >= f(n,m,...)
is true for large enough n,m,...
If you think about this relation, it's clear that you can forget all constant factors or constants in your program, because a large enough c will always suffice to make c*g() grow faster. Because anything that is O(n^2) is also O(n^3), you should always find the smallest g() for which f() = O(g()) is true, in order to have a representative g().
f(n,m,...) = Ω(g(n,m,...))
means that there exists a constant "c" for which
c * g(n,m,...) <= f(n,m,...)
This is the lower bound. And finally Θ marks an average "middle":
f(n,m,...) = Θ(g(n,m,...))
means that
f(n,m,...) = O( g(n,m,...) )
and
f(n,m,...) = Ω( g(n,m,...) )
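A quick numeric sketch of the upper-bound definition (f and c here are chosen purely for illustration): f(n) = 3n + 10 is O(n) because c = 4 makes c*n dominate f(n) for all n >= 10.

```perl
use strict;
use warnings;

# Check the definition numerically: with g(n) = n and c = 4,
# c * g(n) >= f(n) holds for all n >= 10, so f(n) = 3n + 10 is O(n).
sub f { my $n = shift; return 3 * $n + 10 }

my $c  = 4;
my $ok = 1;
for my $n (10 .. 1000) {
    $ok = 0 if $c * $n < f($n);
}
print $ok ? "c*g(n) dominates f(n) for n >= 10\n" : "bound fails\n";
```

Note the "for large enough n" part doing real work: at n = 9 the bound does not yet hold.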

http://fruiture.de

Thanks. This was the "more formal, but still readable" definition of O() that I was looking for.
Also, BTW, while Θ() is defined as the average case, I think the normal case is more often actually used, since the average case can be impossible (or at least impractical) to compute. (Then again, I don't really know what I'm talking about.)

Nitpicking:
'Ο' is used for an upper estimate (it's supposed to be a capital omicron, not an 'oh')
'Ω' is used for a lower estimate (omega "is bigger" than omicron)
'Θ' is used for the actual growth.
For example: mergesort is Ω(1), Ο(n^2), Θ(n log n)
This is important since it's quite common to know boundaries on the complexity, but not the exact complexity. For example, matrix multiplication is Ω(n^2) (you have to generate all the elements), and surely Ο(n^3) (there is a naive algorithm with that complexity), but we don't know the exact function.

dakkar - Mobilis in mobile

While we're nitpicking:
O() defines an upper bound (typically, but not exclusively, on worst-case behaviour)
Ω() defines a lower bound (again typically, but not exclusively, on best-case behaviour)
An algorithm is said to run in Θ(f(n)) if it runs in O(f(n)) and in Ω(f(n)), i.e. the upper and lower bounds are no more than a constant multiple of each other. It is a tighter definition than average-case execution complexity, which depends on first defining the properties of an "average" input. I think what you're trying to define above is average-case execution complexity; by definition of the Θ() notation, your statement above cannot be true.
BTW mergesort is O(n lg n).
 Menahem
Re: An informal introduction to O(N) notation by pg (Canon) on Jan 18, 2003 at 23:05 UTC 
As the hash is so heavily used in Perl, it is worth explaining why hash search is O(n).
One runs into the worst case of a hash search when all the elements hash to the same key. In this case, the hash becomes no more than a one-dimensional chain of elements.
Even worse, the element one is trying to look up may happen to be the last element in the chain. Now the search has to go through the entire chain, all n elements, to find what one wants.
(In the rest of the post, I will just use small o instead of big O, as I am now more focused on complexity in general; it doesn't matter whether it is for the worst case, best case, or whatever. Big O is just a notation saying that it is for the worst case.)
In an average case, say n is the number of elements we have in a hash, and k is the number of all possible hash key values. Ideally all elements would spread nearly evenly among the possible hash keys, so the chain length under each hash key is n/k. On average you need to go through half of a chain to get what you want, thus you need to go through n/2k elements.
So on average a hash search is close to o(n/2k) (be very careful: n is a variable while k is a constant, this is serious), which ~ o(n).
How come the average case is the same as the worst case? NO, they are not the same, but they are ~.
Sometimes it is seriously dangerous to casually simplify things that are seriously complex.
o(n) is not o(n/2k), but o(n) ~ o(n/2k) (again, n is a variable and k is a constant, this is very serious). The easiest way to explain ~, without losing too much seriousness, is this: o(n) and o(n/2k) approach infinity at the same rate.
Although o(n) ~ o(n/10^100), what takes an o(n/10^100) algorithm 1 sec might take an o(n) algorithm 10^100 sec to finish. They are far from the same.
Thank you. This explains everything I was attempting to, but does it lucidly. Moreover, it goes into a lot of things I didn't know about.
Only one thing I have to add: You can find the number of items in a hash with scalar keys %hash, and the number of used/total buckets with ($used_buckets, $total_buckets) = split '/', scalar %hash. With those numbers in hand, you should be able to get more exact timings, or demonstrate to yourself that what pg says is true.

Most often k is not a constant, but a function of n. If, for example, k were ~ c * n, then the average length of the list of items in each bucket would be ~ n / (c * n), which is ~ 1 / c. Which is a constant.
Therefore the key lookup of the usual implementations of hashes has average complexity of o(1).
Of course this implementation forces you to recreate the hash when the number of keys grows over some limit which can be time consuming.
Jenda
P.S.: I just looked at the number of buckets in Perl hashes (I iteratively added elements, watched the number of buckets, and printed the number of elements and buckets after every rehashing). These are the results:
1 => 8
9 => 16
18 => 32
34 => 64
70 => 128
130 => 256
258 => 512
514 => 1024
1024 => 2048
2052 => 4096
4098 => 8192
8193 => 16384
16386 => 32768
32768 => 65536
65538 => 131072
Perl 5.8.0 built for MSWin32-x86-multi-thread
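A minimal sketch of that experiment on a recent perl. This assumes Hash::Util's bucket_ratio (available on modern perls; on 5.8 the same "used/total" string came straight from %h in scalar context), and the exact growth thresholds may differ from the table above on perls with hash randomization:

```perl
use strict;
use warnings;
use Hash::Util qw(bucket_ratio);    # modern perls; on 5.8, use scalar(%h)

# Watch the bucket count double as keys are added.
my %h;
my $last = 0;
for my $i (1 .. 5000) {
    $h{$i} = 1;
    my ($used, $total) = split m{/}, bucket_ratio(%h);
    if ($total != $last) {
        printf "%5d keys => %5d buckets\n", scalar(keys %h), $total;
        $last = $total;
    }
}
```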
Re: An informal introduction to O(N) notation by AbigailII (Bishop) on Jan 19, 2003 at 23:30 UTC 
I've several things to say about this  but I won't have
time to write a reply until a week from now.
Abigail
Re: An informal introduction to O(N) notation by FoxtrotUniform (Prior) on Feb 08, 2003 at 23:26 UTC 
The infamous Bubble Sort is O(N^2).
So is quicksort, in the worst case. :) Mergesort and heapsort are both O(N log N) in the worst case.
Most of the time, there's no difference in asymptotic order between the average and worstcase inputs, but every once in a while, something (like quicksort) comes along and bites you on the ass on certain inputs. Quicksort expects its input to be mostly random; ordered input will mess it up. (Think about building a binary tree from ordered input: you'll get a linear list, which is O(N) nodes deep, not a nice bushy tree, which would be O(log N) nodes deep.) Something to think about when building divide and conquer algorithms.
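A minimal sketch of that degenerate-tree effect (naive unbalanced insertion, written out for illustration): sorted input makes every new key go right, so the "tree" is really an O(N)-deep list.

```perl
use strict;
use warnings;

# Naive (unbalanced) binary tree insertion. With sorted input, every
# key is larger than the last, so each insert walks the whole right
# spine: depth N instead of the log N a bushy tree would give.
sub insert {
    my ($node, $key) = @_;
    return { key => $key, left => undef, right => undef } unless $node;
    my $side = $key < $node->{key} ? 'left' : 'right';
    $node->{$side} = insert($node->{$side}, $key);
    return $node;
}

sub depth {
    my $node = shift;
    return 0 unless $node;
    my ($l, $r) = (depth($node->{left}), depth($node->{right}));
    return 1 + ($l > $r ? $l : $r);
}

my $root;
$root = insert($root, $_) for 1 .. 100;    # sorted input
print depth($root), "\n";                  # 100: linear, not log N
```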

FoxtrotUniform
Found a typo in this node? /msg me
The hell with paco, vote for Erudil!