An informal introduction to O(N) notation

by dws (Chancellor)
on Jan 18, 2003 at 03:35 UTC ( [id://227909] )

Unless you've read a book on the Analysis of Algorithms, or have managed to pick up the basics along the way, the "O(N)" (AKA "Big O") notation that sometimes gets tossed around when comparing algorithms might seem opaque. But the ideas the notation expresses can keep you out of trouble, so it's worth picking up the basics. Here are enough of them to get you comfortable when O(N) notation starts getting tossed around informally. Analyzing algorithms is a separate discussion.

O(N) notation is used to express the worst-case order of growth of an algorithm. That is, how the algorithm's worst-case performance changes as the size of the data set it operates on increases.

When being informal, people often conflate general-case with worst-case behavior, though the more formal folk will throw in some additional symbols to distinguish the two. Order of growth characterizes the shape of the growth curve, not its slope. That is, order of growth is one level removed from notions of efficiency. More on this later. Growth can be in terms of speed or space, but unless people say space, they almost always mean speed.

Common Orders of Growth

O(1) is the no-growth curve. An O(1) algorithm's performance is conceptually independent of the size of the data set on which it operates. Array element access is O(1), if you ignore implementation details like virtual memory and page faults. Ignoring the data set entirely and returning undef is also O(1), though this is rarely useful.

O(N) says that the algorithm's performance is directly proportional to the size of the data set being processed. Scanning an array or linked list takes O(N) time. Probing an array is still O(N), even if statistically you only have to scan half the array to find a value. Because computer scientists are only interested in the shape of the growth curve at this level, when you see O(2N) or O(10 + 5N), someone is blending implementation details into the conceptual ones.
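As a quick illustration (a sketch of mine, not part of the original write-up; the helper name linear_find is made up), here's the contrast between an O(1) index lookup and an O(N) scan in Perl:

    use strict;
    use warnings;

    my @array = (10, 20, 30, 40, 50);

    # O(1): jumping straight to an index costs the same no matter
    # how large @array grows.
    my $third = $array[2];

    # O(N): a scan may have to touch every element, so the work
    # grows in step with the size of the array.
    sub linear_find {
        my ($target, @list) = @_;
        for my $i (0 .. $#list) {
            return $i if $list[$i] == $target;
        }
        return -1;    # looked at all N elements, found nothing
    }

    print linear_find(40, @array), "\n";    # prints 3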

Depending on the algorithm used, searching a hash is O(N) in the worst case. Insertion is also O(N) in the worst case, but considerably more efficient in the general case.

O(N+M) is just a way of saying that two data sets are involved, and that their combined size determines performance.

O(N²) says that the algorithm's performance is proportional to the square of the data set size. This happens when the algorithm processes each element of a set, and that processing requires another pass through the set. The infamous Bubble Sort is O(N²).
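For a feel of where the square comes from, here's a small sketch of my own (not from the original node; count_duplicate_pairs is an invented name): a pairwise comparison that re-walks the list for every element.

    use strict;
    use warnings;

    # O(N^2): for each element, compare it against the rest of the list.
    sub count_duplicate_pairs {
        my @list  = @_;
        my $pairs = 0;
        for my $i (0 .. $#list) {
            for my $j ($i + 1 .. $#list) {
                $pairs++ if $list[$i] == $list[$j];
            }
        }
        return $pairs;
    }

    print count_duplicate_pairs(1, 2, 3, 2, 1), "\n";    # prints 2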

O(N•M) indicates that two data sets are involved, and the processing of each element of one involves processing the second set. If the two set sizes are roughly equivalent, some people get sloppy and say O(N²) instead. While technically incorrect, O(N²) still conveys useful information.

"I've got this list of regular expressions, and I need to apply all of them to this chunk of text" is potentially O(N•M), depending on the regexes.

O(N³) and beyond are what you would expect. Lots of inner loops.

O(2^N) means you have an algorithm with exponential time (or space, if someone says space) behavior. In the base-2 case, time or space doubles for each new element in the data set. There's also O(10^N), etc. In practice, you don't need to worry about scalability with exponential algorithms, since you can't scale very far unless you have a very big hardware budget.
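Generating every subset of a set is a classic example; the sketch below (my own, all_subsets is an invented helper) doubles its output, and its work, with each added element.

    use strict;
    use warnings;

    # O(2^N): each new element doubles the number of subsets.
    sub all_subsets {
        my @set     = @_;
        my @subsets = ([]);    # start with the empty subset
        for my $item (@set) {
            # every existing subset spawns a copy that also holds $item
            push @subsets, map { [ @$_, $item ] } @subsets;
        }
        return @subsets;
    }

    my @subsets = all_subsets(qw(a b c));
    print scalar @subsets, " subsets\n";    # prints "8 subsets"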

O(log N) and O(N log N) might seem a bit scary, but they're really not. These generally mean that the algorithm deals with a data set that is iteratively partitioned, like a balanced binary tree. (Unbalanced binary trees are O(N²) to build, and O(N) to probe.) Generally, but not always, log N implies log₂N, which means, roughly, the number of times you can partition a set in half, then partition the halves, and so on, while still having non-empty sets. Think powers of 2, but worked backwards.

2¹⁰ = 1024
log₂ 1024 = 10
The key thing to note is that log₂N grows slowly. Doubling N has a relatively small effect. Logarithmic curves flatten out nicely.

It takes O(log N) time to probe a balanced binary tree, but building the tree is more expensive. If you're going to be probing a data set a lot, it pays to take the hit on construction to get fast probe time.
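The same halving idea shows up in binary search on a sorted array, which is a handy stand-in for a balanced-tree probe. A sketch (mine, not part of the original node; binary_search is a made-up helper):

    use strict;
    use warnings;

    # O(log N): each comparison throws away half of the remaining
    # candidates, so a probe takes at most about log2(N) steps.
    sub binary_search {
        my ($target, $sorted) = @_;
        my ($lo, $hi) = (0, $#$sorted);
        while ($lo <= $hi) {
            my $mid = int( ($lo + $hi) / 2 );
            if    ($sorted->[$mid] < $target) { $lo = $mid + 1 }
            elsif ($sorted->[$mid] > $target) { $hi = $mid - 1 }
            else                              { return $mid }
        }
        return -1;    # not found
    }

    my @primes = (2, 3, 5, 7, 11, 13, 17, 19);
    print binary_search(11, \@primes), "\n";    # prints 4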

Quite often, when an algorithm's growth rate is characterized by some mix of orders, the dominant order is shown, and the rest are dropped. O(N²) might really mean O(N² + N).

Scalability and Efficiency

An O(1) algorithm scales better than an O(log N) algorithm,
which scales better than an O(N) algorithm,
which scales better than an O(N log N) algorithm,
which scales better than an O(N²) algorithm,
which scales better than an O(2^N) algorithm.

This pops right out when you look at a graph of their growth curves.

But scalability isn't efficiency. A well-coded O(N²) algorithm can outperform a sloppily-coded O(N log N) algorithm, but only for a while. At some point their performance curves will cross. Hopefully, this happens before your code goes into production with a customer who is using a lot more data than you tested with.

Once you have a characterization of your algorithm, which might involve a mixture of the orders shown above, you're in a position to start plugging in numbers to predict how the algorithm will scale in your environment, on your hardware, with your data. Keep in mind that algorithms might have a characteristic constant startup overhead as well as a per-step overhead. So you need to move from

      O(N log N)
to
      k₁ + k₂(N log N)

and then determine values for the constants by taking some samples and doing some math. This is left as an exercise for the motivated reader.
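For the curious, here's one rough way that exercise might go (a sketch of my own, not part of the original meditation; it uses Perl's built-in sort as the O(N log N) specimen, times it at two sizes with Time::HiRes, and solves for the two constants, ignoring measurement noise):

    use strict;
    use warnings;
    use Time::HiRes qw(gettimeofday tv_interval);

    # Time an O(N log N) operation at two sample sizes, then solve
    #   t = k1 + k2 * N * log2(N)
    # for the two constants.  Crude, but enough to extrapolate.
    sub time_sort {
        my $n      = shift;
        my @data   = map { rand } 1 .. $n;
        my $t0     = [gettimeofday];
        my @sorted = sort { $a <=> $b } @data;
        return tv_interval($t0);
    }

    my ($n1, $n2) = (50_000, 200_000);
    my ($t1, $t2) = (time_sort($n1), time_sort($n2));

    my $f1 = $n1 * log($n1) / log(2);
    my $f2 = $n2 * log($n2) / log(2);

    my $k2 = ($t2 - $t1) / ($f2 - $f1);    # per-step cost
    my $k1 = $t1 - $k2 * $f1;              # startup overhead

    printf "k1 = %.6f s, k2 = %.3e s per (N log N) unit\n", $k1, $k2;
    printf "predicted time for N = 1,000,000: %.3f s\n",
           $k1 + $k2 * 1_000_000 * log(1_000_000) / log(2);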

Common Pitfalls

By far, the most common pitfall when dealing with algorithmic complexity is the naive belief (or blind hope) that the algorithm that was used successfully on a small project is going to scale to deal with 10x or 100x the data.

The inverse of this problem is not leaving "good enough" alone. In a given situation, an O(N²) algorithm will often work just fine. Giving in to the temptation to switch to a more complex O(N log N) algorithm can needlessly complicate your code. And there's an opportunity cost: the time you spent switching to a "better" algorithm might better have been applied elsewhere.

A more subtle pitfall is when you've hung on to incomplete knowledge, inadvertently limiting your available choices. If you ask for a show of hands for who thinks Quicksort (which is O(N log N)) is the fastest sort, you'll see a lot of arms go up. Refer these people to Knuth's Art of Computer Programming, Volume 3: Sorting and Searching, where they'll find that Radix sort takes O(N) time (but requires O(N) space, and places constraints on keys). Spending some time with a good algorithms book can expand your options.
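To make the Radix sort claim concrete, here's a sketch of my own (not from the original node) of a least-significant-digit radix sort on non-negative integers; each pass is O(N), and the number of passes depends on the key width rather than on N, which is where the O(N) time and O(N) extra space come from.

    use strict;
    use warnings;

    # LSD radix sort on non-negative integers, base 10.
    sub radix_sort {
        my @nums = @_;
        return @nums unless @nums;

        my $max = 0;
        $max < $_ and $max = $_ for @nums;

        # One O(N) pass per decimal digit of the largest key.
        for (my $place = 1; $max / $place >= 1; $place *= 10) {
            my @buckets = map { [] } 0 .. 9;
            push @{ $buckets[ int($_ / $place) % 10 ] }, $_ for @nums;
            @nums = map { @$_ } @buckets;    # stable re-collection
        }
        return @nums;
    }

    print join(" ", radix_sort(170, 45, 75, 90, 802, 24, 2, 66)), "\n";
    # prints: 2 24 45 66 75 90 170 802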

This concludes the informal introduction to "Big O" notation as it gets used in casual conversation. There's a lot more rigor (and more symbols) if you want the full formal treatment.

See Also

Mastering Algorithms with Perl gives a brief treatment of O(N) notation on pages 17-20, with some neat graphs, including one that shows how the characteristic order curves differ.


Thanks to tye, BrowserUK, zdog, and Abigail-II for reviewing a draft of this post. The mistakes and sloppiness that remain are mine. And thanks in advance to anyone who can point out errors or plug in gaps.

Replies are listed 'Best First'.
Re: An informal introduction to O(N) notation
by theorbtwo (Prior) on Jan 18, 2003 at 05:00 UTC

    One thing that you didn't actually give is a definition for "order of growth", which would explain why O(1)==O(2), and why O(2N+3) == O(N).

    I'd try to give a definition, but I'd most likely get it wrong... I almost quoted Knuth here, but then I realized that I didn't understand what he meant. (For those that care, see pg 107 of vol 1, 3rd ed, of TAoCP.)

    Update: Everything below this point. (I kept reading that section, and didn't want to "waste" another node.)

    In particular, note that O(n) gives an /upper/ bound on the order of growth of a particular algorithm, and is by no means exact. There's also Ω(N), which gives a lower bound.

    Additionally, saying that one algorithm is O(1) whereas another is O(2^N) does mean that, for a large enough dataset, the first will be faster than the second. It does /not/ mean that it will be for any reasonable dataset. For example, if the first algorithm takes 1 year to complete, but the second takes 2^N nanoseconds, it will take... well, a huge dataset before the first becomes faster.


    Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

      One thing that you didn't actually give is a definition for "order of growth", which would explain why O(1)==O(2), and why O(2N+3) == O(N).

      From a casual perspective, "order of growth" can be thought of as the shape of the worst-case growth curve, regardless of the slope of that shape. Even if a line indicating linear (i.e., O(N)) growth has a high slope, eventually--when the data set is big enough--it will be overtaken by an O(N²) curve. That's one of the pitfalls. People get seduced by the behavior of their code on small data sets, then get blindsided when they try a much larger data set and performance sucks.

      Actually, a year is less than 2^55 nanoseconds. And I wouldn't call a set with 55 elements "huge". Beware of the power of exponentiation. Your computer needs to speed up by a factor of 1000 just to be able to grow your dataset by 10 elements and still run in the same time....

      Abigail

        Sigh. That's what I get for not checking my arithmetic. You're right, of course. I stand corrected on this example, but if you change the example to be O(1) at 1 year vs. O(N^2) at 1ns*N^2, the size of the dataset for the second to become slower becomes a lot larger. (Specifically, around 178 million items.) Also, there's the consideration of O() notation being the worst-case scenario. For example, even though bubble-sort is O(N^2), for nearly-sorted input it can be quite efficient.



Re: An informal introduction to O(N) notation
by fruiture (Curate) on Jan 18, 2003 at 12:09 UTC

    I think it's useful to have a look at a more formal way of saying what the Big O means after having read this informal orientation. In the end, it's not that hard to understand.

    You can express the complexity of an algorithm as a complete function that includes constants (startup cost) and constant factors (for example, if you iterate over a list thrice), based on the assumption that all primitive actions (+, -, /, *, assignment, ...) cost 1 "step" of time. Let's call this function f(n,m,...), where n,m,... are all the variable factors that influence the problem. How can we now find a function g(n,m,...) so that f(n,m,...) = O(g(n,m,...))? Simple:

    f(n,m,...) = O(g(n,m,...)) means that there exists a constant "c", for which c * g(n,m,...) >= f(n,m,...) is true for large enough n,m,...

    If you think about this relation, it's clear that you can forget all constant factors or constants in your program, because a large enough c will always suffice to make c*g() grow faster. And because anything that is O(n^2) is trivially also O(n^3), you should always find the smallest g() for which f() = O(g()) is true in order to have a representative function g().
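    To make the "forget the constants" step concrete, here's a tiny worked instance of the definition above (my own example, not fruiture's), written as a LaTeX math block:

    % A concrete instance of f(n) = O(g(n)), with f(n) = 10 + 5n and g(n) = n.
    % The constant c = 6 witnesses the definition:
    \[
      f(n) = 10 + 5n \;\le\; 6n = c \cdot g(n)
      \quad\text{for all } n \ge 10,
      \qquad\text{hence } 10 + 5n = O(n).
    \]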

    f(n,m,...) = Ω(g(n,m,...)) means that there is .... c * g(n,m,...) <= f(n,m,...)

    This is the lower bound. And finally Θ marks an average "middle":

    f(n,m,...) = Θ(g(n,m,...)) means that f(n,m,...) = O( g(n,m,...) ) and f(n,m,...) = Ω( g(n,m,...) )
    --
    http://fruiture.de

      Nitpicking:

      'Ο' is used for an estimate in excess, i.e. an upper bound (it's supposed to be a capital omicron, not an 'oh')

      'Ω' is used for an estimate in defect, i.e. a lower bound (o-mega "is bigger" than o-micron)

      'Θ' is used for the actual growth.

      For example: mergesort is Ω(1), Ο(n²), Θ(n log n)

      This is important since it's quite common to know boundaries on the complexity, but not the exact complexity. For example, matrix multiplication is Ω(n²) (you have to generate all the elements), and surely Ο(n³) (there is a naive algorithm with that complexity), but we don't know the exact function.

      -- 
              dakkar - Mobilis in mobile
      
        While we're nitpicking:

        O() defines an upper bound (typically, but not exclusively, on worst-case behaviour).

        Ω() defines a lower bound (again typically, but not exclusively, on best-case behaviour).

        An algorithm is said to run in Θ(f(n)) if it runs in O(f(n)) and in Ω(f(n)), i.e. the upper and lower bounds are no more than a constant multiple of each other. It is a tighter definition than average-case execution complexity, which depends on first defining the properties of an "average" input. I think what you're trying to define above is average-case execution complexity - by definition of the Θ() notation your statement above cannot be true.

        BTW mergesort is O(n lg n).

        - Menahem

      Thanks. This was the "more formal, but still readable" definition of O() that I was looking for.

      Also, BTW, while Θ() is defined as the average case, I think the normal case is more often actually used -- the average case can be impossible (or at least, impractical) to compute. (Then again, I don't really know what I'm talking about.)



Re: An informal introduction to O(N) notation
by pg (Canon) on Jan 18, 2003 at 23:05 UTC
    1. As hashes are so heavily used in Perl, it is worth explaining why hash search is O(n).

      One runs into the worst case of a hash search when all the elements calculate to the same hash key. In this case, a hash becomes no more than a one-dimensional chain of elements.

      Even worse, the element one is trying to look up happens to be the last element in the sequence. Now the searching engine has to go through the entire chain, all n elements, to find what one wants.

    2. (In the rest of the post, I will just use small o instead of big O, as I am now more focused on complexity; it doesn't matter whether it is for the worst case, best case, or whatever. Big O is just a notation saying that it is for the worst case.)

      In the average case, say n is the number of elements we have in a hash, and k is the number of all possible hash key values. Ideally all elements would be spread nearly evenly among the possible hash keys, so the chain length under each hash key is n/k. On average you need to go through half of the chain to get what you want, thus you need to go through n/2k elements.

      So on average a hash search comes close to o(n/2k) (be very careful: n is a variable while k is a constant, this is serious), which is ~ o(n).

      How come the average case is the same as the worst case? NO, they are not the same, but they are ~.

    3. Sometimes it is seriously dangerous to casually simplify things that are seriously complex.

      o(n) is not o(n/2k), but o(n) ~ o(n/2k) (again, n is a variable and k is a constant, this is very serious). The easiest way to explain ~, without losing too much seriousness, is that the speed at which o(n) and o(n/2k) approach infinity is the same.

      Although o(n) ~ o(n/10^100), what takes an o(n/10^100) algorithm 1 sec might take an o(n) algorithm 10^100 sec to finish. They are far from the same.

      Most often k is not a constant, but a function of n. If, for example, k were ~ c * n, then the average length of the list of items in each bucket would be ~ n / (c * n), which is ~ 1 / c -- a constant.

      Therefore the key lookup of the usual implementations of hashes has average complexity of o(1).

      Of course this implementation forces you to recreate the hash when the number of keys grows over some limit, which can be time consuming.

      Jenda

      P.S.: I just looked at the number of buckets in Perl hashes (I iteratively added elements, watched the number of buckets, and printed the number of elements and buckets after every rehashing). These are the results:

      elements => buckets
      1     => 8
      9     => 16
      18    => 32
      34    => 64
      70    => 128
      130   => 256
      258   => 512
      514   => 1024
      1024  => 2048
      2052  => 4096
      4098  => 8192
      8193  => 16384
      16386 => 32768
      32768 => 65536
      65538 => 131072

      Perl 5.8.0 built for MSWin32-x86-multi-thread
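      A script along the lines Jenda describes might look like the sketch below (my reconstruction, not the code actually used); it leans on scalar(%hash) reporting "used/total" buckets, which is how Perl 5.8 behaved (newer Perls report hashes in scalar context differently):

      use strict;
      use warnings;

      # Watch Perl grow a hash's bucket count as keys are added.
      my %hash;
      my $last_buckets = 0;

      for my $n (1 .. 100_000) {
          $hash{$n} = 1;
          my ($used, $total) = split m{/}, scalar %hash;
          if ($total != $last_buckets) {
              print "$n => $total\n";    # elements => buckets
              $last_buckets = $total;
          }
      }
      print "$], built for $^O\n";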

      Thank you. This explains everything I was attempting to, but does it lucidly. Moreover, it goes into a lot of things I didn't know about.

      Only one thing I have to add: You can find the number of items in a hash with scalar keys %hash, and the number of used/total buckets with ($used_buckets, $total_buckets) = split '/', scalar %hash. With those numbers in hand, you should be able to get more exact timings, or demonstrate to yourself that what pg says is true.



Re: An informal introduction to O(N) notation
by jryan (Vicar) on Jan 18, 2003 at 10:04 UTC

    It should be noted that many algorithms' O time can be reduced. For instance, many O(n^2) algorithms can be reduced to O(n log(n)) by finding a way to cut the inner loop short. A famous example is bubblesort:

    # O(n^2)
    for my $i (0..$#a-1) {
        for (0..$#a-1-$i) {
            ($a[$_],$a[$_+1]) = ($a[$_+1],$a[$_]) if ($a[$_+1] < $a[$_]);
        }
    }

    # O(n log(n))
    for my $i (0..$#a-1) {
        # looping over already-sorted items is not needed
        for (0..$#a-1-$i) {
            ($a[$_],$a[$_+1]) = ($a[$_+1],$a[$_]) if ($a[$_+1] < $a[$_]);
        }
    }

    O(n) algorithms that involve a search via a loop can sometimes be reduced from O(n) to O(log(n)) by breaking out of the loop once a "result" is found:

    # O(n)
    my $truestatus;
    foreach my $item (@list) {
        $truestatus = 1 if (some_condition);
    }

    # O(log(n))
    my $truestatus;
    foreach my $item (@list) {
        # if some_condition is going to be valid for at least one $item in
        # @list, then the limit of the average runtime will approach
        # O(log n).
        if (some_condition) { $truestatus = 1; last }
    }

    Of course, the secret is to know where to apply these optimizations, which is often harder than it seems :)

      Sorry, but that's nonsense. If you say that searching for an element in an unsorted list becomes O(log n) if you stop searching after you've found the element, you actually say that every element in a list is located among its first log(n) elements (where n is the length of the list), and everybody knows that the elements of a list reside within the whole list; that's their definition. So your assumption would be true if there existed some "a" and some "n" for which log_a(n) = n, but there is no such a.

      Btw.: O(n) does not mean that you don't stop searching a list after you've found the thing you're looking for; of course you do that. But you must expect that you have to search the whole list if the element you're looking for is the last element, or not in the list at all.

      But you _can_ make list search O(log n) by keeping the list sorted and then doing binary search.

      --
      http://fruiture.de

      Your bubble sort examples for O(n) and O(n log n) are the same (the first needs to be un-optimized). Also, I believe the optimized bubble sort is O(n²/2), not O(n log n).

      For your second suggestion, loops that can break out early are still O(n) in the worst case, as no guarantee can be made that all other elements will not be considered before the target element.

      It should be noted that many algorithms' O time can be reduced.

      You may be getting "best case" and "worst case" mixed up. Clever short-circuits in implementations of algorithms often improve the best case, but only if the data behaves in certain ways. There are usually mixes of data that defeat the short-cuts. O() is about worst-case behavior.

      It should be noted that many algorithms' O time can be reduced.

      True. But then it isn't the same algorithm. :-) Also, care in implementation can make one version of an algorithm outperform another. For instance, I think you will find that

      my $tmp;
      for my $i (0..$#a-1) {
          # looping over already-sorted items is not needed
          for (0..$#a-1-$i) {
              if ($a[$_+1] < $a[$_]) {
                  $tmp = $a[$_];
                  $a[$_] = $a[$_+1];
                  $a[$_+1] = $tmp;
              }
          }
      }
      is slightly faster than your version. :-) It's funny actually, but Knuth discusses this exact issue as a justification for why he used MIX instead of a high-level programming language. People have a tendency to write code in as few characters as possible. (Perl gives an opportunity to take this way beyond reasonable, so we can ignore that stuff.) However, very often the constructs that take the least space for the human result in the most code being executed. To take this further: for a while I was working on implementing a bunch of the AoP algorithms in perl. Some of them are not feasible, as perl doesn't allow low enough operations. I believe that you will find one of the linear search algorithms (you'd never think this would be tricky) cannot be implemented in perl to be more efficient than the others (even though in MIX and assembler it is).

      Also, IIRC, the following minor change makes Bubblesort even faster:

      my $tmp;
      my $swapped;
      for my $i (0..$#a-1) {
          $swapped = 0;
          # looping over already-sorted items is not needed
          for (0..$#a-1-$i) {
              if ($a[$_+1] < $a[$_]) {
                  $tmp = $a[$_];
                  $a[$_] = $a[$_+1];
                  $a[$_+1] = $tmp;
                  $swapped = 1;
              }
          }
          last unless $swapped;
      }
      as it allows the sort to bail out once the list is sorted. (Actually, you could probably be clever and not use one of the extra variables, although only if undef isn't legal in the data being sorted.)

      Update: fixed a typo in the second version. Thanks, jryan. :-)

      --- demerphq
      my friends call me, usually because I'm late....

Re: An informal introduction to O(N) notation
by FoxtrotUniform (Prior) on Feb 08, 2003 at 23:26 UTC
      The infamous Bubble Sort is O(N²).

    So is quicksort, in the worst case. :-) Mergesort and heapsort are both O(N log N) in the worst case.

    Most of the time, there's no difference in asymptotic order between the average- and worst-case inputs, but every once in a while, something (like quicksort) comes along and bites you on the ass on certain inputs. Quicksort expects its input to be mostly random; ordered input will mess it up. (Think about building a binary tree from ordered input: you'll get a linear list, which is O(N) nodes deep, not a nice bushy tree, which would be O(log N) nodes deep.) Something to think about when building divide and conquer algorithms.
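    A quick way to see that degeneration (a sketch of mine, not FoxtrotUniform's; it assumes a List::Util recent enough to provide shuffle, and the insert/depth helpers are invented names):

    use strict;
    use warnings;
    use List::Util qw(shuffle max);

    # Naive, unbalanced binary-search-tree insertion.
    sub insert {
        my ($node, $key) = @_;
        return { key => $key, left => undef, right => undef } unless $node;
        my $side = $key < $node->{key} ? 'left' : 'right';
        $node->{$side} = insert($node->{$side}, $key);
        return $node;
    }

    sub depth {
        my $node = shift;
        return 0 unless $node;
        return 1 + max( depth($node->{left}), depth($node->{right}) );
    }

    my @keys = 1 .. 50;

    my ($sorted_tree, $shuffled_tree);
    $sorted_tree   = insert($sorted_tree,   $_) for @keys;            # ordered input
    $shuffled_tree = insert($shuffled_tree, $_) for shuffle @keys;    # random input

    print "depth from ordered input:  ", depth($sorted_tree),   "\n";   # 50 -- a linear list
    print "depth from shuffled input: ", depth($shuffled_tree), "\n";   # typically 10-15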

    --
    F o x t r o t U n i f o r m
    Found a typo in this node? /msg me
    The hell with paco, vote for Erudil!

Re: An informal introduction to O(N) notation
by BUU (Prior) on Jan 18, 2003 at 05:26 UTC

    O(N²) says that the algorithm's performance is proportional to the square of the data set size. This happens when the algorithm processes each element of a set, and that processing requires another pass through the set. The infamous Bubble Sort is O(N²).
    Shouldn't bubble sort be O(N^N), as you might have to process the entire thing once for each element, if it was sorted backwards for example?
      So, that's one pass for each element in an n-sized list... and if each pass is O(n), then n*n = ?
Re: An informal introduction to O(N) notation
by Abigail-II (Bishop) on Jan 19, 2003 at 23:30 UTC
    I've several things to say about this - but I won't have time to write a reply until a week from now.

    Abigail

      It's been a while; care to revisit this thread? I'm still interested in reading your comments.

      Makeshifts last the longest.
