<?xml version="1.0" encoding="windows-1252"?>
<node id="326515" title="Re: Shift, Pop, Unshift and Push with Impunity!" created="2004-02-04 11:21:43" updated="2005-07-27 07:51:01">
<type id="11">
note</type>
<author id="230012">
jonadab</author>
<data>
<field name="doctext">
&lt;p&gt;&lt;a href="http://mail.python.org/pipermail/python-list/2003-April/158521.html"&gt;David
   Eppstein&lt;/a&gt; claims that it is possible to solve 
   the TSP in O(n*2^n) time, which would simultaneously
   be worse than O(2^n) and yet much better than O(n!).
   In any event, these are clearly the sorts of algorithms
   we want to avoid.  I'm going to go off on a qualifying
   tangent and come back to explain why algorithms worse
   than about O(n^2) are bad news, even for those of us
   who will happily implement an O(n) algorithm without
   worrying about whether it's possible to do it in O(1).
&lt;/p&gt;
&lt;p&gt;I'm not generally terribly concerned about algorithm
   efficiency most of the time, for several reasons.
   Perhaps most important, I'm more concerned with
   maintainability; I'd usually rather have a function
   that runs in half a second and is clear, versus
   one that runs in 3 picoseconds and frightens people
   who look at the source.  Since Perl provides
   built-in functions for sort, push, pop, shift, and
   unshift, I would never think about coding my own
   replacements for them.  That would be inane, a
   huge waste of time.  It would be inane and a huge
   waste of time *even* if I could slightly improve
   on the efficiency (which, of course, I can't),
   unless I were writing the replacement in C to be
   included into Perl (which, if you know anything
   about my relationship with C, is not going to
   happen).
&lt;/p&gt;

&lt;p&gt;The CPU meter on my Gnome panel right now shows a big
   blank area, with little dots along the bottom edge
   denoting small amounts of CPU activity.  I keep the
   CPU meter there not because I normally need it, but
   because if a process happens to run away I want to
   know.  The astute will note that this implies I
   might not even *notice* a runaway process using all
   my CPU time without seeing it on the meter.  This is
   because I'm sitting here using a 1.something GHz
   Celeron system to do mostly the same sorts of things 
   (web browsing, word processing, text editing, ...) 
   that were possible to do on a 486.  The apps I use 
   today have a lot more features, so they're bigger, 
   so they consume a lot more RAM -- but not a lot more
   CPU time.  (Somewhat more, yes; not enough more to
   keep up with the preposterously high clock speeds
   on the market today.)  Consequently, in terms of
   user-visible delays and perceived performance, with
   the normal sizes of datasets that I use on a regular
   basis, an O(n) algorithm that can run entirely
   out of RAM is quite possibly faster than an O(1)
   algorithm that has to read something from disk
   (assuming the something isn't cached).  Certainly,
   for any normal-sized dataset (by which I mean,
   some amount of data a normal person is likely to
   have on a desktop system), O(n) is instant, and
   even O(n^2) is unlikely to cause a user-perceptible
   delay in most cases.  Yes, some applications have
   to be designed to deal with larger amounts of data;
   if RDBMS lookups were O(n^2), that would be a
   problem in serverspace.  I've also run into quite
   perceptible delays when handling my email, since
   I get quite a lot of email, and so I suppose that
   an algorithmic improvement there would be fairly
   worthwhile for me.  These are just examples.  My
   point was, though, that for many situations an
   O(n^2) algorithm may not be noticed by the user;
   it's probably not optimal, but it may pass as
   "good enough" in some situations.
&lt;/p&gt;

&lt;p&gt;A point comes, though, when you have to draw a line
   and say, "this algorithm is too slow".  I personally 
   draw that line after O(n^2).  Anything slower than 
   that, I call too slow.
&lt;/p&gt;
&lt;p&gt;
   Take a dataset size of 200, for example,
   a nice medium-sized dataset.  (Maybe it's the number
   of files in a directory, or something.)  O(n^2)
   comes to 40000 somethings; if we say that's tenths
   of milliseconds (in reality, it would be less than
   that on modern hardware, but go with me here), the
   operation completes in 4 seconds, which for something
   that doesn't happen very often (especially, something
   that only happens when the user deliberately orders 
   it, such as starting an app) is reasonable.  Now, 
   let's say we did the same thing in O(2^n) time.  
   It would take aeons.  Lots of aeons.  I'd say
   millenia, but that would be underestimating it.
   You can see the problem.  Even if we get faster
   hardware so that the 4-second operation completes
   in 4 nanoseconds, the O(2^n) algorithm is still
   taking multiple lifetimes.  O(n!) is even worse.
&lt;/p&gt;

&lt;p&gt;So that's why even people who don't generally care
   about efficiency still have to think about algorithm
   efficiency just enough to avoid the really *bad*
   algorithms.  With O(n^2), the size of the data set
   could blow up to 1000 (quite a lot of somethings,
   for a desktop user), and we'd still be running
   in under two minutes (on the same scale that takes
   4 seconds for n=200).  Two minutes may be a minute
   and fifty seconds too long, but the operation will
   complete.  Start messing around with O(2^n) or O(n!)
   algorithms and you could be waiting until you die.
&lt;/p&gt;

&lt;p&gt;Note that I'm not saying O(n^2) is "fast enough" for
   all purposes.  (When n can blow up to hundreds of
   thousands, it gets to be a very significant 
   nuisance (though it will still complete within your
   lifetime, and quite possibly within your computer's
   lifetime even, and if you sic a good-sized cluster 
   on it you can brute force it in days).)
   What I am saying is that O(n^2) is "fast enough" for
   situations where brute force will do, 
   but some algorithms are too slow even for that.
&lt;/p&gt;

&lt;div class="pmsig"&gt;
&lt;div class="pmsig-230012"&gt;
&lt;hr&gt;&lt;/hr&gt;
&lt;code&gt;$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b-&gt;()}}
split//,".rekcah lreP rehtona tsuJ";$\=$ ;-&gt;();print$/&lt;/code&gt;
&lt;/div&gt;&lt;/div&gt;</field>
<field name="root_node">
17890</field>
<field name="parent_node">
119536</field>
</data>
</node>
