Re: Shift, Pop, Unshift and Push with Impunity!
by jonadab (Parson) on Feb 04, 2004 at 16:21 UTC ( [id://326515] )
David Eppstein claims that it is possible to solve the TSP in O(n*2^n) time, which would simultaneously be worse than O(2^n) and yet much better than O(n!). In any event, these are clearly the sorts of algorithms we want to avoid. I'm going to go off on a qualifying tangent and then come back to explain why algorithms worse than about O(n^2) are bad news, even for those of us who will happily implement an O(n) algorithm without worrying about whether it's possible to do it in O(1).

I'm generally not terribly concerned about algorithm efficiency, for several reasons. Perhaps the most important is maintainability: I'd usually rather have a function that runs in half a second and is clear than one that runs in 3 picoseconds and frightens people who look at the source. Since Perl provides built-in functions for sort, push, pop, shift, and unshift, I would never think about coding my own replacements for them. That would be inane and a huge waste of time, *even* if I could slightly improve on the efficiency (which, of course, I can't), unless I were writing the replacement in C to be included in Perl (which, if you know anything about my relationship with C, is not going to happen).

The CPU meter on my Gnome panel right now shows a big blank area, with little dots along the bottom edge denoting small amounts of CPU activity. I keep the CPU meter there not because I normally need it, but because if a process happens to run away I want to know. The astute will note that this implies I might not even *notice* a runaway process using all my CPU time without seeing it on the meter. This is because I'm sitting here using a 1.something GHz Celeron system to do mostly the same sorts of things (web browsing, word processing, text editing, ...) that were possible to do on a 486. The apps I use today have a lot more features, so they're bigger, so they consume a lot more RAM -- but not a lot more CPU time. (Somewhat more, yes; not enough more to keep up with the preposterously high clock speeds on the market today.)

Consequently, in terms of user-visible delays and perceived performance, with the normal sizes of datasets I work with on a regular basis, an O(n) algorithm that can run entirely out of RAM is quite possibly faster than an O(1) algorithm that has to read something from disk (assuming the something isn't cached). Certainly, for any normal-sized dataset (by which I mean an amount of data a normal person is likely to have on a desktop system), O(n) is instant, and even O(n^2) is unlikely to cause a user-perceptible delay in most cases. Yes, some applications have to be designed to deal with larger amounts of data; if RDBMS lookups were O(n^2), that would be a problem in server space. I've also run into quite perceptible delays when handling my email, since I get quite a lot of it, so I suppose an algorithmic improvement there would be fairly worthwhile for me. But these are just examples. My point is that for many situations an O(n^2) algorithm may not even be noticed by the user; it's probably not optimal, but it may pass as "good enough" in some situations.

A point comes, though, when you have to draw a line and say, "this algorithm is too slow". I personally draw that line after O(n^2): anything slower than that, I call too slow. Take a dataset size of 200, for example, a nice medium-sized dataset. (Maybe it's the number of files in a directory, or something.)
O(n^2) comes to 40,000 somethings. If we say those are tenths of milliseconds (in reality it would be less than that on modern hardware, but go with me here), the operation completes in 4 seconds, which for something that doesn't happen very often (especially something that only happens when the user deliberately orders it, such as starting an app) is reasonable. Now, let's say we did the same thing in O(2^n) time. It would take aeons. Lots of aeons. I'd say millennia, but that would be underestimating it. You can see the problem. Even if we get faster hardware so that the 4-second operation completes in 4 nanoseconds, the O(2^n) algorithm is still taking multiple lifetimes. O(n!) is even worse.

So that's why even people who don't generally care about efficiency still have to think about algorithm efficiency just enough to avoid the really *bad* algorithms. With O(n^2), the size of the data set could blow up to 1000 (quite a lot of somethings, for a desktop user), and we'd still be running in under two minutes (on the same scale that takes 4 seconds for n=200). Two minutes may be a minute and fifty seconds too long, but the operation will complete. Start messing around with O(2^n) or O(n!) algorithms and you could be waiting until you die.

Note that I'm not saying O(n^2) is "fast enough" for all purposes. When n can blow up to hundreds of thousands, it becomes a very significant nuisance (though it will still complete within your lifetime, quite possibly within your computer's lifetime, and if you sic a good-sized cluster on it you can brute-force it in days). What I am saying is that O(n^2) is "fast enough" for situations where brute force will do, but some algorithms are too slow even for that.
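Just to make that arithmetic concrete, here is a minimal sketch (mine, not part of the original node) that plugs in the same numbers. The 0.0001 seconds per "something" is the assumption from the text above; the rest is ordinary Perl, using Math::BigFloat so the O(2^n) figures don't overflow:

    #!/usr/bin/perl
    # Back-of-the-envelope timing estimates for the scenarios above.
    # Assumes each "something" costs a tenth of a millisecond, as in the text.
    use strict;
    use warnings;
    use Math::BigFloat;

    my $cost = Math::BigFloat->new('0.0001');    # seconds per operation

    my %complexity = (
        'O(n^2)' => sub { my $n = Math::BigFloat->new(shift); $n * $n },
        'O(2^n)' => sub { Math::BigFloat->new(2)->bpow(shift) },
    );

    for my $n (200, 1000) {
        for my $name (sort keys %complexity) {
            my $seconds = $complexity{$name}->($n) * $cost;
            printf "n=%4d  %-6s  ~%s seconds\n", $n, $name, $seconds->bsstr();
        }
    }

For n=200 the O(n^2) line comes out to the 4 seconds quoted above, and for n=1000 to 100 seconds; the O(2^n) lines come out on the order of 10^56 seconds and far beyond, which is vastly longer than the age of the universe.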
In Section: Meditations