No code, no benchmarks. This seems to be a purely theoretical paper from what little I can gather (I pretty much gave up reading it).
So we basically don't know whether this algorithm is actually faster on modern processor architectures than the old one, or under which circumstances. With most algos these days, you have to optimize for a time/space tradeoff. For example:
- A sort algorithm might be faster even if it takes more steps, as long as it keeps the data it is currently working on in the L1 cache.
- A database spends lots of extra processing steps generating an index for a table and keeping it up to date in RAM. But for searching, it can quickly-ish scan a few gigabytes in RAM instead of reading terabytes of data from a spinning disk (a rough sketch of the index idea follows this list).
- GPU data processing is weird. If-then-else statements get turned into convoluted mathematical operations (slow, and you need a special compiler) so the GPU can work on multiple values at once (SIMD), ideally processing the whole dataset in a single pass without jumps and conditionals (the second sketch below shows the branchless trick in plain C).
- On older 8/16-bit processors, developers used (big for the time) lookup tables for some mathematical operations like sin() or sqrt() whenever they could spare the memory, because it was faster to look up the result than to calculate it. On modern processors, it's very much the opposite (the third sketch below shows the lookup-table approach).
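To make the database point a bit more concrete, here is a minimal C sketch of the index idea: pay extra work and memory up front (sorting a copy of the keys), then answer lookups with a binary search instead of a full scan. The table layout, key values, and sizes are all made up for illustration, not taken from any real database engine.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy "table": rows identified by an integer key.
 * The id values and payloads are invented for this sketch. */
typedef struct { int id; char payload[16]; } Row;

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Full scan: touch every row (stands in for reading the whole table). */
static const Row *scan(const Row *rows, size_t n, int key) {
    for (size_t i = 0; i < n; i++)
        if (rows[i].id == key) return &rows[i];
    return NULL;
}

int main(void) {
    enum { N = 1000000 };
    Row *rows  = malloc(N * sizeof *rows);
    int *index = malloc(N * sizeof *index);   /* the "index": just a copy of the keys */
    for (int i = 0; i < N; i++) {
        rows[i].id = i * 2 + 1;               /* toy ids; a real table would not be sorted */
        snprintf(rows[i].payload, sizeof rows[i].payload, "row%d", i);
        index[i] = rows[i].id;
    }

    /* Building the index costs extra work and memory up front... */
    qsort(index, N, sizeof *index, cmp_int);

    /* ...but each lookup becomes a binary search instead of a full scan. */
    int key = 999999;
    int *hit = bsearch(&key, index, N, sizeof *index, cmp_int);
    printf("index lookup: %s\n", hit ? "found" : "not found");
    printf("full scan:    %s\n", scan(rows, N, key) ? "found" : "not found");

    free(rows);
    free(index);
    return 0;
}
```

For the GPU point, here is roughly what "turning if-then-else into math" looks like, written as plain scalar C rather than real GPU code: both outcomes are computed and blended with a 0/1 mask, so every element follows exactly the same instruction sequence. The clamp function is just a made-up example of a data-dependent branch.

```c
#include <stdio.h>

/* Branchy version: control flow depends on the data, which SIMD/GPU lanes
 * dislike, since different lanes may want different paths. */
static float clamp_branchy(float x, float lo, float hi) {
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

/* "Branchless" version: compute both outcomes and blend them with masks,
 * so every element takes the same instruction sequence. */
static float clamp_branchless(float x, float lo, float hi) {
    float below  = (float)(x < lo);   /* 1.0 if true, 0.0 if false */
    float above  = (float)(x > hi);
    float inside = 1.0f - below - above;
    return below * lo + above * hi + inside * x;
}

int main(void) {
    float xs[] = { -2.0f, 0.5f, 3.7f };
    for (int i = 0; i < 3; i++)
        printf("%5.2f -> branchy %5.2f, branchless %5.2f\n",
               xs[i], clamp_branchy(xs[i], 0.0f, 1.0f),
               clamp_branchless(xs[i], 0.0f, 1.0f));
    return 0;
}
```

And for the lookup-table point, a sketch of a table-driven sin() next to libm's sinf(). The table size (256 entries, nearest-entry lookup, no interpolation) is an arbitrary choice for the example; old 8/16-bit code would have shipped such a table in ROM. Compile with -lm.

```c
#include <stdio.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define TABLE_SIZE 256
static float sin_table[TABLE_SIZE];

/* Precompute sin() once for one full period. */
static void init_table(void) {
    for (int i = 0; i < TABLE_SIZE; i++)
        sin_table[i] = (float)sin(2.0 * M_PI * i / TABLE_SIZE);
}

/* Table lookup: wrap the angle into one period and pick the nearest entry.
 * Accuracy is limited by the table size; no interpolation is done here. */
static float sin_lookup(float x) {
    float t = x / (2.0f * (float)M_PI);   /* angle measured in turns */
    t -= floorf(t);                       /* wrap into [0, 1) */
    int idx = (int)(t * TABLE_SIZE + 0.5f) % TABLE_SIZE;
    return sin_table[idx];
}

int main(void) {
    init_table();
    float angles[] = { 0.0f, 0.5f, 1.0f, 3.0f };
    for (int i = 0; i < 4; i++)
        printf("sin(%.1f): table %.4f, libm %.4f\n",
               angles[i], sin_lookup(angles[i]), sinf(angles[i]));
    return 0;
}
```

Whether the table or the recomputation wins on a given machine is exactly the kind of question only a benchmark on that machine can answer.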
Without testing on real-world hardware and data, it's hard, if not impossible, to tell how well an algorithm will perform and what tradeoffs can be made under which circumstances to optimize it.