Hashing in a nutshell: apply hash function f() to the keys, bucket the data records accordingly. Where a radix sort would use part of the key directly (like a hash function that just masks bits), hashing picks a more complicated function. So there's a tradeoff. Your data is no longer sorted by the key, but by f(key). On the other hand, you get a flat distribution that makes the bucketing work.
Can you truly not see the similarity between distribution sort and hashing?
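To make the comparison concrete, here is a minimal sketch of the two bucketing choices side by side. Everything in it is an assumption for illustration: the bucket count, the 16-bit key width, the function names, and the use of SHA-256 as a stand-in for "a more complicated function". It is not any particular library's hash.

```python
import hashlib

NUM_BUCKETS = 8   # hypothetical bucket count; a power of two so masking works
KEY_BITS = 16     # hypothetical key width for this sketch


def radix_bucket(key: int) -> int:
    """Use the top 3 bits of the key directly, as one radix pass would."""
    return (key >> (KEY_BITS - 3)) & (NUM_BUCKETS - 1)


def hash_bucket(key: int) -> int:
    """Scramble the key first, then mask. SHA-256 is only a stand-in here."""
    digest = hashlib.sha256(key.to_bytes(8, "big")).digest()
    return digest[0] & (NUM_BUCKETS - 1)


def distribute(keys, bucket_of):
    """Drop each key into the bucket chosen by bucket_of."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for k in keys:
        buckets[bucket_of(k)].append(k)
    return buckets


# Skewed input: every key is small, so all keys share the same high bits.
keys = list(range(1000))

radix = distribute(keys, radix_bucket)
hashed = distribute(keys, hash_bucket)

print("radix :", [len(b) for b in radix])   # all 1000 keys land in bucket 0
print("hashed:", [len(b) for b in hashed])  # spread across the buckets
```

With the radix function, bucket order still follows key order, but skewed keys pile into one bucket. With the hash function the buckets fill roughly evenly, at the price of the data being grouped by f(key) rather than by the key.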
| [reply] |
Once you move outside of academia and theses, it isn't the algorithm but the implementation that is important. A badly programmed mergesort can be much slower than a well-written bubble sort.
And once you recognise that, in the real world, implementation is king, any kind of disk-based sort is glacial compared to a memory-based hash.
It isn't the similarities, but the differences that are important.
A stately home and a plane both have wings, windows and seats, but the differences outweigh those similarities for most practical considerations.
| [reply] |
I liked this post from BrowserUK and up-voted it.
Implementation is indeed "king"!
One problem with theoretical "O-n" notation is "how expensive is an O?"
I remember one of my first programming assignments on 1960s hardware. We were using wire-wrap technology for H/W prototypes.
The basic software task was to sort thousands of punch cards and produce an output.
We had a port of our mainframe code that would run on our lab machine.
But it took 6 hours to run!
It used the minimum number of compares between card images, but it was very, very slow.
Using a bi-directional indexed bubble sort and a fancy merge, I was able to reduce the time from 6 hours to 5 seconds!
That doesn't seem possible, but it was possible.
These ancient machines with 24K words of memory were slow. My coffee pot probably has a faster processor, albeit with not as much memory?!
I understood the problem very well.
My code had no O/S or file system.
Essentially, I wrote it on the "bare metal".
Yes, this was a "one trick pony", but it could do its trick very, very well.
I could calculate partial results as the punch cards were read in, while still allowing the card reader to run at full speed.
On the output, I could calculate results fast enough that the ancient shuttle line printer ran at its maximum rate.
The 5-second number is the "dead time" when no I/O is happening at the max rate.
| [reply] |