Re^4: Out of Memory when generating large matrix

Re^5: Out of Memory when generating large matrix by Anonymous Monk on Mar 06, 2018 at 16:48 UTC
Hashing in a nutshell: apply hash function f() to the keys, bucket the data records accordingly. Where a radix sort would use part of the key directly (like a hash function that just masks bits), hashing picks a more complicated function. So there's a tradeoff. Your data is no longer sorted by the key, but by f(key). On the other hand, you get a flat distribution that makes the bucketing work. Can you truly not see the similarity between distribution sort and hashing?
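To make the comparison concrete, here is a minimal Python sketch of the two bucketing schemes. The 16-bit key width, the four buckets, and the use of CRC32 as a stand-in for f() are all assumptions for illustration; nothing in the thread specifies them.

```python
import zlib
from collections import defaultdict

KEY_BITS = 16                    # assumed key width
BUCKET_BITS = 2                  # assumed: 2**2 = 4 buckets
NUM_BUCKETS = 1 << BUCKET_BITS


def radix_bucket(key):
    """Radix style: use the top bits of the key directly."""
    return key >> (KEY_BITS - BUCKET_BITS)


def hash_bucket(key):
    """Hash style: bucket by f(key); CRC32 is only a stand-in for f()."""
    digest = zlib.crc32(key.to_bytes(KEY_BITS // 8, "big"))
    return digest & (NUM_BUCKETS - 1)


def distribute(keys, bucket_of):
    """Place each key in the bucket chosen by bucket_of."""
    buckets = defaultdict(list)
    for k in keys:
        buckets[bucket_of(k)].append(k)
    return buckets


# Keys clustered at the low end of the 16-bit range.
keys = list(range(1000))
by_radix = distribute(keys, radix_bucket)
by_hash = distribute(keys, hash_bucket)

print("radix bucket sizes:", {b: len(v) for b, v in sorted(by_radix.items())})
print("hash bucket sizes: ", {b: len(v) for b, v in sorted(by_hash.items())})
```

With clustered keys the radix buckets stay in key order but every record lands in bucket 0, while the hashed buckets are spread out but no longer ordered by key. That is the tradeoff described above.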
Re^6: Out of Memory when generating large matrix by BrowserUk (Patriarch) on Mar 06, 2018 at 18:06 UTC
Once you move outside of academia and theses, it isn't the algorithm but the implementation that is important. A mergesort programmed badly can be much slower than a bubble sort done well. And once you recognise that in the real world implementation is king, any kind of disk-based sort is glacial compared to a memory-based hash.

It isn't the similarities but the differences that are important. A stately home and a plane both have wings, windows and seats, but the differences outweigh those similarities for most practical considerations.
Re^7: Out of Memory when generating large matrix by Marshall (Canon) on Mar 07, 2018 at 20:01 UTC
I liked this post from BrowserUk and up-voted it. Implementation is indeed "king"! One problem with theoretical "O-n" notation is "how expensive is an O?"

I remember one of my first programming assignments on 1960's hardware. We were using wire-wrap technology for H/W prototypes. The basic software task was to sort thousands of punch cards and produce an output. We had a port of our mainframe code that would run on our lab machine. But it took 6 hours to run! It used the minimum number of compares between card images, but it was very, very slow. Using a bi-directional indexed bubble sort and a fancy merge, I was able to reduce the time from 6 hours to 5 seconds! That doesn't seem possible, but it was possible.

These ancient machines with 24K words of memory were slow. My coffee pot probably has a faster processor, albeit with not as much memory?! I understood the problem very well. My code had no O/S or file system. Essentially, I wrote it on the "bare metal". Yes, this was a "one trick pony", but it could do its trick very, very well. I could calculate partial results as the punch cards were read in, while still allowing the card reader to run at full speed. On the output, I could calculate results fast enough so that the ancient shuttle line printer ran at its maximum rate. The 5 second number is the "dead time" when no I/O is happening at the max rate.
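For readers who have not met the term, one plausible reading of a "bi-directional indexed bubble sort" is a cocktail shaker sort that permutes an array of indices rather than moving the records themselves. The Python sketch below illustrates that reading only; it is not a reconstruction of the original bare-metal code, and the function name and sample data are hypothetical.

```python
def cocktail_index_sort(records, key=lambda r: r):
    """Return indices that list `records` in ascending key order.

    Bubble passes alternate direction, and only the index list is
    permuted, so the (possibly large) records never move.
    """
    idx = list(range(len(records)))
    lo, hi = 0, len(idx) - 1
    while lo < hi:
        # Forward pass: push the largest remaining item toward hi.
        last_swap = lo
        for i in range(lo, hi):
            if key(records[idx[i]]) > key(records[idx[i + 1]]):
                idx[i], idx[i + 1] = idx[i + 1], idx[i]
                last_swap = i
        hi = last_swap  # everything above the last swap is in place
        # Backward pass: push the smallest remaining item toward lo.
        last_swap = hi
        for i in range(hi, lo, -1):
            if key(records[idx[i - 1]]) > key(records[idx[i]]):
                idx[i - 1], idx[i] = idx[i], idx[i - 1]
                last_swap = i
        lo = last_swap  # everything below the last swap is in place
    return idx


# Hypothetical card images; only the index list is reordered.
cards = ["delta", "alpha", "charlie", "bravo"]
order = cocktail_index_sort(cards)
print([cards[i] for i in order])
```

Tracking the position of the last swap lets each pass shrink the unsorted window, which is where a careful implementation of this sort gains on nearly ordered input.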
Re^8: Out of Memory when generating large matrix by BrowserUk (Patriarch) on Mar 08, 2018 at 02:43 UTC
Re^8: Out of Memory when generating large matrix by stevieb (Canon) on Mar 07, 2018 at 20:38 UTC
Re^9: Out of Memory when generating large matrix by Marshall (Canon) on Mar 07, 2018 at 23:57 UTC


	PerlMonks