
Re^4: Out of Memory when generating large matrix

by BrowserUk (Patriarch)
on Mar 06, 2018 at 15:42 UTC


in reply to Re^3: Out of Memory when generating large matrix
in thread Out of Memory when generating large matrix

It is important to understand that hashing is not algorithmically superior to sorting, indeed it is a specific form of sorting in disguise.

What utter baloney! Come out from your cloak and let's argue?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit

Re^5: Out of Memory when generating large matrix
by Anonymous Monk on Mar 06, 2018 at 16:48 UTC

    Hashing in a nutshell: apply hash function f() to the keys, bucket the data records accordingly. Where a radix sort would use part of the key directly (like a hash function that just masks bits), hashing picks a more complicated function. So there's a tradeoff. Your data is no longer sorted by the key, but by f(key). On the other hand, you get a flat distribution that makes the bucketing work.

    Can you truly not see the similarity between distribution sort and hashing?
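The comparison above can be sketched in a few lines. This is only an illustration under stated assumptions: the keys are 8-bit integers, there are four buckets, and `radix_bucket`, `hash_bucket`, `distribute`, `bucket_sizes` and `concat_sorted` are hypothetical names chosen here, not anything from the thread. The multiplicative constant is the common Knuth-style one, picked purely as an example of "a more complicated function".

```python
def distribute(keys, bucket_of, n_buckets):
    """Drop each key into the bucket chosen by bucket_of(key)."""
    buckets = [[] for _ in range(n_buckets)]
    for key in keys:
        buckets[bucket_of(key)].append(key)
    return buckets


def radix_bucket(key):
    # Radix style: use part of the key directly (top 2 bits of an 8-bit key).
    return (key >> 6) & 0b11


def hash_bucket(key):
    # Hash style: scramble the key first (multiplicative hash, assumed here),
    # then keep the top 2 bits of the 32-bit product.
    return ((key * 2654435761) & 0xFFFFFFFF) >> 30


def bucket_sizes(buckets):
    return [len(bucket) for bucket in buckets]


def concat_sorted(buckets):
    """Sort inside each bucket, then read the buckets out in order."""
    return [key for bucket in buckets for key in sorted(bucket)]


if __name__ == "__main__":
    clustered = [3, 7, 12, 18, 25, 31, 40, 55]   # every key is below 64
    print("radix:", bucket_sizes(distribute(clustered, radix_bucket, 4)))
    print("hash: ", bucket_sizes(distribute(clustered, hash_bucket, 4)))
```

With clustered keys the radix buckets fill unevenly while the hashed ones spread out; in exchange, reading the radix buckets in order yields the keys in key order, which the hashed buckets do not promise.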

      Once you move outside of academia and thesis, it isn't the algorithm, but the implementation that is important. A mergesort programmed badly can be much slower than a bubble sort done well.

      And once you recognise that in the real world, implementation is king, any kind of disk based sort is glacial compared to a memory-based hash.

      It isn't the similarities, but the differences that are important.

      A stately home and a plane both have wings, windows and seats, but the differences outweigh those similarities for most practical considerations.
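To make the practical difference concrete, here is a minimal sketch of the same counting job done both ways. It assumes the data fits in memory; `count_with_hash` and `count_with_sort` are hypothetical names used only for this illustration. The sort-based version has to order every item before it can count anything, and it is that ordering step which ends up on disk when the data outgrows memory.

```python
from collections import Counter
from itertools import groupby


def count_with_hash(items):
    """One pass, no reordering: each item goes straight to its counter."""
    return dict(Counter(items))


def count_with_sort(items):
    """Sort so that equal items become adjacent, then count each run."""
    return {key: sum(1 for _ in run) for key, run in groupby(sorted(items))}


if __name__ == "__main__":
    words = ["b", "a", "c", "a", "b", "a"]
    print(count_with_hash(words))
    print(count_with_sort(words))
```

Both functions return the same counts; what differs is the work done to get there, which is the point being argued above.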


        I liked this post from BrowserUk and up-voted it.

        Implementation is indeed "king"!
        One problem with theoretical "big-O" notation is "how expensive is an O?"
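The "how expensive is an O?" question can be put into numbers with a toy cost model. Everything here is an assumption for illustration: the per-step costs are made up, and `cost_nlogn`, `cost_quadratic` and `crossover` are hypothetical helpers, not measurements of any real machine.

```python
import math


def cost_nlogn(n, per_step):
    """Modelled cost of an n*log2(n) algorithm with a given per-step price."""
    return per_step * n * math.log2(n)


def cost_quadratic(n, per_step):
    """Modelled cost of an n*n algorithm with a given per-step price."""
    return per_step * n * n


def crossover(per_step_nlogn, per_step_quadratic, limit=1_000_000):
    """Smallest n >= 2 where the n log n algorithm is cheaper, else None."""
    for n in range(2, limit + 1):
        if cost_nlogn(n, per_step_nlogn) < cost_quadratic(n, per_step_quadratic):
            return n
    return None


if __name__ == "__main__":
    # Assumed prices: each n log n step costs 100 units, each n*n step costs 1.
    print(crossover(100, 1))
```

Under those assumed prices the "worse" quadratic algorithm stays cheaper until n reaches roughly a thousand, so for a few hundred records the constant factor decides the outcome, not the exponent.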

        I remember one of my first programming assignments on 1960's hardware.
        We were using wire-wrap technology for H/W prototypes. The basic software task was to sort thousands of punch cards and produce an output.

        We had a port of our mainframe code that would run on our lab machine.
        But it took 6 hours to run!
        It used the minimum number of compares between card images, but it was very, very slow.

        Using a bi-directional indexed bubble sort and a fancy merge, I was able to reduce the time from 6 hours to 5 seconds!
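The post gives no code, so the following is only a guess at the general shape, not the original program. It assumes "indexed" means rearranging a list of indices rather than the records themselves, sorts fixed-size runs with a bi-directional (cocktail shaker) bubble sort, and stands in a plain k-way `heapq.merge` for the "fancy merge". `cocktail_sort_indices`, `sort_in_runs` and `run_length` are hypothetical names.

```python
import heapq


def cocktail_sort_indices(records, key=lambda r: r):
    """Return indices that put `records` in order; the records never move."""
    idx = list(range(len(records)))
    lo, hi = 0, len(idx) - 1
    while lo < hi:
        last_swap = lo
        for i in range(lo, hi):                  # forward pass: push large keys up
            if key(records[idx[i]]) > key(records[idx[i + 1]]):
                idx[i], idx[i + 1] = idx[i + 1], idx[i]
                last_swap = i
        hi = last_swap                           # everything above is in place
        if lo >= hi:
            break
        last_swap = hi
        for i in range(hi, lo, -1):              # backward pass: pull small keys down
            if key(records[idx[i - 1]]) > key(records[idx[i]]):
                idx[i - 1], idx[i] = idx[i], idx[i - 1]
                last_swap = i
        lo = last_swap                           # everything below is in place
    return idx


def sort_in_runs(records, run_length=4, key=lambda r: r):
    """Sort short runs with the index sort, then merge the sorted runs."""
    runs = []
    for start in range(0, len(records), run_length):
        chunk = records[start:start + run_length]
        order = cocktail_sort_indices(chunk, key)
        runs.append([chunk[i] for i in order])
    return list(heapq.merge(*runs, key=key))


if __name__ == "__main__":
    print(sort_in_runs([42, 7, 19, 3, 88, 1, 56, 23, 7, 64], run_length=3))
```

Keeping each run short is what makes a quadratic sort tolerable here: the bubble passes only ever touch a handful of entries, and the merge does the rest.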

        That doesn't seem possible, but it was possible.
        These ancient machines with 24K words of memory were slow. My coffee pot probably has a faster processor, albeit with not as much memory?!

        I understood the problem very well.
        My code had no O/S or file system.
        Essentially, I wrote it on the "bare metal".
        Yes, this was a "one trick pony", but it could do its trick very, very well.
        I could calculate partial results as the punch cards were read in, while still allowing the card reader to run at full speed.
        On the output, I could calculate results fast enough so that the ancient shuttle line printer ran at its maximum rate.
        The 5-second figure is the "dead time" during which no I/O was happening at the maximum rate.
