PerlMonks  

Re^6: Out of Memory when generating large matrix

by BrowserUk (Patriarch)
on Mar 06, 2018 at 18:06 UTC


in reply to Re^5: Out of Memory when generating large matrix
in thread Out of Memory when generating large matrix

Once you move outside of academia and theses, it isn't the algorithm, but the implementation that is important. A mergesort programmed badly can be much slower than a bubble sort done well.

And once you recognise that in the real world, implementation is king, any kind of disk based sort is glacial compared to a memory-based hash.

It isn't the similarities, but the differences that are important.

A stately home and a plane both have wings, windows and seats, but the differences outweigh those similarities for most practical considerations.
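
As a rough illustration of the hash-versus-sort point, here is a minimal sketch in Python (not Perl) with hypothetical function names and made-up data. Both functions produce the same tallies, but the first makes a single pass over an in-memory hash while the second has to sort everything first. It runs entirely in memory, so it does not model the disk I/O cost that a disk-based sort would add on top.

```python
from collections import Counter
from itertools import groupby


def count_with_hash(items):
    """Tally occurrences in one pass using an in-memory hash table."""
    return dict(Counter(items))


def count_with_sort(items):
    """Tally occurrences by sorting first, then counting each run of equal items."""
    return {key: sum(1 for _ in run) for key, run in groupby(sorted(items))}


if __name__ == "__main__":
    sample = ["b", "a", "c", "b", "a", "b"]  # made-up data
    print(count_with_hash(sample) == count_with_sort(sample))
```

The hash version touches each item once; the sort version pays for ordering that the tally itself never needs.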


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit

Re^7: Out of Memory when generating large matrix
by Marshall (Canon) on Mar 07, 2018 at 20:01 UTC
    I liked this post from BrowserUk and up-voted it.

    Implementation is indeed "king"!
    One problem with theoretical "O-n" notation is "how expensive is an O?"

    I remember one of my first programming assignments on 1960s hardware.
    We were using wire-wrap technology for H/W prototypes. The basic software task was to sort thousands of punch cards and produce an output.

    We had a port of our mainframe code that would run on our lab machine.
    But it took 6 hours to run!
    It used the minimum number of compares between card images, but it was very, very slow.

    Using a bi-directional indexed bubble sort and a fancy merge, I was able to reduce the time from 6 hours to 5 seconds!

    That doesn't seem possible, but it was possible.
    These ancient machines with 24K words of memory were slow. My coffee pot probably has a faster processor, albeit with not as much memory?!
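
For readers unfamiliar with the technique, a bi-directional bubble sort (often called a cocktail shaker sort) can be sketched as below. This is Python, not the original bare-metal code, which is not shown in the post. Reading "indexed" as "rearrange a table of indices instead of moving the card images" is an assumption, the function name is hypothetical, and the "fancy merge" is not reproduced.

```python
def cocktail_index_sort(records, key=lambda r: r):
    """Return indices that order `records`, using a bi-directional bubble sort.

    Only the index list is rearranged; the records themselves never move.
    """
    order = list(range(len(records)))
    lo, hi = 0, len(order) - 1
    while lo < hi:
        # Forward pass: carry the largest remaining key toward the right end.
        last_swap = lo
        for i in range(lo, hi):
            if key(records[order[i]]) > key(records[order[i + 1]]):
                order[i], order[i + 1] = order[i + 1], order[i]
                last_swap = i
        hi = last_swap  # everything right of the last swap is in final position

        # Backward pass: carry the smallest remaining key toward the left end.
        last_swap = hi
        for i in range(hi, lo, -1):
            if key(records[order[i - 1]]) > key(records[order[i]]):
                order[i - 1], order[i] = order[i], order[i - 1]
                last_swap = i
        lo = last_swap  # everything left of the last swap is in final position
    return order


if __name__ == "__main__":
    cards = ["delta", "alpha", "charlie", "bravo"]  # made-up card images
    print([cards[i] for i in cocktail_index_sort(cards)])
```

Shrinking the unsorted window from both ends after each pass is what lets this variant finish quickly on nearly ordered input, and swapping small indices rather than whole records keeps each exchange cheap.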

    I understood the problem very well.
    My code had no O/S or file system.
    Essentially, I wrote it on the "bare metal".
    Yes, this was a "one trick pony", but it could do its trick very, very well.
    I could calculate partial results as the punch cards were read in, while still allowing the card reader to run at full speed.
    On the output, I could calculate results fast enough so that the ancient shuttle line printer ran at a maximum rate.
    The 5 second number is the "dead time" when no I/O is happening at the max rate.

      I have a very similar, before-the-dawn-of-time story -- that I'm sure I've mentioned here before and probably in response to a previous sundial "system sort" solution.

      (From long ago memory, so the details may be fuzzy.) 60 million records sorted on 7 (or 9) keys taking 2 weeks on twin PDP-11/60s.

      Reversing the order of the keys reduced the total time to (I think) less than a day.

      The reason: the way the records were stored, the original key order meant doing a seek for every next record, and for almost every sub sort.

      Reversing the keys meant the first pass read the records sequentially. Having grouped records by that key, subsequent subsorts tended to only reorder within a small group of records that tended to be close to each other; hence far fewer disk/memory cache misses.

      Another big timesaver that happened before the big final mergesort was to arrange for temporary spill files to be written to "the other" disk pack, that is, not the disk pack the file being processed was on. It applied to pretty much every process, and cut most of their run times in half.

      It's hard to believe now that in my working lifetime it could have taken a month (before both changes) to sort 60 million records. (That was "big data" back then :) )

      It's like something out of a Victorian novel where they describe it taking 3 days from London to Bath and 10 days to York.


      Great, interesting post!

      My coffee pot probably has a faster processor, albeit with not as much memory?!

      I took the screwdriver to our coffee pot, but the wife wasn't too pleased that I was going to rip it apart and compare the memory size to that of some of my microcontrollers. By "wasn't too pleased", I mean she grabbed a hammer and said if I proceed, she's heading up to my lab and going to start doing her own "testing" :D

        Oh, geez... I might be upset too! I consider my coffee pot as "life sustaining medical equipment"!

        I guess I didn't mention it in my post, but there was a time when new engineers would get paid to learn new stuff. My boss knew that I'd kick this other program's butt and my program wasn't strictly necessary. The purpose of the assignment was to master interrupt driven device handlers, async I/O and HD, printer, magnetic tape and card reader performance characteristics. I was given other assignments after this one where I needed to apply what I learned from this sort assignment. My boss had a plan and a purpose for this assignment. As a "new guy", I didn't think about what was to come next. I just dove into this thing full bore and did the best that I could.
