http://www.perlmonks.org?node_id=951726


in reply to Re^4: how to merge many files of sorted hashes?
in thread how to merge many files of sorted hashes?

now I'm debating whether I need to incorporate this into a database
It's a hard call. You may also just want to try replacing "copy key+value to output file" with "INSERT INTO table VALUES(?, ?, ...)", key, values, ... and compare the relative speed and memory usage.
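A minimal Python sketch of that swap follows. It assumes each input file holds one `key<TAB>value` record per line, already sorted by key, and it uses SQLite from the standard library as a stand-in database. The table name `kv` and the helper names are hypothetical. `heapq.merge` performs the k-way merge lazily, so only one pending record per input is held in memory. Rows are inserted in batches with `executemany` rather than one statement per row.

```python
import heapq
import sqlite3
from typing import Iterable, Iterator, List, Tuple

def read_sorted_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Parse hypothetical 'key<TAB>value' lines; input must already be sorted by key."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        yield key, value

def merge_into_db(sources: List[Iterable[str]], conn: sqlite3.Connection,
                  batch_size: int = 10000) -> None:
    """k-way merge of sorted sources; each merged pair becomes an INSERT
    instead of a line copied to an output file."""
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT, v TEXT)")
    # heapq.merge pulls one record at a time from each source, keeping memory small.
    merged = heapq.merge(*(read_sorted_pairs(s) for s in sources),
                         key=lambda kv: kv[0])
    batch = []
    for pair in merged:
        batch.append(pair)
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO kv VALUES (?, ?)", batch)
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany("INSERT INTO kv VALUES (?, ?)", batch)
    conn.commit()

# Usage: lists of lines stand in for open file handles here.
conn = sqlite3.connect(":memory:")
merge_into_db([["apple\t1\n", "cherry\t3\n"], ["banana\t2\n", "date\t4\n"]], conn)
print(conn.execute("SELECT k, v FROM kv ORDER BY rowid").fetchall())
# prints [('apple', '1'), ('banana', '2'), ('cherry', '3'), ('date', '4')]
```

To compare against the plain-file version, replace the `executemany` calls with writes to an output file and time both runs on the same inputs.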

Main things to know about disks:

Main things to know about memory:

I'd say understanding locality, i.e., keeping the data you're working on at any given time close together in memory, is about 70% of computer science. These days it is generally most of the battle in getting something to run fast once you've settled on a reasonable algorithm.
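To make locality a little more concrete, here is a small sketch. It assumes a 2-D grid stored as one flat, row-by-row list, and the function names are hypothetical. Both traversals touch exactly the same elements. The row-major walk visits consecutive indices (stride 1), while the column-major walk jumps `cols` positions at every step. In CPython the interpreter overhead largely hides the timing difference. With contiguous arrays in C or NumPy, the stride-1 order is typically much faster because each cache line fetched is fully used.

```python
from typing import List

def access_order(rows: int, cols: int, row_major: bool = True) -> List[int]:
    """Flat indices visited when walking a rows x cols grid stored row by row."""
    if row_major:
        # Outer loop over rows: consecutive indices, stride 1 (good locality).
        return [r * cols + c for r in range(rows) for c in range(cols)]
    # Outer loop over columns: each step jumps `cols` positions (poor locality).
    return [r * cols + c for c in range(cols) for r in range(rows)]

def sum_row_major(flat: List[int], rows: int, cols: int) -> int:
    return sum(flat[i] for i in access_order(rows, cols, row_major=True))

def sum_column_major(flat: List[int], rows: int, cols: int) -> int:
    return sum(flat[i] for i in access_order(rows, cols, row_major=False))

print(access_order(2, 3, row_major=True))   # [0, 1, 2, 3, 4, 5]
print(access_order(2, 3, row_major=False))  # [0, 3, 1, 4, 2, 5]
```

The same idea generally applies to disks: reading each sorted file front to back is far cheaper than seeking back and forth.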

Re^6: how to merge many files of sorted hashes?
by andromedia33 (Novice) on Feb 03, 2012 at 23:54 UTC
    Thank you so much for the detailed explanation, Wrog. Given where I currently stand, I do think the learning curve is steep if I want to implement a database for my particular research problem. However, I'll definitely make an effort to understand basics like how disks and memory work as I go along with my projects, which I now feel is indispensable. The scale of my previous work never blew up like this to remind me just how poorly my algorithms were written.
    As for PostGIS, I have not taken a close look at it yet, and I'm not entirely sure its database design permits the kind of flexibility I need for my actual application. I have many more attributes besides the point coordinates to take care of, so coding it myself may actually be easier and faster (to meet my project deadline).
    I just tried the method you suggested first and it ran orders of magnitude faster (as expected). I used this 100 point cloud to test the upper bound of the running time for all the point clouds, and with this largest one finishing within a day, my calculations are now much more scalable. Thank you so much!