
Re: Scaling Hash Limits

by sundialsvc4 (Abbot)
on Sep 19, 2013 at 13:42 UTC (#1054856=note)

in reply to Scaling Hash Limits

Leveraging CountZero's comment especially here: "55 million records" definitely need to be indexed to be useful, but an in-memory hash table might not be the best way to do it, if only because it obliges the entire data structure to be "in memory," with all of the potential (and rather unpredictable) impact of virtual-memory paging should the system become memory-constrained. (Also, and mostly for this reason, it does not scale well.)

Yes, a hash table, or tables, is probably the best "in-memory" choice. The design question is: is "in-memory" the best choice at all? I suggest that it might not be.
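A minimal sketch of what the in-memory approach looks like (the key names and values here are made up for illustration): every record becomes a hash entry that must live in RAM, Perl per-entry overhead included.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the 55-million-record index: each key/value pair lives
# entirely in RAM, with Perl's per-entry overhead (often 100+ bytes)
# on top of the raw data.
my %index;
$index{"record_$_"} = $_ * 10 for 1 .. 1000;

# Lookups are O(1) once the hash is built...
print $index{"record_42"}, "\n";   # prints 420

# ...but the *whole* structure must fit in physical memory, or
# virtual-memory paging starts to dominate the runtime.
```

At small sizes this is unbeatable; the trouble only appears when the hash outgrows RAM.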

If you put these data into any sort of database, the records are, first of all, stored on disk, which has effectively unlimited capacity. And yet the data are indexed, also on disk. The database can locate the records of interest and bring only those into memory for further processing as needed. If the database grows to 100 or 1,000 times this number of records, it will just take a little longer, but it won't fail. You want to design systems that do not degrade severely (and unexpectedly) as volume increases. "In-memory" approaches are notorious for "hit the wall and go splat" when memory runs short and paging kicks in. So you either need to establish confidence that this won't happen to you in production at worst-case loads, or design in some other way to mitigate that business risk.
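The on-disk alternative can be sketched with DBI and SQLite (the database file, table, and column names below are hypothetical, chosen only for illustration). The PRIMARY KEY gives a B-tree index on disk, so a lookup reads only the pages it needs rather than the whole data set:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical on-disk equivalent of the big hash, via DBD::SQLite.
my $dbh = DBI->connect( "dbi:SQLite:dbname=records.db", "", "",
                        { RaiseError => 1, AutoCommit => 1 } );

# The PRIMARY KEY constraint creates an on-disk B-tree index.
$dbh->do(
    "CREATE TABLE IF NOT EXISTS records (id TEXT PRIMARY KEY, value INTEGER)"
);

# Load some rows; a real load would batch these inside a transaction.
my $ins = $dbh->prepare(
    "INSERT OR REPLACE INTO records (id, value) VALUES (?, ?)"
);
$ins->execute( "record_$_", $_ * 10 ) for 1 .. 1000;

# The index locates the record of interest; only that row (and the
# index pages leading to it) need to come into memory.
my ($value) = $dbh->selectrow_array(
    "SELECT value FROM records WHERE id = ?", undef, "record_42"
);
print "$value\n";   # prints 420

$dbh->disconnect;
```

A single indexed lookup stays fast whether the table holds a thousand rows or a billion; the cost grows logarithmically with table size instead of failing abruptly when RAM runs out.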

Re^2: Scaling Hash Limits
by hdb (Monsignor) on Sep 19, 2013 at 13:51 UTC

    Where can I buy these unlimited capacity disks?

      Where can I buy these unlimited capacity disks?

      Well, not unlimited, but what about 50 petabytes? Except for some custom cases and standard PSUs with custom wirings, you can get that at the next few computer parts stores. Don't forget to buy some racks, too. ;-)

