http://www.perlmonks.org?node_id=732737


in reply to Re: Netflix (or on handling large amounts of data efficiently in perl)
in thread Netflix (or on handling large amounts of data efficiently in perl)

I've recently posted a wrapper for Judy arrays (http://judy.sourceforge.net) to CPAN as Judy. It includes a sparse bit vector implementation, Judy1(3), which is available in Perl as Judy::1.

Updated: I've posted Compact and sparse bit vector, an example comparing a plain Perl bit vector with a Judy bit vector.
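
For readers who want the dense-versus-sparse distinction spelled out, here is a minimal sketch in core Perl only. It does not use Judy at all: the hash is just a stand-in for "pay per set bit" behaviour, and the ids and sub names are hypothetical. A vec() string grows to cover the highest index ever set, while the sparse side only stores the entries that exist.

```perl
use strict;
use warnings;

# Hypothetical sample data: a few widely scattered ids.
my @ids = (3, 1_000, 2_000_000);

# Dense bit vector: vec() extends the string to cover the highest index,
# so memory is proportional to the largest id, not to the number of ids.
my $dense = '';
vec($dense, $_, 1) = 1 for @ids;

# Sparse stand-in: a plain hash used as a set. This is NOT Judy1, only an
# illustration of storing set members instead of every possible bit.
my %sparse = map { $_ => 1 } @ids;

sub dense_test  { my ($i) = @_; return vec($dense, $i, 1) }
sub sparse_test { my ($i) = @_; return exists $sparse{$i} ? 1 : 0 }

printf "dense bytes: %d\n", length $dense;
printf "sparse keys: %d\n", scalar keys %sparse;
```

A Perl hash carries a lot of per-key overhead, so for densely populated ranges the vec() string wins; the sparse approach only pays off when the set bits are few and far apart.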

⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re^3: Netflix (or on handling large amounts of data efficiently in perl)
by Garp (Acolyte) on Dec 29, 2008 at 10:31 UTC

    Let's see if I'm following your reasoning correctly.

    I'm essentially interested in three variables:
    $movieid
    $userid
    $rating

    Are you suggesting that I make a multi-dimensional Judy array of arrays? So for each movie create a Judy array using $userid as the index and $rating as the value, then put that into a Judy array as the value with $movieid as the index?

    Apologies if I'm stating the obvious; I wouldn't classify myself as a programmer.

    From a very, very rough test (I haven't even gone back to confirm that the data is actually available), this is looking very good indeed for memory consumption. I'll do some further testing tomorrow.

      Sure, why not. tilly originally mentioned a bitmap, so I mentioned something cheaper in memory. You can build multidimensional Judy arrays; in particular, JudyHS is implemented as a nested set of JudyL arrays. I've posted a snippet at Dump JudyHS which demos dumping a JudyHS structure.

      For JudyHS to work at all, Judy arrays have to be nestable.
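
      To show the shape of the movie → user → rating layout being discussed, here is a rough sketch using ordinary nested Perl hashes. The hashes are only a stand-in for a Judy array whose values are themselves Judy arrays; none of this is the Judy module's API, and the sub names and sample rows are made up for illustration.

```perl
use strict;
use warnings;

# Stand-in for a Judy array of Judy arrays: the outer key is $movieid,
# the inner key is $userid, and the inner value is $rating.
my %ratings;

sub add_rating {
    my ($movieid, $userid, $rating) = @_;
    $ratings{$movieid}{$userid} = $rating;    # inner hash is autovivified
    return;
}

sub get_rating {
    my ($movieid, $userid) = @_;
    # Check the outer level first so a lookup never creates an empty entry.
    return unless exists $ratings{$movieid};
    return $ratings{$movieid}{$userid};
}

# Hypothetical sample rows: (movieid, userid, rating)
add_rating(@$_) for [1, 1488844, 3], [1, 822109, 5], [2, 2059652, 4];

for my $movieid (sort { $a <=> $b } keys %ratings) {
    my $users = $ratings{$movieid};
    printf "movie %d: %d ratings\n", $movieid, scalar keys %$users;
}
```

      The access pattern stays the same if the hashes are swapped for something more compact: one lookup by $movieid to reach the inner structure, then one by $userid to reach the rating.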

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊