good chemistry is complicated,and a little bit messy -LW PerlMonks

### Re^2: Compact and sparse bit vector

by BrowserUk (Pope)
 on Jan 03, 2009 at 15:00 UTC ( #733921=note: print w/replies, xml ) Need Help??

in reply to Re: Compact and sparse bit vector
in thread Compact and sparse bit vector

Actually, the main use for bitvectors in the NetFlix challenge. is that you can do very fast relational math.

So performing the select to find all the people who have rated all the same movies (and the movie we are trying to generate a rating for), that we already have a rating for by the person we are trying to generate a rating for, is just a tight loop over the dataset performing bitstring boolean operations which are very fast (in perl).

Doing the same operations (AND/OR/XOR/etc. of every bit of one bitstring against every bit of another bitstring for every customer in the DB) using Judy arrays, would not benefit at all from any of their special properties (cache line locality), and be extremely slow.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^3: Compact and sparse bit vector
by demerphq (Chancellor) on Jan 03, 2009 at 18:01 UTC

Heh, I kind zeroed in on the expression "was a sparse bit vector which might be interesting as a replacement for vec()", and didnt even notice the netflix reference. :-)

---
\$world=~s/war/peace/g

Re^3: Compact and sparse bit vector
by diotalevi (Canon) on Jan 04, 2009 at 09:30 UTC

I've never really found myself doing vector operations of bitmaps against each other. I didn't notice that being a feature of the linked Netflix node. I do occasionally find it convenient to have a sparse bitmask. Usually I just use a hash of the stringified integer.

⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

There's no implied criticism of Judy arrays, or your module, for the uses for which they are designed.

And indeed, there is no requirement in the NetFlix challenge to use bitvectors at all. There is a need however, to perform some more or less complex relational operations across the three datasets to arrive at the goal of the challenge: To estimate a user rating for given user for a given film (that they haven't yet watched or rated) based upon how others that have watched and rated the given film have rated it.

Whilst this kind of query is relatively trivial to code using SQL, the volumes of raw data, and size of the cross-product of the film & user bases is such that it is time&memory intensive to produce the results. Using bitwise storage for this type of join query is extremely fast. Some literature.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Create A New User
Node Status?
node history
Node Type: note [id://733921]
help
Chatterbox?
 [Cosmic37]: I read that its possible to use the FANN library with Perl so I might try that now... [corenth]: i have a question. I used map{ blah();blah(); blah();}@stuff; and it used up a ton of memory vs. the for(@stuff){} equivalent. What gives? (if anyone knows) [Cosmic37]: FANN also has LGPL license which I like and its supposed to be quite a capable library from what I read [corenth]: Cosmic, that sounds interesting. What is FANN (I could search it if I weren't so lazy about it)? [Cosmic37]: how big was @stuff corenth? [Cosmic37]: Fast Artificial Neural Network (FANN) is cross-platform open source programming library for developing multilayer feedforward Artificial Neural Networks [corenth]: @stuff was pretty big. I think it grew to about 8000.

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (8)
As of 2018-02-20 18:17 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
When it is dark outside I am happiest to see ...

Results (274 votes). Check out past polls.

Notices?