Re^3: Evolving a faster filter?

by BrowserUk (Pope)
on Jan 06, 2013 at 15:29 UTC ( #1011890 )


in reply to Re^2: Evolving a faster filter?
in thread Evolving a faster filter?

For one run, I might want to exclude all objects of property X, with a value of Y below a particular threshold and which wasn't previously selected in the past hour. ... So multiple filter runs with the same set of objects have, literally, millions of possible different outcomes.

Hm. Sounds like one of these 'community driven' "We also recommend..." things that the world+dog have added to their sites recently.

Still, it makes me wonder whether you could distribute the load somehow.

That is, is it really necessary to run the entire decision process from scratch every time, executing all the filters at the exact instant of need?

Or, could you re-run each of the filters (say) once per hour against the then-current dataset, and only amalgamate the results and make your final selection at the point of need?

You might (for example), run each filter individually and store its result in the form of a bitstring where each position in the bitstring represents a single object in the set. Then, at the time-of-need, you combine (bitwise-AND) the latest individual bitstrings from all the filters to produce the final selection.
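As a sketch of that scheme (the toy objects and filter predicates here are invented for illustration, not from the original post), each filter writes one bit per object into a packed bitstring via vec, and the final selection is just the bitwise-AND of the stored strings:

```perl
use strict; use warnings;
use feature 'say';

# Hypothetical data: 16 objects with a numeric property.
my $N = 16;
my @objects = map { { id => $_, value => $_ * 7 % 13 } } 0 .. $N - 1;

# Run one filter over all objects, returning a bitstring
# with bit i set iff object i passes the predicate.
sub filter_mask {
    my ($pred) = @_;
    my $mask = "\0" x ( ( $N + 7 ) >> 3 );             # one bit per object
    vec( $mask, $_->{id}, 1 ) = $pred->($_) ? 1 : 0 for @objects;
    return $mask;
}

# Each filter is run once (e.g. once per hour) and its result stored.
my @filters = (
    filter_mask( sub { $_[0]{value} >= 5 } ),          # property threshold
    filter_mask( sub { $_[0]{id} % 2 == 0 } ),         # some other criterion
);

# At time-of-need: AND the stored bitstrings together.
my $selection = chr(0xFF) x ( ( $N + 7 ) >> 3 );
$selection &= $_ for @filters;

# Objects whose bit survived pass every filter.
my @picked = grep vec( $selection, $_, 1 ), 0 .. $N - 1;
say "@picked";    # 10 12 14
```

Because both writing and reading go through vec, the bit ordering inside each byte is consistent, and the string `&=` operates bytewise across the whole mask in one go.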

With 100,000 objects, a single filter is represented by a 12.5k scalar (one bit per object). Times (say) 100 filters, and it requires about 1.25MB to store the current and ongoing filter set.
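The sizes follow directly from one bit per object; a quick check (note the benchmark below actually packs 1563 64-bit quads, rounding up to 12,504 bytes):

```perl
use strict; use warnings;
use feature 'say';

my $objects = 100_000;
my $bytes_per_filter = $objects / 8;                   # one bit per object
my $quads_per_filter = int( ( $objects + 63 ) / 64 );  # whole 64-bit words
my $filters = 100;

say $bytes_per_filter;             # 12500   (~12.5k per filter)
say $quads_per_filter * 8;         # 12504   (rounded up to whole quads)
say $bytes_per_filter * $filters;  # 1250000 (~1.25MB for 100 filters)
```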

Combining those 100 x 100,000-bit filter sets is very fast:

    use Math::Random::MT qw[ rand ];;

    $filters[ $_ ] = pack 'Q*', map int( 2**64 * rand() ), 0 .. 1562 for 0 .. 99;;

    say time();
    $mask = chr(0xff) x 12504;
    $mask &= $filters[ $_ ] for 0 .. 99;
    say time();;
    1357485694.21419
    1357485694.21907

Less than 5 milliseconds!

Could your application live with the filters being run once per time period (say, once per hour or half an hour, whatever works for your workloads), rather than in full at every time-of-need?

(NOTE: this is not the method used in my other post, which pushes the full 100,000 objects through 100 filters in 0.76 seconds; but without a feel for how long your current runs take, there is no way to assess how realistic either approach would be.)


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

