For one run, I might want to exclude all objects of property X, with a value of Y below a particular threshold and which wasn't previously selected in the past hour. ... So multiple filter runs with the same set of objects have, literally, millions of possible different outcomes.

Hm. Sounds like one of these 'community driven' "We also recommend..." things that the world+dog have added to their sites recently.

Still, it makes me wonder whether you could distribute the load somehow.

That is, is it really necessary to run the entire decision process every time, applying all the filters at the exact instant of need?

Or could you re-run each of the filters (say) once per hour against the then-current dataset, and only amalgamate the results and make your final selection at the point of need?

You might, for example, run each filter individually and store its result as a bitstring in which each bit position represents a single object in the set. Then, at the time of need, you combine (bitwise-AND) the latest bitstrings from all the filters to produce the final selection.
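A minimal sketch of that idea, assuming each object's position in an array serves as its bit index. The objects, their `value` property and the two filters are hypothetical stand-ins for the real ones; `vec` is Perl's built-in for reading and writing individual bits of a string, and `&` on two strings does a bytewise AND.

```perl
use strict;
use warnings;

# Hypothetical objects: the array index doubles as the bit position.
my @objects = map { +{ id => $_, value => $_ % 10 } } 0 .. 99_999;

# Run one filter (a code ref returning true to KEEP an object) over the
# whole set, recording each verdict as one bit at the object's position.
sub filter_to_bits {
    my ( $filter, $objects ) = @_;
    my $bits = '';
    for my $i ( 0 .. $#$objects ) {
        vec( $bits, $i, 1 ) = $filter->( $objects->[ $i ] ) ? 1 : 0;
    }
    return $bits;
}

# Run ahead of time (e.g. hourly): two hypothetical filters.
my @stored = (
    filter_to_bits( sub { $_[0]{value} >= 3 },  \@objects ),
    filter_to_bits( sub { $_[0]{id} % 2 == 0 }, \@objects ),
);

# At time of need: AND the stored bitstrings together.
my $selected = $stored[0];
$selected &= $_ for @stored[ 1 .. $#stored ];

# Set bits = objects that passed every filter (30,000 here).
print unpack( '%32b*', $selected ), " objects pass\n";
```

The expensive part (calling every filter on every object) happens in `filter_to_bits`, which is what would move out of the time-of-need path; only the cheap string AND remains there.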

With 100,000 objects, a single filter's result fits in a 12.5KB scalar (100,000 bits / 8 bits per byte). Times (say) 100 filters, the whole current filter set takes about 1.25MB.
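The per-filter size follows from one bit per object and eight bits per byte. This small sketch checks the arithmetic by actually allocating one bitstring of that size; the object and filter counts are the ones assumed in the text.

```perl
use strict;
use warnings;

my $objects = 100_000;
my $filters = 100;

# One bit per object, eight bits per byte.
my $bytes_per_filter = $objects / 8;                  # 12,500 bytes
my $bytes_all        = $bytes_per_filter * $filters;  # 1,250,000 bytes

# Setting the highest bit forces Perl to extend the string
# (with zero bytes) to its full size.
my $bits = '';
vec( $bits, $objects - 1, 1 ) = 1;

printf "per filter: %d bytes; %d filters: %d bytes\n",
    length( $bits ), $filters, $bytes_all;
```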

Combining those 100 bitstrings of 100,000 bits each is very fast:

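    use Math::Random::MT qw[ rand ];
    use Time::HiRes qw[ time ];    # sub-second time() for the timings
    use feature 'say';

    # 100 random "filter results", each 1563 64-bit words (100,032 bits).
    # pack 'Q' needs a perl built with 64-bit integers.
    my @filters;
    $filters[ $_ ] = pack 'Q*', map int( 2**64 * rand() ), 0 .. 1562 for 0 .. 99;

    say time();
    my $mask = chr( 0xff ) x 12504;         # start with every bit set
    $mask &= $filters[ $_ ] for 0 .. 99;    # AND in each filter in turn
    say time();

    # 1357485694.21419
    # 1357485694.21907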

Less than 5 milliseconds!
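The combined mask is not yet the selection itself; the surviving bit positions still have to be mapped back to objects. A minimal sketch using a hypothetical 16-object mask: the `%32b*` checksum template of `unpack` counts the set bits, and `vec` lists which positions survived. With 100,000 objects the `grep` makes 100,000 `vec` calls, which is linear in the set size rather than free.

```perl
use strict;
use warnings;

# Hypothetical final mask for 16 objects: bits 0, 3 and 9 set.
my $mask = '';
vec( $mask, $_, 1 ) = 1 for 0, 3, 9;
vec( $mask, 15, 1 ) = 0;    # make sure the mask covers all 16 objects

# Population count: how many objects survived every filter.
my $count = unpack '%32b*', $mask;

# Positions (object indices) of the survivors.
my @selected = grep { vec( $mask, $_, 1 ) } 0 .. 8 * length( $mask ) - 1;

print "$count selected: @selected\n";    # 3 selected: 0 3 9
```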

Could your application live with the filters being run once per time period (say, once per hour or half-hour, or whatever works for your workload) rather than in full at every time of need?
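One way to arrange the periodic re-runs, offered only as a sketch: cache each filter's bitstring together with the time it was built, and rebuild it only when it is older than the chosen period. The cache layout, `$MAX_AGE`, `cached_bits`, `select_mask` and the toy filters are all hypothetical. Time-dependent criteria such as "not selected in the past hour" would be up to one period stale under this scheme.

```perl
use strict;
use warnings;

# Hypothetical cache: filter name => { bits => $bitstring, built => $epoch }.
# Entries are rebuilt only when older than $MAX_AGE seconds.
my $MAX_AGE = 3600;    # one hour; tune to the workload
my %cache;

sub cached_bits {
    my ( $name, $builder, $now ) = @_;
    my $entry = $cache{ $name };
    if ( !$entry || $now - $entry->{built} >= $MAX_AGE ) {
        $entry = $cache{ $name } = { bits => $builder->(), built => $now };
    }
    return $entry->{bits};
}

# At time of need: fetch each (possibly up to $MAX_AGE old) bitstring
# and AND them together into the final mask.
sub select_mask {
    my ( $now, %builders ) = @_;
    my @bits = map { cached_bits( $_, $builders{ $_ }, $now ) } sort keys %builders;
    my $mask = shift @bits;
    $mask &= $_ for @bits;
    return $mask;
}

# Two toy filters over 16 objects; $runs counts how often they really run.
my $runs = 0;
my %builders = (
    even  => sub { $runs++; my $v = ''; vec( $v, $_, 1 ) = $_ % 2 == 0 ? 1 : 0 for 0 .. 15; $v },
    small => sub { $runs++; my $v = ''; vec( $v, $_, 1 ) = $_ < 8 ? 1 : 0 for 0 .. 15; $v },
);

my $m1 = select_mask( 1000, %builders );    # cold cache: both filters run
my $m2 = select_mask( 2000, %builders );    # 1000s later: served from cache
my $m3 = select_mask( 5000, %builders );    # 4000s after build: both rebuilt
print "filter runs: $runs\n";               # filter runs: 4
```

Passing the current time in explicitly, rather than calling `time()` inside, keeps the refresh logic easy to test.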

(NOTE: this is not the method used in my other post, which runs the full 100,000 objects through 100 filters in 0.76 seconds; but without a feel for how long your current runs take, there is no way to assess how realistic that would be.)


In reply to Re^3: Evolving a faster filter? by BrowserUk
in thread Evolving a faster filter? by Ovid
