<?xml version="1.0" encoding="windows-1252"?>
<node id="1011659" title="Re: Evolving a faster filter?" created="2013-01-04 10:58:27" updated="2013-01-04 10:58:27">
<type id="11">
note</type>
<author id="647953">
sundialsvc4</author>
<data>
<field name="doctext">
&lt;p&gt;
What is the &lt;em&gt;cost&lt;/em&gt; of each filter? &amp;nbsp; In other words, specifically, how much &lt;em&gt;input/output&lt;/em&gt; does a filter have to do, in order to make its determination? &amp;nbsp; And, how much information-in-common do the various related filters share? &amp;nbsp; And how many other servers, &lt;i&gt;etc.,&lt;/i&gt; are involved in the process of satisfying each request?
&lt;/p&gt;&lt;p&gt;
I am taking for granted that each server has enough RAM to allow all &lt;i&gt;n&lt;/i&gt; of the objects being considered to be resident in memory at the same time such that no page-fault I/O will be required during the process of handling them. &amp;nbsp; But I/O is and always will be the primary determinant of execution speed apart from raw CPU horsepower (and of course locking, of which there ought to be no requirement), and I/O can sometimes be well-hid. &amp;nbsp; Especially significant would be if two filters inadvertently perform the same I/O that a predecessor has already done: &amp;nbsp; everything should be explicitly cached.
&lt;/p&gt;&lt;p&gt;
I am also presuming that each one of the servers makes its own decision and does not share information with the others. &amp;nbsp; But, having said &lt;em&gt;that,&lt;/em&gt; I know that it probably is not so: &amp;nbsp; there probably &lt;em&gt;is&lt;/em&gt; some kind of sharing and/or locking and/or distribution of results among the parties, and that necessarily means some amount of waiting, one for the other. &amp;nbsp; (And the more parties there are to wait, the greater is the chance that any one of them will wait.)
&lt;/p&gt;&lt;p&gt;
Setting the matter of contention and sharing aside now ... &amp;nbsp;
If the criteria required to predict the performance of a particular filter changes over time, then perhaps you could maintain some kind of dynamic scoring over the filters&amp;rsquo;s past performance over the last few minutes, say in terms of number of objects eliminated. &amp;nbsp; The hypothesis is that past performance is the best predictor of future results. &amp;nbsp; Apply the filters in descending order of score, such that the filters that have lately shown themselves to be most effective are tried first. &amp;nbsp; If you retain the latest &lt;i&gt;x&lt;/i&gt; most-recent score values in a queue and rank the filter by the current average of those scores, you will make it trend.
&lt;/p&gt;&lt;p&gt;
A final idle thought is then to apply a random-shuffle algorithm to the sorted list, doing, not a complete shuffle, but rather randomly exchanging the positions of some &lt;i&gt;y&lt;/i&gt; elements in the ranking each time. &amp;nbsp; This arbitrarily puts some players on the bench and takes some players off the bench, randomly accepting the chance of a penalty in exchange for the chance that prevailing data-conditions have subtly and recently evolved, such that one player who might otherwise be overlooked just might be the one to score the next touchdown.
&lt;/p&gt;&lt;p&gt;
I am certain that this thread will attract the attention of [BrowserUK], who is well-known about these parts to be especially expert in high-performance algorithms in high-volume situations. &amp;nbsp; I especially look forward to his insights on this subject.
&lt;/p&gt;</field>
<field name="root_node">
1011650</field>
<field name="parent_node">
1011650</field>
</data>
</node>
