http://www.perlmonks.org?node_id=1011659


in reply to Evolving a faster filter?

What is the cost of each filter?   Specifically, how much input/output does a filter have to do in order to make its determination?   How much information-in-common do the various related filters share?   And how many other servers, and so on, are involved in satisfying each request?

I am taking for granted that each server has enough RAM for all n of the objects under consideration to be resident in memory at the same time, so that no page-fault I/O is required while handling them.   But I/O is, and always will be, the primary determinant of execution speed apart from raw CPU horsepower (and of course locking, of which there ought to be no requirement here), and I/O can sometimes be well hidden.   It would be especially costly if two filters inadvertently performed the same I/O that a predecessor has already done:   everything should be explicitly cached.
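By "explicitly cached" I mean something as simple as the following sketch, in which fetch_attribute() is a hypothetical stand-in for whatever I/O a filter really does, and every filter goes through the same per-request cache:

    # Minimal sketch:  a per-request cache shared by all filters, so that
    # two filters needing the same datum never pay for the I/O twice.
    my %cache;

    sub cached_fetch {
        my ($object_id, $attribute) = @_;
        my $key = "$object_id/$attribute";
        $cache{$key} = fetch_attribute($object_id, $attribute)   # hypothetical I/O routine
            unless exists $cache{$key};
        return $cache{$key};
    }

    # A filter then asks for data only through cached_fetch():
    sub filter_by_size {
        my ($object_id) = @_;
        return cached_fetch($object_id, 'size') < 1_000_000;     # illustrative cutoff
    }

The cache would be cleared (or scoped) per request so that stale data never leaks from one request into the next.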

I am also presuming that each of the servers makes its own decision and does not share information with the others.   Having said that, I know it probably is not so:   there is probably some kind of sharing and/or locking and/or distribution of results among the parties, and that necessarily means some amount of waiting, one for the other.   (And the more parties there are, the greater the chance that any one of them will have to wait.)

Setting the matter of contention and sharing aside now ...   If the criteria that predict the performance of a particular filter change over time, then perhaps you could maintain some kind of dynamic scoring of each filter's recent performance, say over the last few minutes, in terms of the number of objects it eliminated.   The hypothesis is that past performance is the best predictor of future results.   Apply the filters in descending order of score, so that the filters that have lately shown themselves to be most effective are tried first.   If you retain the x most-recent score values for each filter in a queue and rank the filters by the current average of those scores, the ranking will follow the trend (see the sketch below).
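A rough sketch of that bookkeeping (the window size and the per-filter score structure are, of course, only assumptions):

    # Sketch:  keep the last $WINDOW per-run scores for each filter and
    # order the filters by the average of that window, highest first.
    use List::Util qw(sum);

    my $WINDOW = 20;               # assumed window size
    my %recent_scores;             # filter name => array ref of recent scores

    sub record_score {
        my ($filter, $eliminated) = @_;
        my $q = $recent_scores{$filter} ||= [];
        push @$q, $eliminated;
        shift @$q while @$q > $WINDOW;
    }

    sub ranked_filters {
        my @names = @_;
        my %avg = map {
            my $q = $recent_scores{$_} || [];
            $_ => @$q ? sum(@$q) / @$q : 0;
        } @names;
        return sort { $avg{$b} <=> $avg{$a} } @names;
    }

After each pass you would call record_score() with whatever count each filter eliminated, and re-order with ranked_filters() before the next pass.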

A final idle thought is then to apply a partial random shuffle to the sorted list:   not a complete shuffle, but randomly exchanging the positions of some y elements in the ranking each time.   This arbitrarily puts some players on the bench and takes some players off the bench, accepting the chance of a small penalty in exchange for the chance that prevailing data-conditions have subtly and recently evolved, such that a player who might otherwise be overlooked just might be the one to score the next touchdown.
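For what it is worth, such a partial shuffle could be as simple as swapping a handful of randomly chosen pairs in the ranked list (the number of swaps here is purely illustrative):

    # Sketch:  perturb the ranked list by swapping $SWAPS randomly chosen
    # pairs, so a lately-unproductive filter occasionally gets another turn.
    my $SWAPS = 2;    # illustrative only

    sub perturb_ranking {
        my @ranked = @_;
        for (1 .. $SWAPS) {
            my ($i, $j) = (int rand @ranked, int rand @ranked);
            @ranked[$i, $j] = @ranked[$j, $i];
        }
        return @ranked;
    }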

I am certain that this thread will attract the attention of BrowserUK, who is well known around these parts as an expert in high-performance algorithms for high-volume situations.   I especially look forward to his insights on this subject.