|Problems? Is your data what you think it is?|
Here is a "best ordering" optimizer that is just my previous trivial code with two short-circuits added to it.
First, if we've managed to come up with an ordering that costs 18.1 units already (our best ordering so far), and our current calculations only get through the first 9 filters and have already added up 18.2 units of cost, then there's no point permuting any of the remaining filters (nor measuring any further using that starting arrangement).
Second, if my next permutation is going to place a filter with a cost of 5 and a selectivity of .9 in front of a filter with a cost of 4 and a selectivity of .8, then I would just be wasting time to do anything with that permutation. So this code does some efficient checks to prevent at least a lot of cases of putting one filter in front of another filter that it is completely worse than (neither cost nor selectivity improved). Well, assuming the filters start out sorted reasonably, anyway.
This code starts out by naively sorting the filters by the product of cost and selectivity. That gets you pretty close to the ideal solution in the few cases I dreamed up so far. It also ensures that no filter starts out in front of a filter that is completely better than it, which is required for the second short-circuit above to be very effective.
I don't have time right now to explain this code or even test it more than the little bit that I have. NextPermuteNum() in this code is just a modification of the one from Algorithm::Loops.
The usage is to give a list of filter performance values on the command line. Each performance value is an integer cost followed by a decimal selectivity. So "4.9" indicates a filter that eliminates 10% of items (keeps 0.9) and costs 4 units (of CPU).
So a sample run might look like:
Which shows the first stab at a solution and the final solution were:
And it took only 0.1 seconds to find the optimal solution (but then took 3.75 seconds to verify that there was nothing better). An exhaustive search would have taken minutes without those short-circuits.
It still isn't practical for optimally ordering 30 or 40 filters; the run time would depend greatly on how many filters can be declared "completely better than" how many others but could easily run into the "millennia" range for such large lists.
It can probably be most useful for improving the weighting formula used for the initial sort once you have some typical numbers for your situation. $cost*$selectivity isn't a bad starting point, but I'm sure, especially given a limited pattern of values, one can quickly come up with something better with a bit of trial and error.
If the sort gets the first several filters in the right order, then an optimal solution can be had very quickly. A good ordering can be had immediately and better orderings can pop out as you let the thing run.
It may also be useful as something to further modify to get to something that can get even closer to the optimal total cost rather quickly. For example, if you identify a subset of the filters that are likely to be the best ones, then you can separately optimize just that subset first to get a better first part of the ordering.
Anyway, here is the code (which is not pretty nor modular but appears to work reasonably well in the few test cases I played with).