I don't think I understand the question. Help me out, because this just seems too simple: successive table look-ups combined together, with no input-vector bit used more than once.
I would proffer that Big-O notation may not be the best tool
for describing what is efficient and what is not. In an abstract sense,
the number of operations is paramount. In an implementation, how
fast or slow each operation is also matters. Sometimes using more, really
fast operations is "better" than using fewer, slow ones.
Re: bit-vector > global minimum sounded pretty good to me. It appears to me that every
bit in the bit vector will need to be examined. Rather than examining them
one bit at a time, why not handle them in groups of bits with a look-up table?
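For reference, here is the straightforward bit-by-bit pass. It assumes my reading of the question: a 1 bit is a +1 step, a 0 bit is a -1 step, the "global minimum" is the lowest running sum, and position 0 is the leftmost (most significant) bit of the first byte. The function name `global_min_bitwise` is hypothetical.

```python
def global_min_bitwise(data: bytes):
    """Return (lowest running sum, first bit position where it occurs).

    Assumptions: a 1 bit is a +1 step, a 0 bit is a -1 step, and bit
    position 0 is the most significant bit of data[0].
    """
    if not data:
        raise ValueError("empty bit vector")
    total = 0
    best_val = best_pos = None
    pos = 0
    for byte in data:
        for shift in range(7, -1, -1):              # most significant bit first
            total += 1 if (byte >> shift) & 1 else -1
            if best_val is None or total < best_val:  # strict <: keep first occurrence
                best_val, best_pos = total, pos
            pos += 1
    return best_val, best_pos

print(global_min_bitwise(bytes([0b10010010])))   # (-2, 5)
```

Every bit costs a shift, a mask, an addition and a comparison, which is what a table look-up over groups of bits would try to amortize.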
A binary search tree that takes into account all possible variations
of a really, really long vector could be huge!
If I do this 8 bits at a time, then there is a table of 256 possible
entries (2^8, or 512 if one extra bit is added to the index as suggested
below), and I think the result can be calculated one look-up at a time.
    position: 01234567
    bits:     10010010
Bits 2 and 5 are the lowest (minimum) points, so the whole byte '10010010' could be
an index into an array of structs that lists the positions 2 and 5. I think there
needs to be an extra test for a '0' at the end of the 8 bits, and a way to propagate
results to the next 8 bits to keep track of the "best" so far. The implementation
may work better if the bits are inverted before the table look-up. I also think
one more bit probably needs to be added to the look-up table index, to help "stitch"
the previous result together with the current one.
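The struct-per-byte idea can be sketched as follows. This assumes my reading of the example: a minimum at position i means bit i is 0 and bit i+1 is 1 (the walk stops falling and starts rising), with position 0 as the leftmost bit. `ByteEntry`, `bit_at`, `make_entry` and `TABLE` are hypothetical names. The `ends_with_zero` flag is the extra end-of-byte test, since position 7 can only be confirmed once the next byte is seen.

```python
from typing import NamedTuple, Tuple


class ByteEntry(NamedTuple):
    minima: Tuple[int, ...]   # positions i (0..6) where bit i is 0 and bit i+1 is 1
    ends_with_zero: bool      # bit 7 is 0: position 7 may be a minimum, depends on next byte


def bit_at(byte: int, pos: int) -> int:
    """Bit at position pos, where position 0 is the most significant bit."""
    return (byte >> (7 - pos)) & 1


def make_entry(byte: int) -> ByteEntry:
    minima = tuple(i for i in range(7)
                   if bit_at(byte, i) == 0 and bit_at(byte, i + 1) == 1)
    return ByteEntry(minima, bit_at(byte, 7) == 0)


# 256-entry table indexed directly by the byte value.
TABLE = [make_entry(b) for b in range(256)]

entry = TABLE[0b10010010]
print(entry.minima, entry.ends_with_zero)   # (2, 5) True
```

A position-7 candidate flagged this way is confirmed when the next byte starts with a 1 (`bit_at(next_byte, 0) == 1`); that cross-byte check is the kind of thing an extra index bit could fold into the table. This table only lists local minima, so picking the lowest one still requires the running sum.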
I am thinking that a static table would suffice. A dynamic table would perhaps be better (shorter)?
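Propagating the "best" so far from one byte to the next can be sketched with a different fixed 256-entry table. Instead of local-minimum positions, each entry here (a swapped-in summary, not the struct described above) stores the byte's net change, the lowest running sum inside the byte, and the position where it first occurs. The same +1/-1 reading and left-to-right bit order are assumed, and `summarize`, `SUMMARY` and `global_min_chunked` are hypothetical names. The table is built once at start-up, so it behaves like a static table.

```python
def summarize(byte: int):
    """(net change, lowest running sum within the byte, first position of it).

    Assumes 1 = +1 step, 0 = -1 step, position 0 = most significant bit.
    """
    total, low, low_pos = 0, None, 0
    for pos in range(8):
        total += 1 if (byte >> (7 - pos)) & 1 else -1
        if low is None or total < low:
            low, low_pos = total, pos
    return total, low, low_pos


# Built once: one entry per possible byte value.
SUMMARY = [summarize(b) for b in range(256)]


def global_min_chunked(data: bytes):
    """Return (lowest running sum, first bit position) using one look-up per byte."""
    if not data:
        raise ValueError("empty bit vector")
    offset = 0                      # running sum before the current byte
    best_val = best_pos = None
    for i, byte in enumerate(data):
        net, low, low_pos = SUMMARY[byte]
        if best_val is None or offset + low < best_val:   # strict <: keep earliest
            best_val, best_pos = offset + low, 8 * i + low_pos
        offset += net               # carry the running sum into the next byte
    return best_val, best_pos


print(global_min_chunked(bytes([0b10010010, 0x00])))   # (-10, 15)
```

Each byte costs one look-up plus an addition and a comparison, and the running `offset` carries the result across byte boundaries, so in this formulation no extra stitch bit is needed. Using strict `<` keeps the earliest position when the minimum value repeats, so it should agree with a bit-by-bit scan on the same input.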