I don't think that I understand the question. Help me out, because this just seems too simple: successive table lookups that are combined together, where no input vector bit is used more than once.
I will proffer that O(n) notation may not be the best way to describe what is efficient and what is not. In an abstract sense, the number of operations is paramount. In an implementation, how fast or slow each operation is also matters. Sometimes using more really fast operations is "better" than using fewer slow ones.

Re: bitvector > global minimum sounded pretty good to me. It appears to me that every bit in the bit vector will need to be examined. If this is not done bit by bit, then why not do it in groups of bits with a lookup table? A binary search tree that takes into account all possible variations of a really, really long vector could be huge!
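To show what I mean by doing it in groups of bits, here is a rough Python sketch. It is only an illustration: the per-group function (counting set bits) is a stand-in I picked because I am not sure what the real per-group result should be, and the names `BYTE_TABLE`, `scan_bitvector` and `scan_bit_by_bit` are made up for this example. The point is just the shape of the loop: one static table lookup per 8-bit group, with the partial results combined and no input bit read twice.

```python
# Static table: one precomputed result for each possible 8-bit group.
# ASSUMPTION: the per-group result here is "number of set bits"; it is a
# placeholder for whatever the real problem needs per group.
BYTE_TABLE = [bin(value).count("1") for value in range(256)]


def scan_bitvector(vector: bytes) -> int:
    """Combine one table lookup per byte; each input bit is used exactly once."""
    total = 0
    for byte in vector:          # iterating over bytes yields ints 0..255
        total += BYTE_TABLE[byte]
    return total


def scan_bit_by_bit(vector: bytes) -> int:
    """Reference version that examines every bit individually."""
    total = 0
    for byte in vector:
        for shift in range(8):
            total += (byte >> shift) & 1
    return total


if __name__ == "__main__":
    sample = bytes([0xFF, 0x01, 0x00])
    # Both approaches must agree; the table version does 3 lookups
    # where the bit-by-bit version does 24 shift-and-mask steps.
    print(scan_bitvector(sample), scan_bit_by_bit(sample))
```

Both versions are still O(n) in the length of the vector. The table version just does fewer, cheaper steps per bit, which is the distinction I was getting at above.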
If I do this 8 bits at a time, then there is a table of 256 possible results (one for each 8-bit value), and I think the result can be calculated one lookup at a time. I am thinking that a static table would suffice. Would a dynamic table perhaps be better (shorter)?

In reply to Re^5: bitvector > global minimum
