http://www.perlmonks.org?node_id=1037467

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Given a bitstring $vec upto 64MB long, how to quickly locate a run of (at least) N contiguous, unset bits regardless of alignment?

N might be any number from 1 to 100s

My current thinking is that for N > 8, B = int( N /8 ), b = N % 8; Search for B contiguous 0 bytes, then check the preceding and following bytes for at least b adjacent 0 bits.

For N < 8; for each N, only a subset of the 256 bit patterns in a byte have N unset contiguous bits. Ie.

With a lookup table mapping the number of bits required to the bit patterns that can provide them, I have a way to locate these smaller runs.

Except that:

  1. Any one of the run lengths > 2 can be provided by many combinations of 2 adjacent bytes.
  2. The whole process is becoming both decision rich and search heavy.

So then I thought about having 8 more bit strings, where each bit represents a byte in the primary bitstring, one for each of the 8 sets above, where a set bit indicates that this byte in the primary bitstring can provide the requisite number of bits.

And when those bits (in the primary bitstring) are utilised, the one bit is toggled (off) in the secondary bitstring where is was found; and one bit is set in the secondary bitstring that represents the maximum number of unset bits it can now provide.

This reduces the searching to picking the appropriate secondary bitstring and looking for a non-zero byte. (It doesn't yet deal with the adjacent bytes with adjacent, contiguous bits problem.).

It also helps (??) with the large N problem, in that instead of needing to search the primary bitstring for N/8 contiguous 0 bytes; I can search the '7-bits' secondary bitstring for N/8 contiguous bits, potentially cutting the search time to 1/8th.

But of course, the same alignment problem that existed with the primary bit string for which the secondary bitstrings are used to solve, now manifests itself again.

  1. Do I implement a tertiary set of bitstrings (See A.)

Thoughts, comments, better solutions?

(For background and because someone is going to ask though it doesn't affect the problem or possible solutions. The primary bitstring represents freespace in an allocator. (Could be disk or memory.))


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.