Ahhh....   the bitstrings are very large?   How about the typical length of the patterns for which you search within such strings?

If the length of the string were longer than 3 * 64 = 192 bits long, and especially if it were longer, then you would have a definite “crib,” in the form of a 64-bit quantity ... there are exactly 64 possibilities ... that must occur in the pattern at a quadword boundary.   This would be a “huge win™,” well worth the extra effort of creating a special-case for, in anticipation that it would actually pay-off most of the time.

I-F the string being sought was at least 192 bits long, then you are able to calculate, with certainty, the 64 possible values that exist for “the 64-bit ‘word in the middle.’”   You can now search for each of these in turn, using an extremely efficient REPNE SCASD (i86 ...) assembler-language construct, and then explore the surrounding doublewords in each case to determine if the bit-string occurs.   (If the run being sought is “even longer,” then a REPNE CMPSD loop can be applied to check all of the “subsequent doublewords,” the values of all of which once again can be determined with certainty.   Only the partial-doublewords at the beginning and at the end, if any, remain to be verified.)

Easily the most efficient implementation would be to loop through all 64 of the pre-computed possibilities, using the assembler instructions aforementioned to locate all occurrences within the buffer.   (Of course, and as you of all Monks well know, you will need to use a “sliding window” scan through the compass of the entire file, to avoid missing any bitstrings that serependitiously fell across a boundary, but, I digress.)

Each and every time you stumble-upon any of these 64 values, the odds that you have, in fact, “found another answer” are rather-astronomically high.   In fact, in such a case you might even wish to “take a calculated risk” that in fact you need look no further.   If you actually discover a matching 64-bit quantity within the buffer, then the odds are “64 in 2^64” that the entire value-being-sought is present.   Certainly, if the REPNE CMPSD succeeds with regard to a location found by REPNE SCASD, then the odds of being wrong become absurdly (and dismissably) small:   “It is a match.   It has to be.”   At this point, the actual odds of committing a “type-2 error” are . . . zero.

No, this is not a solution to your entire problem:   if the bit-string being sought turns out to be shorter, it will not help at all.   But, for longer strings, it will cause the problem to “virtually disappear.”


In reply to Re^3: [OT] The interesting problem of comparing bit-strings. by sundialsvc4
in thread [OT] The interesting problem of comparing bit-strings. by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":