laziness, impatience, and hubris | |
PerlMonks |
Re^3: [OT] The interesting problem of comparing bit-strings.by sundialsvc4 (Abbot) |
on Mar 24, 2015 at 23:59 UTC ( [id://1121237]=note: print w/replies, xml ) | Need Help?? |
Ahhh.... the bitstrings are very large? How about the typical length of the patterns for which you search within such strings? If the length of the string were longer than 3 * 64 = 192 bits long, and especially if it were longer, then you would have a definite “crib,” in the form of a 64-bit quantity ... there are exactly 64 possibilities ... that must occur in the pattern at a quadword boundary. This would be a “huge win™,” well worth the extra effort of creating a special-case for, in anticipation that it would actually pay-off most of the time. I-F the string being sought was at least 192 bits long, then you are able to calculate, with certainty, the 64 possible values that exist for “the 64-bit ‘word in the middle.’” You can now search for each of these in turn, using an extremely efficient REPNE SCASD (i86 ...) assembler-language construct, and then explore the surrounding doublewords in each case to determine if the bit-string occurs. (If the run being sought is “even longer,” then a REPNE CMPSD loop can be applied to check all of the “subsequent doublewords,” the values of all of which once again can be determined with certainty. Only the partial-doublewords at the beginning and at the end, if any, remain to be verified.) Easily the most efficient implementation would be to loop through all 64 of the pre-computed possibilities, using the assembler instructions aforementioned to locate all occurrences within the buffer. (Of course, and as you of all Monks well know, you will need to use a “sliding window” scan through the compass of the entire file, to avoid missing any bitstrings that serependitiously fell across a boundary, but, I digress.)
Each and every time you stumble-upon any of these 64 values, the odds that you have, in fact, “found another answer” are rather-astronomically high.
In fact, in such a case you might even wish to “take a calculated risk” that in fact you need look no further. If you actually discover a matching 64-bit quantity within the buffer, then the odds are “64 in 2^64” that the entire value-being-sought is present. Certainly, if the REPNE CMPSD succeeds with regard to a location found by REPNE SCASD, then the odds of being wrong become absurdly (and dismissably) small: “It is a match. It has to be.” At this point, the actual odds of committing a “type-2 error” are . . . No, this is not a solution to your entire problem: if the bit-string being sought turns out to be shorter, it will not help at all. But, for longer strings, it will cause the problem to “virtually disappear.”
In Section
Seekers of Perl Wisdom
|
|