the benefit over brute force diminishes toward disappearance as the alphabet gets smaller
As Anonymous Monk so succinctly and accurately put it:
If one were to implement unmodified B-M on bitstrings, the "characters" would be individual bits. The bad character shift table would have two slots, for '0' and for '1'. Practical value of such arrangement is nil.
So yes. The two character alphabet of bit-strings renders Boyer-Moore (and other similar algorithms) ineffective.
But further, the problem is that most of those advocating these algorithms have proposed (and implemented) solutions that treat aligned groups of 8-bits as the alphabet. Ie. Given the needle:
010010110101011110101110
Build the shift table in terms of the three groups of 8-bits: 01001011 01010111 10101110
Giving a table of 3 effective entries and jumps in steps of multiples of 8; which simply fails to work in 7 out of 8 cases.
The solution anonymonk proposes is to artificially extend the alphabet from 2 to 256 by using unaligned groups of 8-bits as he describes in Re^3: Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?); which appears to work.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
|