Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re: Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?)

by mr_mischief (Monsignor)
on Apr 06, 2015 at 16:09 UTC ( [id://1122576]=note: print w/replies, xml ) Need Help??


in reply to Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?)

To clarify, my understanding is that Boyer-Moore should work in that it would produce correct output, but the benefit over brute force diminishes toward disappearance as the alphabet gets smaller. Is that what you're seeing?

  • Comment on Re: Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?)

Replies are listed 'Best First'.
Re^2: Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?)
by BrowserUk (Patriarch) on Apr 06, 2015 at 17:06 UTC
    the benefit over brute force diminishes toward disappearance as the alphabet gets smaller

    As Anonymous Monk so succinctly and accurately put it:

    If one were to implement unmodified B-M on bitstrings, the "characters" would be individual bits. The bad character shift table would have two slots, for '0' and for '1'. Practical value of such arrangement is nil.

    So yes. The two character alphabet of bit-strings renders Boyer-Moore (and other similar algorithms) ineffective.

    But further, the problem is that most of those advocating these algorithms have proposed (and implemented) solutions that treat aligned groups of 8-bits as the alphabet. Ie. Given the needle:

    010010110101011110101110

    Build the shift table in terms of the three groups of 8-bits:

    01001011 01010111 10101110

    Giving a table of 3 effective entries and jumps in steps of multiples of 8; which simply fails to work in 7 out of 8 cases.

    The solution anonymonk proposes is to artificially extend the alphabet from 2 to 256 by using unaligned groups of 8-bits as he describes in Re^3: Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?); which appears to work.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
Re^2: Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?)
by Anonymous Monk on Apr 06, 2015 at 17:05 UTC
    Yes, it is implied that the smaller the set of the alphabet (and for that matter the needle) the smaller the shift potential, the smaller the shift potential the less optimized from linear any of those algos is.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1122576]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (3)
As of 2024-04-20 08:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found