in reply to Re^2: [OT] The interesting problem of comparing (long) bit-strings.
in thread [OT] The interesting problem of comparing bit-strings.

Hm? Boyer-Moore prep is the same, O(m) in time and space. A good-shift table of length m is used. It might be more compact in practice.

If the problem is too large, divide it! Code a tuned search routine for fixed-length (sub)needles, such that all tables fit into L2. My guess is that KB-sized needles can be done. Now the search becomes two-level: the top-level "alphabet" is the subneedles.
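To make that concrete, here is a minimal byte-oriented sketch -- Boyer-Moore-Horspool with only the bad-character table, rather than the full good-shift table; the names are my own illustration -- just to show how little per-needle state has to stay cache-resident:

    #include <array>
    #include <cstddef>
    #include <cstdio>
    #include <cstring>

    // Boyer-Moore-Horspool, bad-character table only.  Prep is O(m) time,
    // and the table is a fixed 256 entries regardless of needle length.
    const char* horspool_search(const char* hay, std::size_t n,
                                const char* needle, std::size_t m)
    {
        if (m == 0 || m > n) return nullptr;

        std::array<std::size_t, 256> shift;
        shift.fill(m);                              // default: skip a whole needle
        for (std::size_t i = 0; i + 1 < m; ++i)     // all but the last needle byte
            shift[(unsigned char)needle[i]] = m - 1 - i;

        for (std::size_t pos = 0; pos + m <= n; ) {
            std::size_t i = m;
            while (i > 0 && hay[pos + i - 1] == needle[i - 1])
                --i;                                // compare right to left
            if (i == 0)
                return hay + pos;                   // full match
            pos += shift[(unsigned char)hay[pos + m - 1]];
        }
        return nullptr;
    }

    int main()
    {
        const char* hay = "the interesting problem of comparing bit-strings";
        const char* hit = horspool_search(hay, std::strlen(hay), "comparing", 9);
        std::printf("%s\n", hit ? hit : "(not found)");
        return 0;
    }

The prep loop is O(m) and the shift table is a fixed 256 entries, so the per-(sub)needle state stays small whatever the needle length.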

As for brute force: The "Boyer-Moore beating assembler versions that utilize microcoded string search instructions" is a decades-old lesson. I tried to google for some edutaining links, but sadly, google has become ****.


Re^4: [OT] The interesting problem of comparing (long) bit-strings.
by BrowserUk (Pope) on Mar 29, 2015 at 04:17 UTC
    As for brute-force. The "Boyer-Moore beating assembler versions that utilize microcoded string search instructions" is a decades-old lesson

    For string search, decades ago (before CPUs acquired high-speed, on-chip caches) Boyer-Moore was the business; but since the 486, circa 1990, on-chip caches have tipped the balance again.

    "Decades old lessons" were relevant, decades ago.

    If you won't take my word for it, and aren't equipped to try it for yourself, take Bjarne Stroustrup's word for it. View http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style and skip ahead to timecode 44:40.

    A few seconds in, he presents a question (on a slide titled "Vector vs. Linked List") that goes like this:

    The task comes in two parts:

    1. Generate N random integers and insert them into a sequence so that each is inserted in its proper position in numerical order.
    2. Remove the elements one at a time by picking a random position in the sequence and removing the element there.

    And the question is: For which N does it become faster to use a linked list than a flat array?
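    To make the task concrete, here is a hedged sketch of the experiment as I read it -- not Stroustrup's actual benchmark code -- timing std::vector against std::list:

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <list>
    #include <random>
    #include <vector>

    // Time part 1 (sorted inserts) plus part 2 (random-position removals)
    // for a given sequence type.  Illustrative only.
    template <typename Seq>
    double run(std::size_t n, unsigned seed)
    {
        std::mt19937 rng(seed);
        std::uniform_int_distribution<int> value;
        Seq seq;

        auto t0 = std::chrono::steady_clock::now();

        // 1. Insert N random integers, each at its sorted position
        //    (linear scan in both cases, so only the container differs).
        for (std::size_t i = 0; i < n; ++i) {
            int v = value(rng);
            auto it = seq.begin();
            while (it != seq.end() && *it < v)
                ++it;
            seq.insert(it, v);
        }

        // 2. Remove elements one at a time from a random position.
        while (!seq.empty()) {
            std::uniform_int_distribution<std::size_t> pos(0, seq.size() - 1);
            auto it = seq.begin();
            std::advance(it, pos(rng));
            seq.erase(it);
        }

        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double>(t1 - t0).count();
    }

    int main()
    {
        for (std::size_t n : {10000u, 50000u, 100000u}) {
            std::printf("N=%zu  vector: %6.3fs  list: %6.3fs\n", n,
                        run<std::vector<int>>(n, 42),
                        run<std::list<int>>(n, 42));
        }
        return 0;
    }

    (I use a linear scan to find the insertion point in both containers so the comparison isolates the insert/erase behaviour; whether that matches the slide exactly, I leave to the video.)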

    The decades-old wisdom on this is that insertions/deletions into/from an array are O(N²), but for a linked list they are O(N log N) (with slightly higher constant factors), so N need only grow to a couple of hundred or a thousand or so for the linked list to start outperforming the flat array. Right? Wrong!

    Watch for a few more seconds (and wonder in amazement at his non-existent graph) as he shows that the answer is "Never". At no point does the linked list become faster than the flat array. In fact the bigger N gets, the bigger the gap between the two becomes, in favour of the flat array.

    As counter-intuitive as it may be, the benefit of caching the sequential accesses made by the flat array far outweighs the cost of moving more data. The non-sequential, scattered-around-memory nature of linked lists means they never catch up with, let alone surpass, the performance of the cache-friendly contiguous memory accesses of the flat array.

    Times change; and a lot of the "knowledge" CS students acquired, despite being tortured within an inch of their lives by Knuth, is simply out of date. Whilst the proofs of correctness still hold, the analyses of operational complexity -- done in terms of the imaginative, but imaginary, MIX (later MMIX) -- simply do not consider the effects of caching, pipelining, branch prediction, memory stalls et al.

    Stroustrup concludes that there are "3 orders of magnitude performance" to be gained.

    And the relevance of string-search analysis to bit-string search has, to my knowledge, never been demonstrated. They are quite different animals.
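    As a rough illustration of one difference (my own sketch, not anything demonstrated in the thread): a byte-string needle can only begin at byte boundaries, whereas a bit-string needle can begin at any of eight bit offsets within a byte, so even the naive comparison has to shift and mask:

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    // Bit at absolute bit index i of buf, MSB-first within each byte.
    static inline int bit_at(const std::uint8_t* buf, std::size_t i)
    {
        return (buf[i >> 3] >> (7 - (i & 7))) & 1;
    }

    // Naive bit-level search: first bit offset in hay (hay_bits long) at
    // which needle (needle_bits long) occurs, or -1 if it does not.
    long find_bits(const std::uint8_t* hay, std::size_t hay_bits,
                   const std::uint8_t* needle, std::size_t needle_bits)
    {
        if (needle_bits == 0 || needle_bits > hay_bits)
            return -1;
        for (std::size_t off = 0; off + needle_bits <= hay_bits; ++off) {
            std::size_t k = 0;
            while (k < needle_bits && bit_at(hay, off + k) == bit_at(needle, k))
                ++k;
            if (k == needle_bits)
                return (long)off;               // match starting at bit 'off'
        }
        return -1;
    }

    int main()
    {
        const std::uint8_t hay[]    = { 0xA5, 0x3C };  // 1010 0101 0011 1100
        const std::uint8_t needle[] = { 0xA0 };        // first 3 bits: 101
        std::printf("found at bit %ld\n", find_bits(hay, 16, needle, 3));
        return 0;
    }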


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      but for a linked list they are O(N log N)

      No, it is also O(N²): finding the insertion point in a linked list is itself O(N), since there is no random access.

        Really? With a flat array, once you've found the insertion/deletion point, you have to move (on average) 50% of the array one place to accommodate the insertion or close up the gap, but there are only a couple of pointers to write for the linked list. Just wildly differing constants then.

        (I guess I was thinking about trees rather than linked lists.)

