in reply to Re^2: [OT] The interesting problem of comparing bit-strings.
in thread [OT] The interesting problem of comparing bit-strings.

No Boyer-Moore, but you can convert your bitstring search into a regular expression (in the CS sense, not the Perl one), and then compile it into an efficient state-machine.

For instance, in order to look for bitstring 010101, you have to look for character strings matching any of...

s1: 010101xx s2: x010101x s3: xx010101 s4: xxx01010 1xxxxxxx s5: xxxx0101 01xxxxxx s6: xxxxx010 101xxxxx s7: xxxxxx01 0101xxxx s8: xxxxxxx0 10101xxx
You can write the regular expression matching s1 as a character set $s1 = qr/[\x54\x55\x56\x57]/, and similarly for s2-s8, and then combine all of them as qr/$s1|$s2|...|$s8/.

Then you would need some module able to convert that regular expression into a state machine, and use it for matching the vector... I am sure there is already some RegExp backend on CPAN that does that.

The downside is that the resulting state-machines may be huge.

Replies are listed 'Best First'.
Re^4: [OT] The interesting problem of comparing bit-strings. (More info.)
by BrowserUk (Pope) on Mar 24, 2015 at 15:50 UTC

    As you are the third respondent that has suggested bit-masks of one form or another, either:

    1. My question/description is unclear.
    2. Or I'm missing something fundamental.

    Both distinct possibilities.

    As I see it, my problem isn't how to isolate the bits of the strings that I need to compare -- __shiftleft128() does that easily and efficiently.

    The problem is one of how to construct/combine the nested loops in order to perform the iteration of the bits and quads in the proper way.

    Looking up a basic (ie. not the gcc version which incredibly complicated) strstr() function I found this:

    char *strstr( char *haystack, char *needle ) { char *hp; char *np; if( !*needle ) return haystack; // null needle; retur +n haystack (strange POSIX specification requirement!) while( *haystack ) { // until we reach the + end of the haystack hp = haystack; // copy of the haysta +ck pointer np = needle; // copy of the needle + pointer do { if( !*np ) return haystack; // end of needle; mat +ch completed; return current value of haystack pointer } while( *hp++ == *np++ ); // increment both poi +nter (copies) while the chars match ++haystack; // Got here means a m +ismatch; base pointer to next character. } return 0; // Got here means we +hit the terminating null of the haystack; no match; return null. }

    If my bitstrings where always 64-bit aligned, I could just substitute U64 for char everywhere, and that would be that; but of course it isn't.

    Essentially, there are five operations I need to encapsulate:

    1. Incrementing haystack by 1-bit.
    2. Incrementing haystack by 64-bits (1 quad + offset).
    3. Incrementing needle by 64-bits (1 quad + offset).
    4. Detecting EOS of the haystack.
    5. Detecting EOS of the needle.

    For 2 & 3 I thought to write a function:

    U64 nextQuad( U64 **p, U8 o ) { // p: pointer to poin +ter to the 'current' quad; o: value of the unused bits at the beginni +ng. U64 hi = *( ++*p ) // increment the poin +ter at *p, and set hi to the value it points at. , lo = *( *p+1 ); // and lo to the valu +e + 1 it points at return __shiftleft128( lo, hi, o ); // return the value f +rom the higher location with o bits from lo shifted in from the right +. }

    I think that would work, but for efficiency, you'd want to keep hi and lo in registers between calls, and you can't take the address of regiters, so I'd be relying on the compiler to recognise the importance of doing so.

    For one, things get more complicated. Now I need a pointer to the offset as well:

    U64 nextBit( U64 **p, U8 *o ) { // p: pointer to poin +ter to current quad; o: pointer to current offset U64 hi, lo; if( ++( *o ) == 64 ) { // increment the offs +et and detect transitions across quad boundaries. *o = 0; // reset the offset. ++*p; // increment the quad + pointer. } hi = **p; // Get the 'current' + (unshifted) quad value lo = *( *p + 1 ); // And the next return __shiftleft128( lo, hi, *o ); // return the value +from the higher location with *o bits from lo shifted in from the rig +ht. }

    For 4 & 5, I need to calculate whether I've reached the end from the current values of the quad pointer & associated offset.

    Putting it all together and getting it to compile is defeating me at the moment.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      Might be that I'm totally wrong, but I think that as long as you're trying to consider

      1. Incrementing haystack by 1-bit.

      an important primary op, there's something wrong.

      I'd basically try to first find out how the needle would need to be shifted in order to match the char_aligned haystack, and leave the haystack as it is - always.

      My somewhat naive approach would be to start with building the set of 256 possible 'needle heads' (going for char operations all over). A 'needle head' could be say 8 chars long (or some other convenient value).

      Then, look for the occurences of the needle_heads within haystack. For each needle_head, you know the offset (needed shift) associated with it. (Yes, that means 256 searches instead of 1, but it still might be better than shift-ing things around for every next bit-position.)

      If needlehead is found, go on with a full, char-wise compare, with haystack's chars untouched and every next needle-char obtained via a proper shift from the next two chars of the needle.

      End of string is bit-masked as needed to comply with the known offset.

      Maybe you've tried something like that already?

      Krambambuli
      ---
        Might be that I'm totally wrong, but I think that as long as you're trying to consider 1. Incrementing haystack by 1-bit. an important primary op, there's something wrong.

        Sorry. I should have pointed out that Incrementing haystack by 1-bit isn't quite what it sounds like.

        I used the expression to match up to the strstr() implementation I posted.

        When in that algorithm they increment the haystack pointer (char*), so as to compare the needle at the next byte offset, I need to effectively increment the haystack by 1 bit.

        What that actually entails is a single machine code instruction that shifts 1 64-bit register left by one bit whilst bringing in a new bit from another 64-bit register containing the (preloaded) next (to the right) quad from the haystack.

        That's the __shiftleft128() intrinsic which is implemented as a single shld instruction.

        But, I also need to detect when I transit a 64-bit boundary so that I can preload the next quad, and reset the shift counter. The whole thing looks like this (very close to what I had above):

        U64 nextBit( U64 **p, U8 *o ) { // p: pointer to poin +ter to current quad; o: pointer to current offset if( ++( *o ) == 64 ) { // increment the offs +et and detect transitions across quad boundaries. *o = 0; // reset the offset. ++*p; // increment the quad + pointer. } // Get the 'current' (unshifted) quad value; and the next return __shiftleft128( *( *p + 1 ), **p, *o ); // return t +he value from the higher location with *o bits from lo shifted in fro +m the right. }

        Which, despite that it looks like it does two 64-bit loads from memory every time, once inlined and optimised, it only does one 64-bit load every 64 calls, with the compiler seeing fit to keep the values in registers across calls. In fact, even the shift-left gets optimised away/folded into the same shift left that is used for the haystack in the inner loop.

        It's hard to follow (if your unfamiliar with x64 code), but the shld instruction that should appear at the bottom of the outer loop (just above the label $LN14@bvSearch:, has been folded into the shld that appears near the top of the inner loop, due to the C code h = nextQuad( &hp, oHay );, by virtue of the rather complex addressing mode used for the register loads:

        mov rax, QWORD PTR [r11+r9+16] mov r8, QWORD PTR [r11+r9+8]

        The relevant part of the optimised assembler output:

        ; 41 : while( hay < eHay ) { // until we +reach the end of the haystack cmp rcx, r12 jae SHORT $LN5@bvSearch mov r11, rcx sub r11, r9 npad 3 $LL6@bvSearch: ; 42 : hp = hay; // copy of t +he haystack pointer ; 43 : np = ndl; // copy of t +he needle pointer mov r9, r13 $LL4@bvSearch: ; 44 : //TAG; ; 45 : do { ; 46 : // end of needle; match completed; return current + value of haystack pointer ; 47 : if( np >= eNdl ) { cmp r9, rdi jae SHORT $LN19@bvSearch ; 55 : } ; 56 : h = nextQuad( &hp, oHay ); // get the n +ext quad from haystack mov rax, QWORD PTR [r11+r9+16] mov r8, QWORD PTR [r11+r9+8] ; 57 : n = nextQuad( &np, oNdl ); // get the n +ext quad from needle mov rdx, QWORD PTR [r9+8] add r9, 8 movzx ecx, sil shld r8, rax, cl mov rax, QWORD PTR [r9+8] movzx ecx, bpl shld rdx, rax, cl ; 58 : //printf( "\rhp:%p->h:%16I64x[%s] np:%p->n:%16I64x[%s]\t", hp +, h, toBin( h ), np, n, toBin( n ) ); ; 59 : } while( h == n ); // while the + quads match cmp r8, rdx je SHORT $LL4@bvSearch ; 60 : nextBit( &hay, &oHayCopy ); // Got here +means a mismatch; get the next bit from the haystack inc bl cmp bl, 64 ; 00000040H jne SHORT $LN14@bvSearch xor bl, bl add r10, 8 add r11, 8 ; 41 : while( hay < eHay ) { // until we +reach the end of the haystack cmp rcx, r12 jae SHORT $LN5@bvSearch mov r11, rcx sub r11, r9 npad 3 $LL6@bvSearch: ; 42 : hp = hay; // copy of t +he haystack pointer ; 43 : np = ndl; // copy of t +he needle pointer mov r9, r13 $LL4@bvSearch: ; 44 : //TAG; ; 45 : do { ; 46 : // end of needle; match completed; return current + value of haystack pointer ; 47 : if( np >= eNdl ) { cmp r9, rdi jae SHORT $LN19@bvSearch ; 55 : } ; 56 : h = nextQuad( &hp, oHay ); // get the n +ext quad from haystack mov rax, QWORD PTR [r11+r9+16] mov r8, QWORD PTR [r11+r9+8] ; 57 : n = nextQuad( &np, oNdl ); // get the n +ext quad from needle mov rdx, QWORD PTR [r9+8] add r9, 8 movzx ecx, sil shld r8, rax, cl mov rax, QWORD PTR [r9+8] movzx ecx, bpl shld rdx, rax, cl ; 58 : //printf( "\rhp:%p->h:%16I64x[%s] np:%p->n:%16I64x[%s]\t", hp +, h, toBin( h ), np, n, toBin( n ) ); ; 59 : } while( h == n ); // while the + quads match cmp r8, rdx je SHORT $LL4@bvSearch ; 60 : nextBit( &hay, &oHayCopy ); // Got here +means a mismatch; get the next bit from the haystack inc bl cmp bl, 64 ; 00000040H jne SHORT $LN14@bvSearch xor bl, bl add r10, 8 add r11, 8 $LN14@bvSearch: ; 40 : ; 41 : while( hay < eHay ) { // until we +reach the end of the haystack cmp r10, r12 jb SHORT $LL6@bvSearch $LN5@bvSearch: ; 61 : }

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
        In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
        codeb

        Ditto :)


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
        In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

      My turn to wonder if I'm missing something fundamental or failing to communicate my solution.

      I guess I'm just going to have to find time to code up a working example.

      Either I'll prove my concept or figure out which kind of left field I'm standing in.  :-)

        Please see Re^6: [OT] The interesting problem of comparing bit-strings. for the problems of finding bitstring needles that cross byte boundaries using byte-wise operations.

        In brief, you need to perform two (different) byte compares using two different byte masks at each of 7 different offsets (with different masks and comparison values for each) for every byte in the haystack in order to locate a single byte needle.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
        In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
Re^4: [OT] The interesting problem of comparing bit-strings.
by Anonymous Monk on Mar 24, 2015 at 17:20 UTC

    DFA space requirement and construction are both O(m*s) where m is needle length and s the alphabet size. Search is O(n) (linear to text size). In practice, one would handle say 8 bits (a byte) at a time. Even then, the space requirement hardly matters with small patterns..

      The think is that O(m*s) can actually be huge!

      For a 256 alphabet and supposing that a pointer per entry is enough, it results in 1KB per needly byte. Three orders of magnitude bigger than the needle size.

      Admittedly, that is a worst case scenario as in most cases, the transition tables collapse into just a few exits per state and can be represented in more efficient ways. But I don't know if the OP case fits into that "most cases".