http://www.perlmonks.org?node_id=1037748


in reply to Re^5: Locating a specified number of contiguous 0 bits within a large bitstring efficiently.
in thread Locating a specified number of contiguous 0 bits within a large bitstring efficiently.

Unfortunately, the bitpatterns you generate -- in common with those used by grimey's code -- are not compatible with vec.

By way of example, this generates a vector with a single length 1x0 run, and a single length 2x0 run; and single length 3x0 run; and so on:

$vec = ''; $n=-1; vec( $vec, $n+=$_, 1 ) = 1 for 2 .. 17; print unpack 'b*', $vec;; 0100100010000100000100000010000000100000000100000000010000000000100000 +000000100000000000010000000000000100000000000000100000000000000010000 +0000000000001000... # 2 5 9 14 20 27 35 44 54 65 + 77 90 104 119 135 + 152

Thus, a search for

Running your code looking for these runs against that vector produces:

wrog 1: 0 wrog 2: 14 wrog 3: 5 wrog 4: 14 wrog 5: 14 wrog 6: 20 wrog 7: 27 wrog 8: 35 wrog 9: 44 wrog 10: 54 wrog 11: 65 wrog 12: 77 wrog 13: 90 wrog 14: 90 wrog 15: 90 wrog 16: 90 wrog 17: 90

I believe this is because little-endian bytes are bytes swapped and word swapped relative to big-endian.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^7: Locating a specified number of contiguous 0 bits within a large bitstring efficiently.
by wrog (Friar) on Jun 08, 2013 at 02:52 UTC
    First of all, my code doesn't do $N < 8

    More explicitly, neither my nor grimey's versions handles the cases where the run of zeroes occurs entirely inside one byte,

    which in this example happens for $N in 1..4,

    so you'd only expect to see correct answers from 5 on down.

    Secondly, I'm getting the correct numbers for 14..17 when I run it so I'm guessing there's something else going on with either your testbed or mine

      First of all, my code doesn't do $N < 8 ... so you'd only expect to see correct answers from 5 on down.

      Obvious. (Once someone kindly points it out.)

      I'm getting the correct numbers for 14..17 when I run it so I'm guessing there's something else going on with either your testbed or mine

      Indeed, mine! (A dumb typo.)

      C:\test>b-zeroBits.pl -N=1e5 Name "main::N" used only once: possible typo at C:\test\b-zeroBits.pl +line 1. 0100100010000100000100000010000000100000000100000000010000000000100000 +00000010000000 wrog: 1 : 0 wrog: 2 : 14 wrog: 3 : 5 wrog: 4 : 14 wrog: 5 : 14 wrog: 6 : 20 wrog: 7 : 27 wrog: 8 : 35 wrog: 9 : 44 wrog: 10 : 54 wrog: 11 : 65 wrog: 12 : 77 wrog: 13 : 90 wrog: 14 : 104 wrog: 15 : 119 wrog: 16 : 135 wrog: 17 : 152

      My only excuse is attempting to give timely response to your efforts, whilst also construct a descent testbed and benchmark for 4 different approaches (3 posted + my own.). My own attempt -- essentially based on the ideas in the OP -- is proving to be a pain to code.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        ok, well, here's a version that does all $N. The one caveat is it doesn't necessarily return the first match, the main problem with the (or at least my) regexp approach being that when you have a bunch of '|' alternatives and more than one matches, you can't control which one gets used.

        Anyway, here's the code:

        use strict; use warnings; no warnings 'redefine'; # $cc[$n]->[$k] is a character class matching a byte with # a run of zero bits of length at least $k and starting # exactly $n bits from the end our @cc = map { my $n = $_; [ ($n==0 ? () : $n==8 ? ('') : (undef)), map { # $m = first bit after the start of the run # that does not have to be zero my $m = $_+8-$n; local $" = ''; qr{[@{[map { quotemeta(chr(0x80>>$n|$_<<$m)) . ($n<7 ? '-' . quotemeta(chr(0xFF>>$n|$_<<$m)) : '') } 0..(0xFF>>$m)]}]} } ($n==0 ? 0 : 1)..$n ] } 0..8; $cc[0]->[0] = qr{^|$cc[0]->[0]"}; sub zero_bit_regexp { my $N = shift; my $p = join '|', map { $N<=$_ ? "$cc[$_]->[$N]()" : "$cc[$_]->[$_](\\0{@{[($N-$_)>>3]}}$cc[8]->[($N-$ +_)&0x7])" } 0..7; return qr{$p}; } sub match_0s_w { my $N = shift; # number of consecutive '0' bits to be mat +ched local $_ = shift || $_; # string to match against return ($_ !~ zero_bit_regexp($N)) ? -1 : $-[$#-]*8 - $#- + 1 }
        which gives
        1: 0 2: 5 3: 5 4: 14 5: 14 6: 20 7: 27 8: 35 9: 44 10: 54 11: 65 12: 77 13: 90 14: 119 15: 119 16: 135 17: -1
        where the results for 2, 4, and 14 are technically correct but unexpected