http://www.perlmonks.org?node_id=1037799


in reply to Re^7: Locating a specified number of contiguous 0 bits within a large bitstring efficiently.
in thread Locating a specified number of contiguous 0 bits within a large bitstring efficiently.

First of all, my code doesn't do $N < 8 ... so you'd only expect to see correct answers from 5 on down.

Obvious. (Once someone kindly points it out.)

I'm getting the correct numbers for 14..17 when I run it so I'm guessing there's something else going on with either your testbed or mine

Indeed, mine! (A dumb typo.)

C:\test>b-zeroBits.pl -N=1e5 Name "main::N" used only once: possible typo at C:\test\b-zeroBits.pl +line 1. 0100100010000100000100000010000000100000000100000000010000000000100000 +00000010000000 wrog: 1 : 0 wrog: 2 : 14 wrog: 3 : 5 wrog: 4 : 14 wrog: 5 : 14 wrog: 6 : 20 wrog: 7 : 27 wrog: 8 : 35 wrog: 9 : 44 wrog: 10 : 54 wrog: 11 : 65 wrog: 12 : 77 wrog: 13 : 90 wrog: 14 : 104 wrog: 15 : 119 wrog: 16 : 135 wrog: 17 : 152

My only excuse is attempting to give timely response to your efforts, whilst also construct a descent testbed and benchmark for 4 different approaches (3 posted + my own.). My own attempt -- essentially based on the ideas in the OP -- is proving to be a pain to code.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
  • Comment on Re^8: Locating a specified number of contiguous 0 bits within a large bitstring efficiently.
  • Download Code

Replies are listed 'Best First'.
Re^9: Locating a specified number of contiguous 0 bits within a large bitstring efficiently.
by wrog (Friar) on Jun 08, 2013 at 11:37 UTC
    ok, well, here's a version that does all $N. The one caveat is it doesn't necessarily return the first match, the main problem with the (or at least my) regexp approach being that when you have a bunch of '|' alternatives and more than one matches, you can't control which one gets used.

    Anyway, here's the code:

    use strict; use warnings; no warnings 'redefine'; # $cc[$n]->[$k] is a character class matching a byte with # a run of zero bits of length at least $k and starting # exactly $n bits from the end our @cc = map { my $n = $_; [ ($n==0 ? () : $n==8 ? ('') : (undef)), map { # $m = first bit after the start of the run # that does not have to be zero my $m = $_+8-$n; local $" = ''; qr{[@{[map { quotemeta(chr(0x80>>$n|$_<<$m)) . ($n<7 ? '-' . quotemeta(chr(0xFF>>$n|$_<<$m)) : '') } 0..(0xFF>>$m)]}]} } ($n==0 ? 0 : 1)..$n ] } 0..8; $cc[0]->[0] = qr{^|$cc[0]->[0]"}; sub zero_bit_regexp { my $N = shift; my $p = join '|', map { $N<=$_ ? "$cc[$_]->[$N]()" : "$cc[$_]->[$_](\\0{@{[($N-$_)>>3]}}$cc[8]->[($N-$ +_)&0x7])" } 0..7; return qr{$p}; } sub match_0s_w { my $N = shift; # number of consecutive '0' bits to be mat +ched local $_ = shift || $_; # string to match against return ($_ !~ zero_bit_regexp($N)) ? -1 : $-[$#-]*8 - $#- + 1 }
    which gives
    1: 0 2: 5 3: 5 4: 14 5: 14 6: 20 7: 27 8: 35 9: 44 10: 54 11: 65 12: 77 13: 90 14: 119 15: 119 16: 135 17: -1
    where the results for 2, 4, and 14 are technically correct but unexpected