Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re^6: Locating a specified number of contiguous 0 bits within a large bitstring efficiently.

by BrowserUk (Pope)
on Jun 07, 2013 at 20:23 UTC ( #1037748=note: print w/ replies, xml ) Need Help??


in reply to Re^5: Locating a specified number of contiguous 0 bits within a large bitstring efficiently.
in thread Locating a specified number of contiguous 0 bits within a large bitstring efficiently.

Unfortunately, the bitpatterns you generate -- in common with those used by grimey's code -- are not compatible with vec.

By way of example, this generates a vector with a single length 1x0 run, and a single length 2x0 run; and single length 3x0 run; and so on:

$vec = ''; $n=-1; vec( $vec, $n+=$_, 1 ) = 1 for 2 .. 17; print unpack 'b*', $vec;; 0100100010000100000100000010000000100000000100000000010000000000100000 +000000100000000000010000000000000100000000000000100000000000000010000 +0000000000001000... # 2 5 9 14 20 27 35 44 54 65 + 77 90 104 119 135 + 152

Thus, a search for

  • 1 x 0 should be found at position 0;
  • 2 x 0 should be found at position 2;
  • 3@5; 4@9; 5@14; ...

Running your code looking for these runs against that vector produces:

wrog 1: 0 wrog 2: 14 wrog 3: 5 wrog 4: 14 wrog 5: 14 wrog 6: 20 wrog 7: 27 wrog 8: 35 wrog 9: 44 wrog 10: 54 wrog 11: 65 wrog 12: 77 wrog 13: 90 wrog 14: 90 wrog 15: 90 wrog 16: 90 wrog 17: 90

I believe this is because little-endian bytes are bytes swapped and word swapped relative to big-endian.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Comment on Re^6: Locating a specified number of contiguous 0 bits within a large bitstring efficiently.
Select or Download Code
Re^7: Locating a specified number of contiguous 0 bits within a large bitstring efficiently.
by wrog (Monk) on Jun 08, 2013 at 02:52 UTC
    First of all, my code doesn't do $N < 8

    More explicitly, neither my nor grimey's versions handles the cases where the run of zeroes occurs entirely inside one byte,

    which in this example happens for $N in 1..4,

    so you'd only expect to see correct answers from 5 on down.

    Secondly, I'm getting the correct numbers for 14..17 when I run it so I'm guessing there's something else going on with either your testbed or mine

      First of all, my code doesn't do $N < 8 ... so you'd only expect to see correct answers from 5 on down.

      Obvious. (Once someone kindly points it out.)

      I'm getting the correct numbers for 14..17 when I run it so I'm guessing there's something else going on with either your testbed or mine

      Indeed, mine! (A dumb typo.)

      C:\test>b-zeroBits.pl -N=1e5 Name "main::N" used only once: possible typo at C:\test\b-zeroBits.pl +line 1. 0100100010000100000100000010000000100000000100000000010000000000100000 +00000010000000 wrog: 1 : 0 wrog: 2 : 14 wrog: 3 : 5 wrog: 4 : 14 wrog: 5 : 14 wrog: 6 : 20 wrog: 7 : 27 wrog: 8 : 35 wrog: 9 : 44 wrog: 10 : 54 wrog: 11 : 65 wrog: 12 : 77 wrog: 13 : 90 wrog: 14 : 104 wrog: 15 : 119 wrog: 16 : 135 wrog: 17 : 152

      My only excuse is attempting to give timely response to your efforts, whilst also construct a descent testbed and benchmark for 4 different approaches (3 posted + my own.). My own attempt -- essentially based on the ideas in the OP -- is proving to be a pain to code.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        ok, well, here's a version that does all $N. The one caveat is it doesn't necessarily return the first match, the main problem with the (or at least my) regexp approach being that when you have a bunch of '|' alternatives and more than one matches, you can't control which one gets used.

        Anyway, here's the code:

        use strict; use warnings; no warnings 'redefine'; # $cc[$n]->[$k] is a character class matching a byte with # a run of zero bits of length at least $k and starting # exactly $n bits from the end our @cc = map { my $n = $_; [ ($n==0 ? () : $n==8 ? ('') : (undef)), map { # $m = first bit after the start of the run # that does not have to be zero my $m = $_+8-$n; local $" = ''; qr{[@{[map { quotemeta(chr(0x80>>$n|$_<<$m)) . ($n<7 ? '-' . quotemeta(chr(0xFF>>$n|$_<<$m)) : '') } 0..(0xFF>>$m)]}]} } ($n==0 ? 0 : 1)..$n ] } 0..8; $cc[0]->[0] = qr{^|$cc[0]->[0]"}; sub zero_bit_regexp { my $N = shift; my $p = join '|', map { $N<=$_ ? "$cc[$_]->[$N]()" : "$cc[$_]->[$_](\\0{@{[($N-$_)>>3]}}$cc[8]->[($N-$ +_)&0x7])" } 0..7; return qr{$p}; } sub match_0s_w { my $N = shift; # number of consecutive '0' bits to be mat +ched local $_ = shift || $_; # string to match against return ($_ !~ zero_bit_regexp($N)) ? -1 : $-[$#-]*8 - $#- + 1 }
        which gives
        1: 0 2: 5 3: 5 4: 14 5: 14 6: 20 7: 27 8: 35 9: 44 10: 54 11: 65 12: 77 13: 90 14: 119 15: 119 16: 135 17: -1
        where the results for 2, 4, and 14 are technically correct but unexpected

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1037748]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (6)
As of 2014-07-26 12:15 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (176 votes), past polls