"be consistent" PerlMonks

### Re^3: Locating a specified number of contiguous 0 bits within a large bitstring efficiently.

by johngg (Abbot)
 on Jun 07, 2013 at 16:48 UTC ( #1037708=note: print w/replies, xml ) Need Help??

Using index looking for contiguous NULL bytes in the vector seems to be much faster.

```use strict;
use warnings;

use 5.014;
use List::Util  qw{ max };
use Time::HiRes qw{ gettimeofday tv_interval };

my \$t0 = [ gettimeofday() ];
srand 1234567;

my \$vec = q{};
vec( \$vec, 536_870_911, 1 ) = 0;
vec( \$vec, \$_         , 1 ) = 1 for 1 .. 21143;
vec( \$vec, int rand 536_870_911, 1 ) = 1 for 1 .. 1e7;

my \$t1 = [ gettimeofday() ];
say qq{Creating vector - @{ [ tv_interval( \$t0, \$t1 ) ] }};

my \$bitStr = unpack q{b*}, \$vec;

my \$t2 = [ gettimeofday() ];
say qq{Unpacking bitstring - @{ [ tv_interval( \$t1, \$t2 ) ] }};

say
qq{Longest contiguous block of zeros is },
max map length, \$bitStr =~ m{(0+)}g,
q{ bits long};

my \$t3 = [ gettimeofday() ];
say qq{Finding longest block - @{ [ tv_interval( \$t2, \$t3 ) ] }};

say qq{\nSearch using regex};
for my \$numZeros ( 25, 10, 78, 3, 943, 307, 5, 599, 19, 345 )
{
my \$ts = [ gettimeofday() ];

say
qq{At least \$numZeros contiguous 0s },
\$bitStr =~ m{(0{\$numZeros,})}
? qq{found at offset \$-[ 0 ], length @{ [ length \$1 ] }}
: q{could not be found};

say qq{     Search took - @{ [ tv_interval( \$ts, [ gettimeofday()
+] ) ] }};
}

say qq{\nSearch using index};
for my \$numZeros ( 25, 10, 78, 3, 943, 307, 5, 599, 19, 345 )
{
my \$ts = [ gettimeofday() ];

if ( \$numZeros < 23 )
{
my \$buffer   = q{};
my \$offset   = -1;
my \$bufStart = 0;
my \$lookFor  = q{0} x \$numZeros;
while ( \$bufStart < length \$vec )
{
\$buffer = unpack q{b*}, substr \$vec, \$bufStart, 131;
do {
say
qq{At least \$numZeros contiguous 0s found at },
\$bufStart * 8 + \$offset;
last;
} unless ( \$offset = index \$buffer, \$lookFor ) == -1;

\$bufStart += 128;
}
say qq{At least \$numZeros contiguous 0s could not be found}
if \$offset == -1;
}
else
{
my \$wholeBytes      = int( ( \$numZeros - 7 ) / 8 );
my \$lookFor         = qq{\0} x \$wholeBytes;
my \$offset          = -1;
my \$zerosToTheLeft  = 0;
my \$zerosToTheRight = 0;

while ( ( \$offset = index \$vec, \$lookFor, \$offset ) > -1 )
{
\$zerosToTheLeft  = zerosToTheLeft( \$offset );
\$zerosToTheRight = zerosToTheRight( \$offset, \$wholeBytes )
+;
last if ( \$wholeBytes * 8 + \$zerosToTheLeft + \$zerosToTheR
+ight )
>= \$numZeros;

\$offset += \$wholeBytes;
}

if ( \$offset == -1 )
{
say qq{At least \$numZeros contiguous 0s could not be found
+};
}
else
{
say
qq{At least \$numZeros contiguous 0s found at },
\$offset * 8 - \$zerosToTheLeft;
}
}

say qq{     Search took - @{ [ tv_interval( \$ts, [ gettimeofday()
+] ) ] }};
}

sub zerosToTheLeft
{
my \$offset = shift;
return 0 unless \$offset;
my \$byteStr = unpack q{b*}, substr \$vec, \$offset - 1, 1;
return 0 unless \$byteStr =~ m{(0+)\$};
return length \$1;
}

sub zerosToTheRight
{
my( \$offset, \$wholeBytes ) = @_;
return 0 if ( \$offset + \$wholeBytes ) == length \$vec;
my \$byteStr = unpack q{b*}, substr \$vec, \$offset + \$wholeBytes, 2;
return 0 unless \$byteStr =~ m{^(0+)};
return length \$1;
}

The output.

```Creating vector - 6.651795
Unpacking bitstring - 2.116776
Longest contiguous block of zeros is 843
Finding longest block - 9.871085

Search using regex
At least 25 contiguous 0s found at offset 21144, length 65
Search took - 1.684111
At least 10 contiguous 0s found at offset 21144, length 65
Search took - 0.737168
At least 78 contiguous 0s found at offset 21302, length 94
Search took - 0.701232
At least 3 contiguous 0s found at offset 21144, length 65
Search took - 0.704558
At least 943 contiguous 0s could not be found
Search took - 12.084963
At least 307 contiguous 0s found at offset 31289, length 343
Search took - 0.658849
At least 5 contiguous 0s found at offset 21144, length 65
Search took - 0.702935
At least 599 contiguous 0s found at offset 5476471, length 625
Search took - 0.822874
At least 19 contiguous 0s found at offset 21144, length 65
Search took - 0.702927
At least 345 contiguous 0s found at offset 70112, length 351
Search took - 0.703438

Search using index
At least 25 contiguous 0s found at 21144
Search took - 6.5e-05
At least 10 contiguous 0s found at 21144
Search took - 0.000149
At least 78 contiguous 0s found at 21302
Search took - 4.1e-05
At least 3 contiguous 0s found at 21144
Search took - 0.00014
At least 943 contiguous 0s could not be found
Search took - 0.959815
At least 307 contiguous 0s found at 31289
Search took - 8e-05
At least 5 contiguous 0s found at 21144
Search took - 0.00015
At least 599 contiguous 0s found at 5476471
Search took - 0.009874
At least 19 contiguous 0s found at 21144
Search took - 0.000154
At least 345 contiguous 0s found at 70112
Search took - 0.000122

This looks a lot more encouraging, I hope it can be adapted for your needs.

Cheers,

JohnGG

• Comment on Re^3: Locating a specified number of contiguous 0 bits within a large bitstring efficiently.

Replies are listed 'Best First'.
Re^4: Locating a specified number of contiguous 0 bits within a large bitstring efficiently.
by BrowserUk (Pope) on Jun 07, 2013 at 20:34 UTC
This looks a lot more encouraging, I hope it can be adapted for your needs.

I wholeheartedly agree. Those timings make this a much more viable approach. Thank you. I will add it to my benchmarking.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Create A New User
Node Status?
node history
Node Type: note [id://1037708]
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (11)
As of 2017-08-18 16:23 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
Who is your favorite scientist and why?

Results (306 votes). Check out past polls.

Notices?