But when a sensible algorithm can locate 1 million bits close to the end of a 1/4 billion bits in 0.25 seconds, who cares if you have to "pedantically growl" ..... And anything that takes longer than 1/4 of a second is pointless.

Out of curiosity, I tested the gmp library's capability with the above scenario - under the assumption that "1/4 billion" is "250 million".
The haystack was pseudo-randomly generated, and the needle starts 19 bits from the left hand end of the haystack.
I used a combination of Math::GMPz and Inline::C.

It took 3 seconds to find the needle in the haystack (starting from the right hand end of the haystack) on Windows 7 64 bit.

In some parts of the world "1/4 billion" is "25 million" and the program then, of course, finds the needle 10 times quicker.

I'm not at all impressed with the code that I wrote in order to check this. But I was impressed that a general usage library could provide such a speedy implementation of the approach.
Unfortunately, it's not quick enough to be pointful ;-)
use strict; use warnings; use Math::GMPz qw(:mpz); use Time::HiRes; use Inline C => Config => USING => 'ParseRegExp', BUILD_NOISY => 1, INC => '-IC:/MinGW/msys/1.0/local/include', LIBS => '-LC:/MinGW/msys/1.0/local/lib -lgmp', TYPEMAPS => './typemap', # ships with Math::GMPz ; use Inline C => <<'EOC'; #include <gmp.h> int start(mpz_t * haystack, int haystack_size, mpz_t * needle, int needle_size) { int h_bit, n_bit, nless1 = needle_size -1; for(h_bit = nless1; h_bit < haystack_size; h_bit++) { for(n_bit = 0; n_bit < needle_size; n_bit++) { if(mpz_tstbit(*haystack, h_bit - needle_size + n_bit) != mpz_tstbit(needle, n_bit)) break; if(n_bit == nless1) return h_bit; } } return 0; } EOC my ($t1, $t2, $match); my ($haystack_size, $needle_size) = (250e6, 1e6); # generate the haystack - takes a while my $h_string = rstring($haystack_size); # generate a needle that's guaranteed to match , starting # at 19 bits from the lh end - ie at bit 249999981 my $n_string = substr($h_string, 19, $needle_size); # create the haystack gmp vector. (We need an additional # leading "1" bit to ensure that leading zero bits aren't ignored.) my $haystack = Math::GMPz->new( "1" . $h_string , 2); # create the needle gmp vector. (We need an additional # leading "1" bit to ensure that leading zero bits aren't ignored.) my $needle = Math::GMPz->new("1" . $n_string, 2); # Check that we've got the right number of bits. print Rmpz_sizeinbase($haystack, 2), " ", Rmpz_sizeinbase($needle, 2), "\n"; $t1 = Time::HiRes::time(); $match = start($haystack, $haystack_size, $needle, $needle_size); $t2 = Time::HiRes::time(); print $match, " ", $t2 - $t1, "\n"; sub rstring { die "haystack size must be a multiple of 10" if $_[0] % 10; my $ret; my $its = $_[0] / 10; $ret .= sprintf("%010b",int rand 1024) for 1 .. $its; if(length($ret) != $_[0] || $ret =~ /[^01]/) { die "Error in sub rstring"; } return $ret; } # Outputs: # 250000001 1000001 # 249999981 3.16680598258972
Cheers,
Rob

In reply to Re^3: [OT] The interesting problem of comparing bit-strings. by syphilis
in thread [OT] The interesting problem of comparing bit-strings. by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":