http://www.perlmonks.org?node_id=21589


in reply to RE: sieve of erothenes
in thread sieve of erothenes

Okay, so I went back to Abigail's talk, and discovered that he Doesn't actually explain completely how it works... I was at the talk, and apparently he talked and pointed, but the things he said weren't on the slide. So, I'll attempt to describe How this regexp matches prime numbers only... no promises I'll get it all correct though.

First, we build a string:

1 x $_
This is a string of 1's as long as the number you're testing. Then we test that against the regexp:
/^1?$|^(11+?)\1+$/
The first half of the alternation just takes care of matching an empty string (the number 0) and a single 1 (the number 1) which are prime. I'll break out the second half for clarity:
/^(11+?)\1+$/
In the parens, we match 2 or more ones in succession, in a non-greedy fashion. That means, it'll first try the regex matching "11" in the parens, and if the rest of the regex fails, it'll backtrack and try "111" in the parens, and so on. (sounds like a for loop...)

After the parens, it looks for at least one more occurance of \1, which is what it just captured inside the parens. So, it's basically matching for an integral number of "11"'s in the string, and if that fails, it backtracks and tries matching an integral number of "111"'s in the string, and so on. Because of the anchors, there is no room for extra 1's at the end of the string. And since it needs to match the string of ones twice (once in the parens, and at least one more time after that), if the regex succeeds, you know that the number is divisible by a smaller integer, and is therefore not prime.

I hope this helps... since it was already described in public at least once, I don't feel bad giving this particular magician's secret away. I'm sorry I don't know who originally wrote the code; I'm not sure if it's Abigail's or not.

Alan

Replies are listed 'Best First'.
RE: RE: RE: sieve of erothenes
by maverick (Curate) on Jul 10, 2000 at 01:05 UTC
    Very slick! I've never seen it done like this before. We're not using the same algorithm, but for my own morbid curiosity I benchmarked it against mine and ase's code. I added the 'o' modifier to your regex to speed it up a bit.
    #!usr/bin/perl -w use Benchmark; timethese(100,{ Maverick => sub { for (1..1000){ push @_,$_; for (@_[1..$#_-1]){ pop @_ && last if !($_[-1]%$_) } } #print join "\t", @_; }, Ase => sub { $v='1' x 1000; for(2..1000){ next if !vec($v,$_-1,8); for($q=$_*2;$q<1001;$q+=$_){ vec($v,$q-1,8)=0 } } #print join "\t",grep {vec($v,$_-1,8)} (1..100 +0); }, Alan => sub { for(1..1000){ push @_,$_ if (1 x $_) !~ /^1?$|^(11+? +)\1+$/o } #print join "\t", @_; } });
    and ended up with:
    Benchmark: timing 100 iterations of Alan, Ase, Maverick... Alan: 93 wallclock secs (92.95 usr + 0.00 sys = 92.95 CPU) Ase: 1 wallclock secs ( 1.03 usr + 0.00 sys = 1.03 CPU) Maverick: 1 wallclock secs ( 1.65 usr + 0.01 sys = 1.66 CPU)
    You have the shortest but ase still has the fastest. :)

    This started off being about the seive of "the always mispelled guy", but now I'm curious about other ways of doing this. Perhaps on some rainy afternoon I'll go dig up few more interesting snippets to throw at the monk collective...

    /\/\averick