Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Math help: Finding Prime Numbers

by Ovid (Cardinal)
on Nov 12, 2006 at 20:04 UTC ( #583585=perlquestion: print w/ replies, xml ) Need Help??
Ovid has asked for the wisdom of the Perl Monks concerning the following question:

Recently for a project I started, I needed to find prime numbers. I'm not particularly worried about large prime numbers, but if someone asks for them, this module can be really slow. With caching, it speeds up quite a bit, but can still be slow the first time it hits a large number.

package Math::Name::To::Be::Determined; use warnings; use strict; our $VERSION = '0.01'; use vars qw(@ISA @EXPORT_OK %EXPORT_TAGS); use Exporter; @ISA = 'Exporter'; @EXPORT_OK = qw( clear_prime_cache is_prime primes_upto ); %EXPORT_TAGS = ( all => \@EXPORT_OK, ); { my ( @PRIMES, %IS_PRIME ); clear_prime_cache(); sub is_prime { my $number = shift; return 1 if $IS_PRIME{$number}; return if $number < $PRIMES[-1]; my $is_prime; for ( $PRIMES[-1] + 1 .. $number ) { # cache prime numbers if ( $is_prime = _is_prime($_) ) { push @PRIMES => $_; } } return $is_prime; } sub _is_prime { my $number = shift; return unless _is_integer($number); return 1 if $IS_PRIME{$number}; return unless $number > 2 and $number % 2; my $max = 1 + int $number**.5; for ( my $divisor = 3; $divisor < $max; $divisor += 2 ) { return unless $number % $divisor; } $IS_PRIME{$number} = 1; # cache it return 1; } sub primes_upto { my $number = shift; if ( $number > $PRIMES[-1] ) { # extend cache is_prime($number); } my (@primes); foreach my $i ( 0 .. @PRIMES ) { next if $number > $PRIMES[$i]; @primes = @PRIMES[ 0 .. $i - 1 ] if $i; last; } return wantarray ? @primes : \@primes; } sub clear_prime_cache { @PRIMES = _get_primes(); %IS_PRIME = map { $_ => 1 } @PRIMES; return; } } sub _is_integer { return shift =~ /^[-+]?\d+$/; } sub _get_primes { # list from Crypt::Primes # http://search.cpan.org/dist/Crypt-Primes/ return ( # huge list of all primes < 2^16 ); } 1;

This is part of a larger set of code which is intended to be pure Perl, but given that I'm not a mathematician, I'm not sure if this is the best approach. I make heavy use of caching and my tests pass, but I'm wondering if there is not a better approach to writing both &is_prime and &primes_upto. Even with heavy caching, finding primes significantly larger than 2^16 can be pretty slow.

Cheers,
Ovid

New address of my CGI Course.

Comment on Math help: Finding Prime Numbers
Download Code
Re: Math help: Finding Prime Numbers
by CountZero (Bishop) on Nov 12, 2006 at 20:32 UTC
    It is said that for finding primes less than 10,000,000,000 the Sieve of Eratosthenes cannot be beat for speed.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      Considering that lists of prime numbers are easily available, I'm not so sure whether a sieve beats reading them from disk.

      Unless one is doing it for fun, as an exercise, to check the validity of the list, or to expand a list, there's no reason to write a program that generates prime numbers starting from 2. We promote people to reuse code and not reinvent the wheel. There's no reason to be different when it comes to data.

        I'm not so sure whether a sieve beats reading them from disk

        Yep - a half-decent sieve does beat reading from disk ... unless it's a very short list that you're after.

        Cheers,
        Rob
        Update: Actually ... come to think about it ... a half-decent sieve might beat reading from disk, even if it is a very short list
Re: Math help: Finding Prime Numbers
by GrandFather (Cardinal) on Nov 12, 2006 at 20:35 UTC
Re: Math help: Finding Prime Numbers
by -b (Monk) on Nov 12, 2006 at 21:06 UTC
    Hi,

    For what I saw, you're using an implementation of the Sieve of Eratosthenes, which should be one of the most efficient ones for numbers up to 2^33. If you want to improve it, perhaps you should look into the Sieve of Atkin for O(N/log log N) operations instead of O(N).

    Larger primes use different methods to prove primality but if you can't control your input (I mean, if you're not just trying to find a large prime but rather to determine given a number n, if it's prime or not), then you'll most likely end up with some complex algorithms such as Elliptic Curves or AKS.

    Sorry for not posting any code, though :-(

    Cheers,
    -b

Re: Math help: Finding Prime Numbers
by samtregar (Abbot) on Nov 12, 2006 at 21:47 UTC
    I'm sure you can do better than a linear search in primes_up_to(). I'd start with a binary search, I think, since you've got an ordered set. You could also play with caching past searches and use that to pick a subset to search - depending on usage patterns that could be a big win.

    -sam

Re: Math help: Finding Prime Numbers (Updated with code)
by BrowserUk (Pope) on Nov 12, 2006 at 22:22 UTC

    If this isn't quick enough, I do have the binary version I mentioned somewhere.

    Update: Found it.

    The following shows the code finding the 15 millionth prime in 47 milliseconds, and producing the full list of the first 15 million in 12 seconds.

    C:\test>primes.pl 15e6 The 15000000th prime is: 275604541 1 trial of Finding the 15000000 prime (47.118ms total) 1 trial of Retrieving the first 15000000 primes (11.906s total)

    It requires 15 MB of disk storage, and 15 MB of ram if you only want to retrieve them individually. It takes ~600 MB to expand the full list in memory. This is the code and benchmark:

    #! perl -slw use strict; package primes; open my $fh, '<:raw', 'data\primes.deltas.bin' or die $!; read( $fh, my( $deltas ), -s( 'data\primes.deltas.bin' ) ); close $fh; sub firstNprimes { my( $n ) = @_; my @primes = unpack 'C*', $deltas; $primes[ $_ ] += $primes[ $_ - 1 ] for 1 .. $#primes; return \@primes; } sub primeN { return unpack "%32C$_[0]", $deltas; } package main; use Benchmark::Timer; die 'Supply N' unless @ARGV; $ARGV[ 0 ] +=0; ## force numeric my $T = new Benchmark::Timer; $T->start( "Finding the $ARGV[ 0 ] prime" ); print "The $ARGV[ 0 ]th prime is: ", primes::primeN( $ARGV[ 0 ] ); $T->stop( "Finding the $ARGV[ 0 ] prime" ); $T->start( "Retrieving the first $ARGV[ 0 ] primes" ); my $primes = primes::firstNprimes( $ARGV[0] ); $T->stop( "Retrieving the first $ARGV[ 0 ] primes" ); $T->report; print "The first 100 primes are:"; print for @{ $primes }[ 0 .. 99 ]; print "The last 100 primes (of the first ARGV[ 0 ]) are:"; print for @{ $primes }[ -100 .. -1 ];

    The file, data\primes.deltas.bin above is produced by downloading the 160 MB list from here, and manipulating it to reduce the storage requirement. I achieved the reduction by noting that the delta between successive primes is always less than 255 (for the first 15 million), and so can be stored as a single byte.

    To retrieve any individual prime, it is simply a process of summing the first N deltas. This would be rather slow using perl were it not for the little used feature of unpack which can calculate checksums of a string of binary data very quickly. To calculate the 32-bit checksum of a string of bytes, you use

    $checksum = unpack '%32C*', $bytes;

    Hence, the primeN() function simply becomes

    sub primeN { return unpack "%32C" . (0+$_[0]), $deltas; }

    And runs amazingly quickly, taking around N*3 microseconds, though the cost of the function call swamps the generation time until you reach 100,000 or so.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Math help: Finding Prime Numbers
by jbert (Priest) on Nov 12, 2006 at 23:00 UTC
    The sieve of Eratosthenes is a good algorithm. It requires N bits of storage for primes to N.

    Here is a pure-perl implementation, using 'vec' for the bit array and done as an OO perl module built around a blessed scalar reference (to the bit array). It isn't heavily tested, I'm afraid, but you use it as follows:

    use Sieve; # Rename as appropriate... my $sieve = Sieve->new; foreach my $number (qw/13 20 35 3/) { print "$number is ", $sieve->is_prime($number) ? "prime" : "composite", "\n"; } print join(", ", $sieve->primes_to(300)), "\n";
    And here is the code:
    package Sieve; use strict; use warnings; my $BITS_PER_BYTE = 8; my $INITIAL_SIZE = $BITS_PER_BYTE ** 2; sub new { # A sieve is a bit array, where 'true' => composite my $sieve = ''; # 0 isn't prime vec($sieve, 0, 1) = 1; # 1 isn't prime vec($sieve, 1, 1) = 1; my $s = \$sieve; bless $s, __PACKAGE__; # Pre-extend the array vec($sieve, $INITIAL_SIZE, 1) = 0; # And fill it in $s->_run; return $s; } sub is_prime { my $s = shift; my $n = shift; $s->_extend($n); return !vec($$s, $n, 1); } sub primes_to { my $s = shift; my $n = shift; $s->_extend($n); return grep { $s->is_prime($_) } 1..$n; } sub _run { my $s = shift; my $i; my $limit = sqrt ($s->_size); for ($i = 2; $i < $limit; ++$i) { next unless $s->is_prime($i); $s->_mark_multiples($i); } } sub _extend { my $s = shift; my $to = shift; return 0 if $to <= $s->_size; vec($$s, $to, 1) = 0; return $s->_run; } sub _mark_multiples { my $s = shift; my $p = shift; my $i; my $limit = $s->_size; for ($i = 2 * $p; $i < $limit; $i += $p) { vec($$s, $i, 1) = 1; } } sub _size { my $s = shift; return (length $$s) * $BITS_PER_BYTE; } 1;
      my $Limit = 1000000; my $HighestFactor = sqrt($Limit); my $is_prime=''; # Sieve array. # put in candidate primes: integers which have an odd number of repres +entations by certain quadratic forms. for $x ( 1..$HighestFactor){ my $x2 = $x*$x; last if $x2*2 >= $Limit; for $y ( 1..$HighestFactor ){ my $y2 = $y*$y; next if ($n = 3*$x2 - $y2) > $Limit; vec($is_prime,$n,1) ^= 1 if $x > $y && $n % 12 == 11; next if ( $n = 3*$x2 + $y2 ) > $Limit; vec($is_prime,$n,1) ^= 1 if $n % 12 == 7; next if ( $n = 4*$x2 + $y2 ) > $Limit; vec($is_prime,$n,1) ^= 1 if $n % 12 == 1 || $n % 12 == 5; } } # eliminate composites by sieving # if n is prime, omit all multiples of its square; this is sufficient +because # composites which managed to get on the list cannot be square-free for $n (5..$HighestFactor ){ next unless vec($is_prime,$n,1); for( $k=$n2=$n*$n; $k <= $Limit; $k += $n2 ){ vec($is_prime,$k,1) += 0 }; } # Present the results. $\="\n"; print for 2,3, grep vec($is_prime,$_,1), 5..$Limit;
Re: Math help: Finding Prime Numbers
by blazar (Canon) on Nov 12, 2006 at 23:06 UTC

    The Wikipedia article about primality tests seems fairly well written as an introduction, and has pointers to more complete resources. Just bear in mind that generating all primes up to some number and verifying whether a single number is prime or not are distinct problems, albeit related. Naive ideas connecting them are intuitive and immediate, but differences become relevant if computational complexity is to be taken into account. Indeed given that good old Sieve of Erathostenes is a good starting point, I was about to mention the quadratic sieve algorithm I had heard talking about, but a quick check revealed that it solves yet another (related, but more complex) problem, namely that of factorization.

    To bring the subject back on (Perl) topic, although in a funny manner, I take the liberty to quote Abigail's

    perl -le 'print "PRIME" if (1 x shift) !~ /^(11+)\1+$/' $number
      perl -le 'print "PRIME" if (1 x shift) !~ /^(11+)\1+$/' $number
      ...but slow...It also tries composite factors that have already been ruled out.

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

Re: Math help: Finding Prime Numbers
by davido (Archbishop) on Nov 13, 2006 at 05:37 UTC

    I'm not sure what your ultimate design objective (and balance) is to be, but you mentioned heavy use of caching, which is probably a very good idea. My thought, however, is that if hard drive space is plentiful, and processor cycles are not, you could cache all primes up to 100,000,000. According to this URL, there are about 5.76 million primes in the range under 100,000,000. If you can store each number in the 0 .. 100,000,000 range as a four-byte (32 bit) word, you'll need 21 Megabytes to cache all primes between 0 and 100,000,000.

    Stored as a packed flat file of four-byte words, you could easily perform binary searches on the file, and thus determine primality in O(log N) time, where N is the number of primes you have cached. As for how to come up with this 21MB list, I'm sure there are sites out there that have already generated text-based lists of primes. Just convert that to packed words and presto.

    ...just thinking. Of course this assumes your design needs allow you all the storage space you want, and that you need the fastest possible solution. And it's not a complete solution, since there may still be times that you need to figure out primes past 100,000,000. For that, you'll still need a quick algorithm for testing primality.


    Dave

      Gee, that's an original thought.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Math help: Finding Prime Numbers
by QM (Vicar) on Nov 13, 2006 at 06:49 UTC
    In primes_upto, use the Sieve.

    Using the Sieve properly, division is not used, only addition (and the odd multiplication). Make a list of odd numbers up to the target. (Magically include 2 as the first prime, but it doesn't need to be in the list.) With 3 as the first odd prime, mark off all multiples of 3 up to the target. Search from 3 upwards until an unmarked number is found (5) - this is the next prime. Mark all multiples of this as composite. Lather, rinse, repeat.

    One optimization is to realize that all numbers less than the square of the next prime have been marked correctly as prime or composite. For instance, marking with 3, 5 is next, and all numbers less than 25 are correctly marked. When marking with 5, you can start marking with 25, since it has no factors less than 5. Any composite between 5 and 25 will already be marked, since it will have at least 1 factor less than 5. This quickly leapfrogs over previously covered ground. Since the primes also get bigger, the jump between marks does as well.

    Another is to realize that once all primes less than or equal to the square root of the target are found, marking stops. Any unmarked numbers above the square root are prime.

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

Re: Math help: Finding Prime Numbers
by Zaxo (Archbishop) on Nov 14, 2006 at 03:18 UTC

    Math::Pari is the gold standard for number-theoretic things in perl. It isn't trivial to install, but it can do anything that is practical with prime numbers. Nothing matches it for speed and range.

    It has functions for primality testing, n-th prime, next higher prime from a number, and lists of primes. It can factor up to 60 digits or so very quickly. All the numbers are promoted to multiple precision at need. The libPARI library is used.

    After Compline,
    Zaxo

      Math::Pari is great--I wish I understood more of it than the ~1% that currently I do :)--but it's not so good for primes. By default (AS PPM version), primes() only knows the first 41561 primes:

      C:\test>p1 Perl> use Math::Pari qw[ prime primes nextprime precprime ];; Perl> print prime 41561;; 500257 Perl> print prime 41562;; PARI: *** not enough precalculated primes at (eval 7) line 1, <STDIN +> line 4.

      It will generate larger primes using nextprime(), but it pretty slow, taking around 64 seconds to get to the 1 millionth:

      Perl> print scalar localtime; $p = 500257; $p = nextprime ++$p for 41562 .. 1000000; print scalar localtime; print $p;; Tue Nov 14 04:36:02 2006 Tue Nov 14 04:37:07 2006 15485863

      But the biggest problem is that it forgets all the bigger primes as soon as it has returned them (below is a continuation of the previous session):

      Perl> print prime 41562;; PARI: *** not enough precalculated primes at (eval 9) line 1, <STDIN +> line 6.

      From the error message I guess it might be possible to increase the number of precalculated primes if you manage to build it yourself? But if you Linux guys have trouble with it, you can guess my chances :)


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        The $Math::Pari::initprimes variable controls initialization of precalculated primes. Since that doesn't go through the perl iface, but is a canned C routine, initialization is fast. To avoid setting that in a BEGIN block, you can say:

        use Math::PariInit qw( primes=12000000 );
        See the Initialization section of the Math::Pari docs.

        After Compline,
        Zaxo

Re: Math help: Finding Prime Numbers
by gam3 (Curate) on Nov 16, 2006 at 01:35 UTC
    For larger numbers, say: 66_666_666_666_666_666_666_666_666_666_666_666_666_667 you can use use some tricks that will proves that a number is composite. Here is a script that gives a better that 50% chance that a number is prime. By looping you can reduce the chance that a number is that not prime is reported as prome. 10 loops gives a 1 in 2^10 chance of being wrong.
    #!/usr/bin/perl use strict; use Math::BigInt::GMP; # Precondition: a, n >= 0; n is odd use strict; use warnings; #use diagnostics; use Crypt::OpenSSL::Random; use Data::Dumper; use Math::BigInt; use Math::BigFloat; sub brand { my $number = shift; my $n = Math::BigInt->new($number); my $l = Math::BigFloat->new($n + 1)->blog(2)->bceil; my $r = 0; while ($r == 0 or $r >= $n) { my $x = Crypt::OpenSSL::Random::random_pseudo_bytes(10); $r = Math::BigInt->new('0b' . unpack("B$l", $x)); if ($r < $n and $r) { last; } else { # printf "rand miss\n"; } } return $r; } sub jacobi { my ($a, $n) = @_; die unless defined $a; if ($a == 0) { # identity 1 return ($n == 1) ? 1 : 0; } if ($a == 2) { # identity 2 my $x = $n % 8; if ($x == 1 || $x == 7) { return 1; } elsif ($x == 3 || $x == 5) { return -1; } } if ( $a >= $n ) { # identity 3 return jacobi($a % $n, $n); } if ($a % 2 == 0) { # identity 4 return jacobi(2, $n) * jacobi($a/2, $n); } die if $a % 2 == 0; # identities 5 and 6 return ($a % 4 == 3 && $n % 4 == 3) ? -jacobi($n, $a) : jacobi($n, $ +a); } sub check { my $number = shift; my $n = Math::BigInt->new($number); return 'Composite 0' if $n->copy->bmod(2) == 0; my $a = 0; while ($a == 0) { $a = brand($number); } return 'Composite 1' if (1 != $a->copy->bgcd($n)); my $x = ($n - 1) / 2; my $q = $a->copy->bmodpow($x, $n); # die 'bad math' unless $q == $q_p; my $j = jacobi($a, $n); $j %= $n; if ($j == $q) { return 'Prime.'; } else { return 'Composite 2'; } } my $number = shift || die "Need a number."; my $loops = shift || 10; die if ($loops < 0); my $c = 0; while ($c++ < $loops) { my $ret = check($number); if ($ret =~ /Composite/) { print $ret, "\n"; exit; } } print "This is only a 1/2^", $loops, " chance that $number is not prim +e\n";
Re: Math help: Finding Prime Numbers
by ambrus (Abbot) on Jan 29, 2007 at 13:54 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://583585]
Approved by Corion
Front-paged by Old_Gray_Bear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (7)
As of 2014-10-26 01:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    For retirement, I am banking on:










    Results (149 votes), past polls