Perl: the Markov chain saw PerlMonks

### Re: Unable to Understand grep and hash use for prime no.

by anonymized user 468275 (Curate)
 on Aug 04, 2015 at 11:27 UTC ( #1137350=note: print w/replies, xml ) Need Help??

Finding out if a number is prime has the typical algorithm of testing all integers less than the square root of the number for whether they are a factor (and of course eliminating a perfect square). The author is trying to trade off storage for execution time by saving previous calculations as a hash (key = number, value = boolean for whether prime). I have some thoughts:

(1) only prime factors need to be tested as factors and so test factors (all of them used per test) should also have their results kept in the lookup, so the lookup storage belongs to the function 'is_prime' (could be a package variable) rather than held at the scope shown in the snippet.

(2) hash access slows disproportionately whereas this lookup could just be an array. That array could be packed down to one bit per element for the most efficient storage, given that it has only boolean values. Lookup (and registration) can be achieved without unpacking from that using bit operators such as '<<' and '&', thereby winning on processing as well as storage.

One world, one people

• Comment on Re: Unable to Understand grep and hash use for prime no.

Replies are listed 'Best First'.
Re^2: Unable to Understand grep and hash use for prime no.
by davido (Cardinal) on Aug 04, 2015 at 14:21 UTC

(2) hash access slows disproportionately whereas this lookup could just be an array. That array could be packed down to one bit per element for the most efficient storage, given that it has only boolean values. Lookup (and registration) can be achieved without unpacking from that using bit operators such as '<<' and '&', thereby winning on processing as well as storage.

What you describe is a primes sieve. One common prime generation technique is the Sieve of Eratosthenes. Another (sometimes slightly faster, but always more complicated) is the Sieve of Atkin.

I believe danaj has a blazing fast segmented sieve in Math::Prime::Util, and I have a pretty basic non-segmented Sieve of Eratosthenes implementation in Math::Prime::FastSieve.

Dave

Yes that's right, Dave. Coincidentally the sieve technique came up here only three weeks ago when a poster was trying to check a conjecture about Recaman's sequence. In thinking about primes, I did consider XS and C++ funnily enough, until I realised that prime or not is conceptually boolean and can be packed down to bits, so XS and C++ might be less urgent. Nevertheless I totally applaud the work you've done and XS and C++ is a very good choice.

One world, one people

Re^2: Unable to Understand grep and hash use for prime no.
by danaj (Friar) on Aug 05, 2015 at 08:21 UTC

In this case, the example is simply trying to show that results can be cached. As you say, we could reduce storage cost a lot and speed a little by using an array, string, or vector. Though one has to be careful with that as a single input of (for example) 1e18 will work fine with the hash but blow up with the other methods.

It turns out using a module (e.g. ntheory) for is_prime() is easier and faster overall anyway. No need to spend time on sieving, caching, memoize, etc. for these small sizes. But looking at the full context for this code, this is completely backwards -- he's using primality as a simple test vehicle to show caching.

That's why Davido's module uses XS and C++. An optimised Perl implementation would have a challenge to re-invent the processor in some ways, whereas in C++, structs can be used to map one datatype over another without any conversion, bit-shifting or reassignment making it easier to simulate a 2^N bit machine, where N is arbitrary.

One world, one people

That's why Davido's module uses XS and C++. An optimised Perl implementation would have a challenge to re-invent the processor in some ways, whereas in C++, structs can be used to map one datatype over another without any conversion, bit-shifting or reassignment making it easier to simulate a 2^N bit machine, where N is arbitrary.

I think you've missed the point. You said it would be more space efficient to use a bit array vs. a hash. That's right if your inputs are small, as a bit array (in Perl or C) takes only only N bits (N being the max entry), vs. a hash which in Perl is quite large per entry. Storing the result of every prime below 10M takes about 67MB as a hash, while only 1.5MB as a bit array (could be even less with some optimizations). But if I store the result of 10k random 32-bit numbers, then the hash wins by a huge amount.

My example was 1e18. This one entry now takes 1e18 bits to store just the single result. You can't do that, whether it is C or C++ or Perl. You need a different way to think about it. Your first solution is also a non-starter, as you certainly don't want to generate or store the 2.47e16 primes under that value. Unfortunately Math::Prime::FastSieve doesn't work at these large sizes, while Math::Prime::XS and Math::Prime::Util (ntheory) do.

Primality is not the same problem as prime generation.

Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1137350]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (3)
As of 2021-12-09 14:51 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
R or B?

Results (36 votes). Check out past polls.

Notices?