Re^4: Scaling Hash Limits

in reply to Re^3: Scaling Hash Limits
in thread Scaling Hash Limits

Hi BrowserUk, you're making a very interesting point, which nobody seems to have picked up so far. I can see two ways of significantly reducing the memory required for the task at hand. First, we probably don't need to worry about the full space of 12-digit numbers if we can narrow the range of IDs actually in use down to something on the order of 200 to 250 million numbers, which seems a reasonable hypothesis if the system allocates IDs sequentially. Second, once we have that narrower range, we need only one bit per number in it, set when that number has already been seen. Together these two observations could bring the memory requirement down to about 30 megabytes (250 million bits is roughly 31 MB), if I can still think clearly this late in the evening, but I can't see how to reduce it further to only ~~4 megabytes~~ 8 megabytes. I would be grateful if you could enlighten me on this, and I am sure many others would benefit from these ideas.
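To make the one-bit-per-ID idea concrete, here is a minimal sketch using Perl's built-in vec. The names $base, $span and seen_before are hypothetical, and the values are only my guesses at a plausible range, not anything stated in the thread. With a span of 250 million, the vector takes 250,000,000 / 8 = 31,250,000 bytes.

```perl
use strict;
use warnings;

# Assumption: all IDs fall in [$base, $base + $span). Both values are
# illustrative guesses, not figures taken from the original problem.
my $base = 100_000_000_000;   # hypothetical lowest 12-digit ID
my $span = 250_000_000;       # hypothetical width of the ID range

# One bit per possible ID. Assigning to the last bit makes Perl
# extend the string to its full length in one go.
my $seen = '';
vec($seen, $span - 1, 1) = 0;

# Returns 1 if $id was already marked; otherwise marks it and returns 0.
sub seen_before {
    my ($id) = @_;
    my $offset = $id - $base;
    die "ID $id is outside the assumed range\n"
        if $offset < 0 || $offset >= $span;
    return 1 if vec($seen, $offset, 1);
    vec($seen, $offset, 1) = 1;
    return 0;
}

printf "bit vector size: %.2f MB\n", length($seen) / 1e6;
```

This only gets down to the ~30 megabytes I mentioned; it does not explain the 8 megabytes figure.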

Update: corrected my error of inattention: BrowserUk said 8 megabytes, not 4 megabytes.
