http://www.perlmonks.org?node_id=429624


in reply to Re: How to remove duplicates from a large set of keys
in thread How to remove duplicates from a large set of keys

Thanks for your reply, Corion.

"In the end, you will still need to have all keys in memory, or at least accessible"
Why? In the case of a database, I can just try to insert the new value. If that value already exists in the table, I'll get an exception like 'Cannot insert a duplicate value, bla-bla-bla'. Otherwise, the new value is inserted into the table.
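Just to illustrate the idea, here is a minimal sketch using DBI with SQLite; the table name (seen_keys), column name (k) and DSN are made up for the example, and the exact error text depends on the database you use:

    use strict;
    use warnings;
    use DBI;

    # Hypothetical schema: a single table with a PRIMARY KEY on the key
    # column, so inserting the same key twice violates the constraint.
    my $dbh = DBI->connect('dbi:SQLite:dbname=keys.db', '', '',
                           { RaiseError => 1, PrintError => 0 });
    $dbh->do('CREATE TABLE IF NOT EXISTS seen_keys (k TEXT PRIMARY KEY)');

    my $sth = $dbh->prepare('INSERT INTO seen_keys (k) VALUES (?)');

    sub is_new_key {
        my ($key) = @_;
        # eval catches the duplicate-key exception thrown by the database
        my $ok = eval { $sth->execute($key); 1 };
        return $ok ? 1 : 0;    # 0 => the key was already in the table
    }

    print is_new_key('foo') ? "new\n" : "dup\n";    # prints "new"
    print is_new_key('foo') ? "new\n" : "dup\n";    # prints "dup"

Some databases also let you avoid the exception altogether (e.g. SQLite's INSERT OR IGNORE), but the try-and-catch version above matches the approach described here.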

"a million keys shouldn't eat too much memory"
The most important criterion for me is the speed of processing new values. I haven't tried the database approach yet, but with a hash, processing one value takes about 40 seconds when the hash holds 1 million keys. And as the number of keys grows, the processing time grows too.
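For comparison, the hash-based check being timed here looks roughly like this (a sketch, assuming the values are plain strings):

    use strict;
    use warnings;

    my %seen;    # every key seen so far lives in this in-memory hash

    sub process_value {
        my ($value) = @_;
        return if exists $seen{$value};    # duplicate: skip it
        $seen{$value} = 1;                 # remember the new key
        # ... real processing of the genuinely new value goes here ...
    }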

---
Michael Stepanov aka nite_man

It's only my opinion and it doesn't have pretensions of absoluteness!