Re^3: How to remove duplicates from a large set of keys

by Tanktalus (Canon)
on Feb 10, 2005 at 15:29 UTC (#429778)


in reply to Re^2: How to remove duplicates from a large set of keys
in thread How to remove duplicates from a large set of keys

Whether you keep your million records in memory (fast) or on disk in a database (slow), you have to take the time to insert your new data. Looking up existing data is different. As explained, a hash lookup is O(1) on average: you take the key, perform a calculation on it (whose cost depends on the length of the key, not the size of the hash), and go to that entry in the (associative) array. A database lookup cannot be any faster than O(1). It can be as bad as O(log N) (I can't imagine any database doing an index lookup any slower than a binary search), which depends on the number of data points you're comparing against.
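To make the O(1) versus O(log N) point concrete, here is a rough sketch rather than a benchmark. It counts how many elements a plain binary search has to inspect in a sorted list and sets that next to a single hash lookup. The helper name `binary_search_probes` and the list sizes are made up for illustration, and the binary search only stands in for an index lookup; it is not how any particular database works.

```perl
use strict;
use warnings;

# Count how many elements a binary search inspects in a sorted
# list of numbers before it finds the target (or gives up).
sub binary_search_probes {
    my ($sorted, $target) = @_;
    my ($lo, $hi, $probes) = (0, $#$sorted, 0);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        $probes++;
        my $cmp = $sorted->[$mid] <=> $target;
        return $probes if $cmp == 0;
        if ($cmp < 0) { $lo = $mid + 1 } else { $hi = $mid - 1 }
    }
    return $probes;
}

# Illustrative sizes only: the probe count grows with log N,
# while the hash lookup stays a single keyed access.
for my $n (1_000, 100_000) {
    my @sorted = (1 .. $n);
    my %index  = map { $_ => 1 } @sorted;
    my $probes = binary_search_probes(\@sorted, $n);
    my $found  = exists $index{$n} ? 'yes' : 'no';
    printf "N=%d: binary search probes=%d, hash lookup found=%s\n",
        $n, $probes, $found;
}
```

The probe count creeps up as the list grows, while `exists $index{$key}` does the same amount of work regardless of how many keys the hash holds.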

The only way a database could be faster is if it runs on a big honkin' box with lots of RAM, and that box is a different machine from your Perl client.

This problem is one of the primary reasons to use a hash. (Not the only one, but one of them nonetheless.)
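For what it's worth, a minimal sketch of the usual hash-based approach to removing duplicates looks something like this. The sub name `unique_keys` and the sample data are only for illustration; with a large key set you would more likely read the keys from a file one line at a time instead of holding the whole list in an array.

```perl
use strict;
use warnings;

# Return the input keys with duplicates removed, keeping the
# order in which each key was first seen.
sub unique_keys {
    my @keys = @_;
    my %seen;
    # $seen{$_}++ is false (0) only the first time a key appears,
    # so grep keeps exactly one copy of each key.
    return grep { !$seen{$_}++ } @keys;
}

my @unique = unique_keys(qw(apple pear apple fig pear));
print join(',', @unique), "\n";    # prints "apple,pear,fig"
```

Each key costs one hash lookup and at most one insert, so the whole pass is roughly linear in the number of keys.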
