Re^4: Generating Unique numbers from Unique strings

You will find that this makes no difference at all. A 200,000 entry hash is "not big" in the scheme of things; I have often worked with hashes that size. The hash algorithm calculates an integer from the string. In Perl, the number of buckets in the hash table is always a power of 2, so picking the hash "bucket" is just a matter of masking off the low-order bits of that integer and using the result essentially as an array index. If several strings hash to the same bucket, a few string comparisons are done to find the right one. All of this code is written in C and runs really fast. Because of the way the hash function is chosen, even very similar looking strings will have very different hash values. The calculation of this hash integer is lightning fast, far faster than whatever you would need to do to get an absolutely unique value.
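To make the bucket masking concrete, here is a rough sketch. The toy_hash function is a simple djb2-style hash I made up for illustration; it is not the function Perl uses internally, and the names toy_hash and bucket_for are hypothetical. It also assumes a perl built with 64-bit integers.

```perl
use strict;
use warnings;

# Toy string hash (djb2-style). This is NOT Perl's internal hash function,
# just a stand-in to show the idea. Result is kept to 32 bits.
sub toy_hash {
    my ($str) = @_;
    my $h = 5381;
    $h = ( $h * 33 + ord($_) ) & 0xFFFFFFFF for split //, $str;
    return $h;
}

# With a power-of-2 number of buckets, choosing a bucket is one bit mask.
sub bucket_for {
    my ($str, $n_buckets) = @_;    # $n_buckets must be a power of 2
    return toy_hash($str) & ( $n_buckets - 1 );
}

for my $key ('key1', 'key2', 'key3') {
    printf "%-6s hash=%10u bucket=%d\n",
        $key, toy_hash($key), bucket_for($key, 8);
}
```

The mask ($n_buckets - 1) only works because the table size is a power of 2, which is why the table is sized that way.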

Do some performance testing with your actual data and report back. I think you will be impressed and the code will be short.
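Something along these lines would be a starting point for the timing test. This is only a sketch: the "record_N" keys are synthetic stand-ins for your real strings, and handing out sequential numbers via %id_for is my assumption about what you want the unique number to be.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Synthetic stand-in data: 200,000 similar-looking unique strings.
# Replace this with your actual data.
my @strings = map { "record_$_" } 1 .. 200_000;

my %id_for;         # string => unique number
my $next_id = 0;

my $start = time;
for my $s (@strings) {
    # Assign the next number only the first time a string is seen.
    $id_for{$s} //= ++$next_id;
}
my $build = time - $start;

# Time one lookup per key; grep in scalar context returns the count.
$start = time;
my $found = grep { exists $id_for{$_} } @strings;
my $lookup = time - $start;

printf "inserted %d keys in %.3f s, looked up %d in %.3f s\n",
    scalar( keys %id_for ), $build, $found, $lookup;
```

The //= operator needs Perl 5.10 or later. Timings will vary with your machine and key lengths, so run it against the real strings before drawing conclusions.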



	PerlMonks