PerlMonks  

Re^5: RFC: Is this the correct use of Unicode::Collate?

by Jim (Curate)
on Jun 24, 2012 at 18:08 UTC (#978067)


in reply to Re^4: RFC: Is this the correct use of Unicode::Collate?
in thread RFC: Is this the correct use of Unicode::Collate?

> I don't know if you are familiar with the NoSQL database engine BerkeleyDB (now owned by Oracle), but I have written a pure perl replacement that performs as well. In some cases where the data portion of the key/value pair is very large, it outperforms BerkeleyDB.

I'm familiar with NoSQL and key-value stores such as Berkeley DB. But what I'd never heard of before reading your PerlMonks post is the idiom—the trick—of modifying data to disambiguate otherwise identical keys by appending control codes or invisible characters to them. This idiom seems "weirdo" to me, just as it did to Tom, who first invoked the word to describe it.

Is my example Perl script a fair representation of the idiom your NoSQL database software uses to disambiguate like keys?

I'm not a database theory guru or a database programming wizard, but my gut sense is that the idiom you describe, ornamenting data with invisible control codes or other characters, is fraught with problems. I understand how data modified this way would ensure uniqueness and preserve insertion order. But how then do you match such modified strings? Isn't there a better way to achieve the same objectives without altering data? Do other NoSQL database engines besides yours use this same idiom? If so, which ones?
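
A minimal sketch of the idiom, written only to make the uniqueness, ordering, and matching questions concrete. It is not the implementation of the database software described above. The choice of U+200B ZERO WIDTH SPACE as the invisible suffix, the plain hash standing in for the key/value store, and the helper name put_dup() are all hypothetical assumptions.

```perl
use strict;
use warnings;

# Hypothetical invisible disambiguator: U+200B ZERO WIDTH SPACE.
# The software under discussion may use other control codes.
my $MARK    = "\x{200B}";
my $MARK_RE = qr/\x{200B}+\z/;

my %store;    # plain hash standing in for the key/value store

# Hypothetical helper: store $value under $key, appending one more
# invisible mark for each earlier record with the same visible key.
sub put_dup {
    my ($store, $key, $value) = @_;
    my $k = $key;
    $k .= $MARK while exists $store->{$k};
    $store->{$k} = $value;
    return $k;
}

put_dup(\%store, 'apple', $_) for qw(red green yellow);

# Uniqueness: three distinct keys that all print as "apple".
my $n_keys = keys %store;

# Insertion order: fewer marks means inserted earlier.
my @in_order = map { $store{$_} }
               sort { length($a) <=> length($b) } keys %store;

# Matching: an exact comparison finds only the unmarked key ...
my @exact = grep { $_ eq 'apple' } keys %store;

# ... so every key must have its marks stripped before comparing.
my @all = grep { (my $bare = $_) =~ s/$MARK_RE//; $bare eq 'apple' } keys %store;

printf "%d keys; order: %s; exact matches: %d; stripped matches: %d\n",
    $n_keys, join(',', @in_order), scalar(@exact), scalar(@all);
```

It prints "3 keys; order: red,green,yellow; exact matches: 1; stripped matches: 3". Every stored key is unique and the number of marks records insertion order, but a plain lookup of "apple" finds only the first record; finding the rest means scanning and stripping every key.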

Jim
