Re^5: RFC: Is this the correct use of Unicode::Collate?

by Jim (Curate)
in reply to Re^4: RFC: Is this the correct use of Unicode::Collate?
in thread RFC: Is this the correct use of Unicode::Collate?

I don't know if you are familiar with the NoSQL database engine BerkeleyDB (now owned by Oracle), but I have written a pure Perl replacement that performs as well. In some cases where the data portion of the key/value pair is very large, it outperforms BerkeleyDB.

I'm familiar with NoSQL and key-value stores such as Berkeley DB. But what I'd never heard of before reading your PerlMonks post is the idiom (the trick) of modifying data to disambiguate otherwise identical keys by appending control codes or invisible characters to them. This idiom seems "weirdo" to me, just as it did to Tom, who first invoked the word to describe it.

Is my example Perl script a fair representation of the idiom your NoSQL database software uses to disambiguate like keys?

I'm not a database theory guru or a database programming wizard, but my gut sense is that the idiom you describe of ornamenting data with invisible control codes or other characters is fraught with problems. I understand how data modified this way would ensure uniqueness and preserve insertion order. But how then do you match such modified strings? Isn't there a better way to achieve the same objectives without altering data? Do other NoSQL database engines besides yours use this same idiom? If so, which ones?
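
To make the matching concern concrete, here is a minimal sketch of the idiom as I understand it, next to one alternative that leaves the data alone. Everything in it is my own assumption for illustration, not anything taken from your software: the pad character (U+0001), the helper names decorate_keys, matches, and tag_keys, and the idea of carrying a separate sequence number.

```perl
use strict;
use warnings;

# ASSUMPTION: the pad character is U+0001, chosen only for illustration.
my $PAD = "\x{01}";

# Hypothetical version of the idiom: make duplicate keys unique by
# appending one more invisible pad character per repeat.
sub decorate_keys {
    my @keys = @_;
    my (%seen, @out);
    for my $key (@keys) {
        my $repeat = $seen{$key}++;        # 0 for the first occurrence
        push @out, $key . ($PAD x $repeat);
    }
    return @out;
}

# Matching a decorated key requires stripping the trailing pad first.
sub matches {
    my ($stored, $wanted) = @_;
    (my $bare = $stored) =~ s/\Q$PAD\E+\z//;
    return $bare eq $wanted;
}

# Hypothetical alternative: leave the key untouched and keep the
# insertion sequence in a separate field.
sub tag_keys {
    my @keys = @_;
    my (%seen, @out);
    for my $key (@keys) {
        push @out, [ $key, $seen{$key}++ ];
    }
    return @out;
}

my @input     = qw(apple pear apple apple);
my @decorated = decorate_keys(@input);
my @tagged    = tag_keys(@input);

my @exact  = grep { $_ eq 'apple' } @decorated;          # plain string equality
my @loose  = grep { matches($_, 'apple') } @decorated;   # strip pad, then compare
my @by_key = grep { $_->[0] eq 'apple' } @tagged;        # key was never altered

printf "eq on decorated keys:      %d match(es)\n", scalar @exact;
printf "pad-stripping comparison:  %d match(es)\n", scalar @loose;
printf "eq on separately tagged:   %d match(es)\n", scalar @by_key;
```

With this input, plain eq finds only one of the three decorated "apple" keys, while the pad-stripping comparison and the separately tagged records each find all three. That is the crux of my question: once the key itself is altered, every lookup has to know about the decoration.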


