What order of scale are you hoping for?
Dozens of peer nodes, with peak performance of at most dozens of writes per second per node. (Usually it will be quieter than that. The nodes will mostly be used for other stuff; Riak should be running in the background.) Performance and throughput are not bottlenecks here: one machine can easily handle that load. The issue is availability, and the desire to avoid having another specialized machine per cluster.
The best approach to distributed data management...
Sorry, there is no best approach. The CAP theorem says that you can choose any two of Consistency, Availability, and Partition Tolerance. Depending on your application, it may be appropriate to wind up at any corner.
Riak is at the AP corner. That is appropriate for what I am trying to build. We expect conflicts to be very rare. Ones that cannot easily be merged should be much, much rarer still. A low remaining error rate would be acceptable. Writes will come from all nodes we are running on. Internal networking problems or localized hardware problems should not limit the ability of other nodes to function as best they can.
Your suggestions would be appropriate if we were trying to wind up at the CA or CP corners. We're not.
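To make the "rare conflicts, mostly mergeable" point concrete, here is a rough sketch of the kind of merge I have in mind when an AP store hands back conflicting versions of the same record. Everything here is an assumption for illustration: the function name merge_siblings, the record layout (a dict whose set-valued fields can be unioned), and the rule that differing scalar fields are reported rather than resolved. It does not call any Riak API.

```python
def merge_siblings(siblings):
    """Merge conflicting versions of one record (hypothetical layout).

    Each sibling is a dict. Set-valued fields are merged by union;
    scalar fields that disagree are reported as unresolved.
    """
    merged = {}
    unresolved = []
    for sibling in siblings:
        for field, value in sibling.items():
            is_set = isinstance(value, (set, frozenset))
            if field not in merged:
                # First time we see this field: take it as-is (copy sets).
                merged[field] = set(value) if is_set else value
            elif is_set and isinstance(merged[field], set):
                # Set-valued fields merge cleanly by union.
                merged[field] |= value
            elif merged[field] != value and field not in unresolved:
                # Scalars that disagree cannot be merged automatically.
                unresolved.append(field)
    return merged, unresolved


siblings = [
    {"tags": {"a"}, "owner": "alice"},
    {"tags": {"b"}, "owner": "bob"},
]
merged, unresolved = merge_siblings(siblings)
```

With the two siblings above, the tags merge by union while the owner field lands in the unresolved list, which is the "much, much rarer" case that would need a policy or a human.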
A quick browse of Riak link provided shows that it does this for you at the physical data (disk) level, but you will still need to provide a similar mechanism, perhaps based upon the underlying 160-bit space, at the application logic level.
That is one piece of what it looks like I need to write.
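For what it is worth, an application-level mechanism over a 160-bit space could start out as plain consistent hashing. The sketch below is only that: it assumes SHA-1 (whose digest happens to be 160 bits) as the hash, gives each node a single point on the ring, and uses made-up names (ring_position, Ring, owner, node-a and friends). It is not how Riak itself assigns partitions, which as I understand it splits the ring into fixed-size partitions spread across nodes.

```python
import hashlib
from bisect import bisect_right

RING_SIZE = 2 ** 160  # SHA-1 digests are 160 bits wide


def ring_position(key):
    """Map a string to an integer in [0, 2**160) using SHA-1."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


class Ring:
    """Minimal consistent-hash ring with one point per node (illustrative)."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("ring needs at least one node")
        self._points = sorted((ring_position(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._points]

    def owner(self, key):
        """Return the first node clockwise from the key's ring position."""
        i = bisect_right(self._positions, ring_position(key))
        return self._points[i % len(self._points)][1]


ring = Ring(["node-a", "node-b", "node-c"])
print(ring.owner("some/record/key"))
```

The property that matters for availability is that dropping a node only reassigns the keys that node owned; every other key keeps its owner, so the remaining peers carry on undisturbed.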