Riak is at the AP corner. That is appropriate for what I am trying to build.
Yes. But you added the Consistency requirement when you asked for "A conflict resolution algorithm".
We expect conflicts to be very rare. Ones that cannot easily be merged should be much, much rarer still. A low remaining error rate would be acceptable.
If that's all true, you don't need to add conflict resolution. By your own words, conflicts will occur very rarely, and a low error rate is acceptable to you.
But if you feel the error rate might be too high without some effort to resolve conflicts, then it is just as easy, and just as (in)efficient, to fix them all as to fix some. Especially as you say that "Performance and throughput are not bottlenecks here".
Writes will come from all nodes we are running at. Internal networking problems or localized hardware problems should not limit the ability of other nodes to function as best they can.
From what I have read of Riak, it already provides hardware fail-over by redistributing ranges of its 160-bit hash space around the ring. But it does require the functioning nodes to be able to communicate with each other.
With that in place, the simplest conflict resolution method you could sit atop it is to avoid conflicts altogether by routing (serialising) all write requests for a given key through the node responsible for that key.
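To make that concrete, here is a minimal Python sketch of routing writes by key over a 160-bit ring. It is not Riak's API: `HashRing`, `OwnerNode` and `route_write` are hypothetical names, and Riak itself divides the ring into a fixed number of partitions replicated across several nodes, which this ignores. Nodes are placed on the ring by SHA-1, a key belongs to the first node clockwise from its hash, and removing a node hands its range to its successor. Because every write for a key goes through one owner, which applies them one at a time under a lock, there is nothing to reconcile afterwards.

```python
import bisect
import hashlib
import threading


def ring_position(name: str) -> int:
    """Map a key or node name onto the 160-bit ring using SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class HashRing:
    """Hypothetical simplified ring: a key is owned by the first node clockwise."""

    def __init__(self, nodes):
        self._positions = {}  # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        self._positions[ring_position(node)] = node

    def remove(self, node: str) -> None:
        # Fail-over: once this node is gone, its range falls to its successor.
        self._positions = {p: n for p, n in self._positions.items() if n != node}

    def owner(self, key: str) -> str:
        points = sorted(self._positions)
        # First node at or after the key's position, wrapping past the top.
        i = bisect.bisect_left(points, ring_position(key)) % len(points)
        return self._positions[points[i]]


class OwnerNode:
    """Hypothetical node that applies writes for the keys it owns one at a time."""

    def __init__(self, name: str):
        self.name = name
        self.store = {}
        self._lock = threading.Lock()  # serialises every write on this node

    def write(self, key: str, value: str) -> str:
        with self._lock:
            self.store[key] = value
        return "ack"


def route_write(ring: HashRing, nodes: dict, key: str, value: str) -> str:
    """Send a write to the key's current owner, so no two nodes write the same key."""
    return nodes[ring.owner(key)].write(key, value)


ring = HashRing(["node-a", "node-b", "node-c"])
nodes = {name: OwnerNode(name) for name in ("node-a", "node-b", "node-c")}
route_write(ring, nodes, "user:42", "v1")  # lands on the key's owner only
```

Removing the owner with `ring.remove(...)` makes `ring.owner("user:42")` return the next node clockwise, which is the redistribution described above in miniature.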
That leaves two failure modes to be concerned with:
- A node goes down after a write request has been routed to that node but before it has been acknowledged.
The requesting node reissues the request after a timeout, and it will be re-routed to whichever node has taken over responsibility for that range of the hash space.
- The network fabric between the requesting node and the serving node goes down.
If the network fabric connecting the nodes is unreliable, Riak will essentially be stuffed anyway.
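The first failure mode comes down to a retry loop on the requesting node. Below is a minimal sketch, assuming a hypothetical `route(key)` that returns the current owner and a hypothetical `send(...)` that raises when no acknowledgement arrives in time. Since the dead node may have applied the write before it went down, the sketch reuses one request ID across retries so the replay can be recognised and applied only once; that idempotency is an assumption of the sketch, layered on top of the routing scheme.

```python
import uuid


class NodeUnavailable(Exception):
    """Raised by the (hypothetical) transport when no ack arrives before the timeout."""


def write_with_retry(route, send, key, value, timeout=2.0, attempts=3):
    """Reissue a write until some owner acknowledges it; returns the acking node."""
    request_id = str(uuid.uuid4())  # reused on every retry so a duplicate can be detected
    for _ in range(attempts):
        node = route(key)  # re-evaluated each attempt, so a retry reaches the new owner
        try:
            send(node, request_id, key, value, timeout)
            return node
        except (NodeUnavailable, TimeoutError, ConnectionError):
            continue  # owner died or is unreachable: re-route and reissue
    raise RuntimeError(f"no acknowledgement for {key!r} after {attempts} attempts")


# Toy cluster: the first live node owns every key. node-a crashes after applying
# the write but before acknowledging it; node-b then takes over its range.
live = ["node-a", "node-b"]
applied = {}  # request_id -> value, standing in for the (replicated) store


def route(key):
    return live[0]


def send(node, request_id, key, value, timeout):
    if request_id not in applied:  # idempotent: a replayed request is applied once
        applied[request_id] = value
    if node == "node-a":
        live.remove("node-a")  # crash before the ack is sent
        raise NodeUnavailable(node)


acked_by = write_with_retry(route, send, "user:42", "v1")
print(acked_by, len(applied))  # node-b 1
```

The second failure mode has no equivalent fix: if `send` keeps failing because the fabric itself is down, the loop simply runs out of attempts.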
Obviously, I only know the little you've told us, and I can envisage (a few) scenarios where conflict resolution might be better than conflict avoidance. But for most of them, I think you would be fooling yourself to think that Riak would survive when the nodes' ability to communicate with each other had not.
Anyway, good luck. It sounds like you have your work cut out.