Re^2: Design thoughts: iterator invalidation

by BrowserUk (Patriarch)
on Feb 21, 2015 at 14:37 UTC ( [id://1117424] )


in reply to Re: Design thoughts: iterator invalidation
in thread Design thoughts: iterator invalidation

I reached the (my) conclusion that RCU doesn't really help with my implementation of a concurrent hash.

My design is based upon the ideas presented in A lock-free hash table. (Note: the premise is that the implementation does not require internal locks -- though there will be some internal spin-waits.)

That doesn't mean that the user will not have the ability to lock individual key/value pairs for those situations where the new value of a key is derived from the old value. I'm also reaching the conclusion that I will have to provide the ability to lock the entire hash.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Re^3: Design thoughts: iterator invalidation (links)
by tye (Sage) on Feb 21, 2015 at 22:00 UTC

    Nice talk. The blog he started just before the video was hosted at his employer, which has since changed the URL to http://www.azulsystems.com/blog/cliff; that page also links to his newer blog, http://www.cliffc.org/blog/. I also found a link to one version of the code.

    The design actually comes quite close to RCU in many ways. He uses atomic compare-and-swap (CAS) as a form of optimistic locking for updaters. (And there is significant work beyond just RCU in order to make that all work.) This means updaters don't have to wait for a lock, but they may have to retry their updates (though the number of retries is bounded by the number of threads and, in practice, is almost always very small).

    Sounds like it never got folded into java.util, though. I'm a bit curious why, but haven't yet spent the time to try to find any discussions on that.

    The discussion about how the size of the hash is tracked indicates that a single generation number could be a bottleneck under a huge number of simultaneous updates. He tracks the number of entries in the hash with a striped counter: each thread updates only one (partial) count, and to get the total you add up all of the partial counts. So the multiple (cyclic) generation counts in a single 64-bit word that you mentioned would likely be a bottleneck for huge numbers of threads. There are probably tons of use cases where that ends up mattering very little, though.
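
    To make the striping concrete, here is a minimal sketch (mine, not Cliff Click's code) of that kind of counter in C, assuming a fixed number of stripes, MSVC intrinsics, and cache-line-sized alignment; all names are illustrative.

    /* Hypothetical striped counter: each thread bumps only its own
       cache-line-aligned slot; readers sum the slots for an approximate total. */
    #include <intrin.h>

    #define NSTRIPES 64                    /* assumed to be >= number of threads */

    typedef struct __declspec(align(64)) stripe {
        volatile long count;               /* alignment keeps slots on separate cache lines */
    } stripe_t;

    static stripe_t stripes[NSTRIPES];

    /* Writer: increment only the slot owned by (or hashed from) this thread. */
    void counter_inc(unsigned thread_id)
    {
        _InterlockedIncrement(&stripes[thread_id % NSTRIPES].count);
    }

    /* Reader: add up the partial counts; the total is a momentary approximation. */
    long counter_get(void)
    {
        long total = 0;
        for (int i = 0; i < NSTRIPES; ++i)
            total += stripes[i].count;
        return total;
    }

    Contention on any one cache line stays low because writers rarely share a slot; the price is that the total is only ever approximately current.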

    - tye        

      Nice talk.

      It's actually been a while since I last watched it through. Probably the last time was when I last mentioned it here; and the first time a year or so before that.

      I do remember that he goes through it at a lick -- thank dog for pause and rewind -- but for me that is preferable to those that labour each point.

      What I consider the most significant point in the talk was that he reinforced my long-held stance that the thing that screws up most concurrent programming is that people are always trying to lock-step data between threads and use far too much locking. Far more than is necessary most of the time.

      He almost skips over -- certainly deemphasizes -- the point that, assuming correct atomicity of writes to shared data, when one thread reads a shared value it is "the latest" available; it might change a microsecond after, or might have changed a microsecond before; but when you read it, it is the latest value. So do what you need to with it and then try and put it back. If it's changed in the meantime, do it again.
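
      As a minimal sketch of that read-modify-retry pattern (my illustration, using the MSVC intrinsic that comes up further down; the helper and its names are hypothetical):

      /* Apply new = f(old) to a shared 64-bit slot, retrying if another thread
         changed the slot between our read and our write. */
      #include <intrin.h>

      __int64 update_slot(volatile __int64 *slot, __int64 (*f)(__int64))
      {
          for (;;) {
              __int64 seen   = *slot;          /* "the latest" value right now */
              __int64 wanted = f(seen);        /* compute the replacement      */
              /* The intrinsic returns the value it found; if that is still
                 `seen`, our write went in, otherwise somebody beat us to it. */
              if (_InterlockedCompareExchange64(slot, wanted, seen) == seen)
                  return wanted;
              /* the value changed in the meantime: "do it again" */
          }
      }

      With f simply returning old + 1, this is exactly the CAS-retried increment discussed further down the thread.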

      I agree that is very similar to RCU; and for individual updates to values, it works fine.

      The reason I've rejected it for the iterator problem is that it isn't individual values changing that is the big danger; it is the state of the entire structure changing -- i.e. resizes. In order to use RCU, the iterator would have to take a complete copy of the hash in order to be able to provide the guarantees you need from an iterator. I.e. that all ((still) existing) keys will be visited once, and only once.

      In Cliff Click's scheme, he creates the resized hash and keeps the old one around; values propagate from the old to the new piecemeal as they are accessed. But he only makes the same guarantees for iterators as the original ConcurrentHashMap, which are basically very few and very weak.

      the multiple (cyclic) generation counts in a single 64-bit word that you mentioned would likely be a bottleneck for huge numbers of threads.

      I'm hoping that will not be the case. The reason for wanting to restrict the overall version to a 64-bit value is that it is the largest size I can guarantee to be able to handle with a single atomic instruction:

      __int64 _InterlockedCompareExchange64(__int64 volatile *, __int64, __int64)

      There is also an atomic instruction for incrementing 16-bit fields:

      short _InterlockedIncrement16(short volatile *Addend)

      which I believe means that my version counting and comparing should be both lock- and wait-free, and thus should not become a bottleneck.
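
      As a minimal sketch of how those two intrinsics could work together -- say, four 16-bit cyclic generation counters packed into one 64-bit word, bumped individually and snapshotted as a whole -- with purely illustrative field names and layout:

      #include <intrin.h>

      typedef union {
          volatile __int64 whole;     /* snapshotted/compared as one 64-bit unit   */
          volatile short   gen[4];    /* four independent 16-bit generation counts */
      } versions_t;

      static versions_t versions;

      /* Writer: bump one counter. Wrap-around is fine because only (in)equality
         of snapshots matters, not ordering. */
      void bump_generation(int field)                 /* field in 0..3 */
      {
          _InterlockedIncrement16(&versions.gen[field]);
      }

      /* Iterator: grab all four counters in one go (an aligned 64-bit read is
         atomic on x64; _InterlockedCompareExchange64 could stand in elsewhere)... */
      __int64 snapshot_versions(void)
      {
          return versions.whole;
      }

      /* ...and later check whether any of them changed since the snapshot. */
      int versions_unchanged(__int64 snap)
      {
          return versions.whole == snap;
      }

      Whether that single packed word becomes a hot spot under huge thread counts is the open question raised above.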


        The reason I've rejected [RCU] for the iterator problem is that it isn't individual values changing that is the big danger; it is the state of the entire structure changing -- ie. resizes. In order to use RCU, the iterator would have to take a complete copy of the hash

        Well, I'd see the RCU route as: you continue to iterate over the old copy of the hash, ignoring updates that are being made in the new, resized copy of the hash. So the RCU iterator doesn't take a copy of the hash, it just references a copy of the hash. That is, iterators started before the resize are all iterating over the same (old) copy of the hash, while iterators started after the resize are all iterating over the same (new) copy. The iterator could even recheck against the current copy of the hash even though it is still iterating through an old copy (so it wouldn't use stale values or deleted keys, but would not see new keys).

        The major benefit of the approach described in the video is that a huge hash can be resized while other updates continue to happen. But both approaches involve full copies of the hash existing at the same time (though not a full copy per iterator; I'd not do that with either approach).
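
        A bare-bones sketch of that "iterators pin the copy they started on" shape (illustrative names; safely reclaiming retired copies -- which is the part RCU's grace periods actually solve -- is deliberately left out):

        #include <intrin.h>
        #include <stddef.h>

        typedef struct table {
            volatile long refs;       /* iterators currently walking this copy */
            size_t        nbuckets;
            void        **buckets;    /* hash storage, details elided          */
        } table_t;

        static table_t * volatile current;   /* the copy updaters write into */

        /* Start of iteration: pin whichever copy is current right now. */
        table_t *iter_begin(void)
        {
            table_t *t = current;
            _InterlockedIncrement(&t->refs);
            return t;                 /* the iterator walks t, not `current` */
        }

        /* End of iteration: unpin. Freeing a retired copy once refs drops to
           zero (and closing the window between reading `current` and pinning)
           is exactly what real RCU machinery handles; it is omitted here. */
        void iter_end(table_t *t)
        {
            _InterlockedDecrement(&t->refs);
        }

        /* Resize: build the bigger copy, then publish it with one pointer swap.
           Iterators that began earlier keep walking the old copy undisturbed. */
        void publish_resized(table_t *bigger)
        {
            table_t *old = _InterlockedExchangePointer((void * volatile *)&current, bigger);
            (void)old;                /* reclaiming the old copy is the omitted RCU part */
        }

        Iterators started before publish_resized() never see the new buckets; ones started afterwards never see the old -- which is the weak-but-consistent guarantee described above.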

        Not that I'm trying to argue that you should use RCU, just pushing back lightly on one reason you stated for avoiding it. There are plenty of advantages.

        __int64 _InterlockedCompareExchange64(__int64 volatile *, __int64, __int64)

        Although I think that you are probably aware of this, the risk here is that updates happen so frequently that you have to retry your CAS over and over, even an unbounded number of times (at least in theory). I've run into a lot of examples of people jumping through hoops to avoid such a problem. I have yet to stumble upon somebody presenting good evidence that this actually became a practical problem when doing something on the order of a simple increment.

        So I still consider several possible explanations for such hoop-jumping plausible:

        • I just haven't seen the evidence, and CAS for incrementing a shared count can fairly easily become overly expensive.
        • People are jumping through these hoops to avoid scaling problems with "obtain lock then increment" and they didn't try CAS on a single counter.
        • People see real problems with too many retries via CAS for operations much more expensive than a simple increment and so avoid CAS on a single shared count due to those other scenarios, even though it hasn't been proved to lead to practical problems for a simple increment.
        • People are jumping through these hoops as premature optimization simply out of fear that the number of retries could become problematically large.
        • (Added soon after posting) Details of cache coherency between cores (or similar) are the real problem with frequent updates to a single shared count.

        So I call out the risk because I see so many examples of people trying to avoid that risk, not because I'm convinced that it can be a real problem or that it is likely to be a real problem. If I had to predict whether it would be a practical problem, I'd predict that it would be very unlikely to be very significant. But I've managed to avoid needing such an approach so I wouldn't bet much money.

        short _InterlockedIncrement16(short volatile *Addend)

        The risk I've heard of with this is that, at least on some architectures, an interlocked increment is accomplished via a bus lock, and the bus lock can have (I've heard it claimed) a bigger negative impact on concurrency than a write lock would (at least if you are using it on a lot of shared counts where contention for any single lock would be low while the frequency of bus locks would be high). The times I've used such, I haven't observed a significant drop in concurrency, but that doesn't mean such can't happen, of course.

        Thanks for sharing your question, references, and plans. It has been quite interesting.

        - tye        
