Re: Design thoughts: iterator invalidation (depends)

Depends how you implement the iterator. I would often implement the iterator by caching the list of keys and iterating that (could be a problem if the number of keys is huge). Then you don't have to invalidate the iterator, just check 'exists' before returning each key. Though, exactly how you do that can still lead to race conditions, of course.

It also depends on the operation being done with the iterator. If you are doing an operation with an iterator where changes in the keys would cause a problem, then you might need to implement some form of locking. Your description makes me suspect that the operations you are doing are unlikely to be such that missing a new key is a big deal. If it is, then I might do optimistic locking where, after I've finished iterating, I take a lock, check that the list of keys hasn't been updated since I initialized the iterator, apply the update indicated by my calculations, then unlock.

If you are just iterating the keys because you need to process every key eventually, then invalidating the iterator each time a change is made can lead to your loop starting over frequently such that the loop never gets to the end and so keys that don't get iterated early are simply never reached.

There is no one right answer. So, yes, there may be value in having an option when creating the iterator.

As for how to indicate that an iterator has become invalid? I'd throw an exception. Though, you could also just allow for checking for validity, thus providing the caller the option as to whether or not they care (such as only at the end). But if it is likely that there are cases where changes invalidate the result being accumulated, then I'd rather have the option of having the checking done inside the iterator rather than force the caller to remember to do the checking.

How to check that an iterator has become invalid? I'd track a generation number for the state that is being iterated. An operation that makes an important change to the state of what is being iterated would increment the generation number. The iterator would copy the generation number when created and throw an exception when iterated if the generation number has changed.

The worst case here calls for using MVCC (where you track a generation number for transactions and can have multiple versions for each key and you purge old versions when there are no more "live" transactions whose life span intersected with the version's life span).

- tye

Comment on Re: Design thoughts: iterator invalidation (depends)

Replies are listed 'Best First'.

Re^2: Design thoughts: iterator invalidation (depends)
by BrowserUk (Patriarch) on Feb 21, 2015 at 14:53 UTC

I would often implement the iterator by caching the list of keys and iterating that (could be a problem if the number of keys is huge).

Your parenthetical excludes this approach for my purposes. For uses where the hash is small enough for that to not be a problem; then it is simpler if the user just takes a none shared copy of the shared hash and iterates that.

There is no one right answer. So, yes, there may be value in having an option when creating the iterator.

By the same logic that precedes the above quote, I've reached the same conclusion. And I concur with your logic that folows it also.

Here's my first pass description of what I think I'm going to implement ('scuse the formatting and rambling thought process.):

When creating an iterator, the user can pass a set of flags that indic
+ate their requirements for the iterator:

INVALIDATE_NONE                         0x0
INVALIDATE_ONRESIZE                  0xFFFF
INVALIDATE_ONKEYADD              0xFFFF0000 
INVALIDATE_ONKEYDELETE       0xFFFF00000000
INVALIDATE_ONVALUECHANGE 0xFFFF000000000000
INVALIDATE_ALL           0XFFFFFFFFFFFFFFFF

Each hash will contain a version number, a simple monotonic counter.

The flags will control which actions will update that counter. 

Each iterator will take a copy of the counter at the point of its crea
+tion;
and compare that copy with the counter in the hash each time it is cal
+led.

If they are different, it will free the iterator storage and raise an 
+exception.

Nice idea -- economical -- but it would mean that either:

    the flags would need to be applied to the hash not the iterator an
+d therefore all iterators for a given hash would have the same attrib
+utes.
    
or

    each of the monitored operations would have to update the versions
+ in all current iterators.
    Even for modest numbers of iterators, the looping, indirections an
+d locking involved would impose a substantial hit on the performance 
+of the affected operations.

or

    the hash would need separate counts for resizes, adds, deletes and
+ value changes :(
    
    Could they be safetly combined into a single count? (eg. union{ U6
+4 version; struct { U16 updates, adds, deletes, resizes; } })
    With the flags specifying a mask applied to the U64 version to det
+ect the changes?
    What are the odds of an iterator being called when exactly 2**16 d
+isallowed changes (* the number of disallowed types of change) have o
+ccurred?
        Minimal! The iterator would have to be called the first time a
+fter creation, when exactly 65536 (or some exact multiple thereof) ch
+anges of the disallowed type had been made; and not once before.
        And if more than one disallowed type the odds are 2^16 (*2^16)
+ slimmer. 
        And in the unlikely event that occurred; the iterator would be
+ invalidated then next time it was called unless it also happened aft
+er an exactly multiple of 65536 changes have been made. 
        As close to impossible as I need!.
        
Conclusion:
    The version counter will not be a simple monotonic counter of all 
+changes, but rather a 64-bit value consisting of 4 x 16-bit unsigned,
+ rollover counters. 
    The per iteration check consists of masking the iterator copy and 
+comparing against a masked view of the hash version.
[download]

Summary:

Each hash contains 4 16-bit counters viewable as a 64-bit uint: union{ U64 version; struct { U16 updates, adds, deletes, resizes; } } The fields are incremented by the corresponding operations.

Each iterator takes a copy of this on instantiation; and then compares its copy with the hash copy on each iteration (masking per instantiation flags). If a disallowed operation has occurred, the iterator frees itself; and raises an exception.

Further thoughts, critiques?

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this

In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

[reply]
[d/l]
[select]


Clear questions and runnable code get the best and fastest answer
	PerlMonks