Feature Discussion: Propagating more information forward than :ok and :fail allow
jkni opened this issue · comments
Hey!
Using and loving Jepsen and Knossos.
Lately, I've ran into a few situations where :ok and :fail do not allow the propagation of enough information forward into the checking phase.
As an example that will apply to most situations, a :fail-ed :cas actually provides more information than :fail makes available to the checker - it constrains the register to a value not equal to the compared value at some time between the :invoke and :fail times.
As a more prominent example, a :fail-ed :cas in Cassandra LWT gives us the value at read-time that did not match v in our :cas :value [v v']. It is possible for a :fail-ed :cas to be able to provide as much information to the checker as an :ok-ed :read in this circumstance.
I have two main ideas on strategies to allow this.
- When completing the history in knossos, merge all new keys in the :ok-ed op into the :invoke op. Then, an enhanced-cas-register model could check keys on the :invoke and step the model appropriately. The main disadvantage of this is cognitive - it overloads the semantics of :ok.
- When completing the history in knossos, allow functions to be provided to handle differ op :types. In this case, we might just rewrite the :fail :cas to an :ok :read. Again, the main disadvantage here is the cognitive overhead of rewriting history.
Thoughts on any approach you might be interested in integrating into Knossos/Jepsen? I'd PR.
I think instead of either 1 or 2 we need to add an entirely new completion type; say, :illegal
, which means "Instead of this operation not occurring, I want to assert that this transition was in fact illegal." You'll have to thread this through a fair amount of knossos, but I think it's a sensible approach, and, as you note, lets us detect bugs that knossos would otherwise allow.
(specifically, we can't map :fail
to :ok
because knossos assumes that failed operations cannot have affected the state, and ok operations must affect the state)
(agreed - I spent a lot of time double-checking this on the Knossos internals)
I think :illegal nicely handles the general case of a :fail-ed :cas. I'm not sure how this handles the case of a :cas like Cassandra's, where we also should get a linearizable :read out of a :fail-ed :cas. Do you think this is enough of a special case that I should just hack around it in my tests?
So thinking about this more, I think it miiiight make sense to perform a post-hoc transformation on the history which remaps the semantics of those illegal operations into ok/fail/indeterminate, like you've done in the cassandra tests--because, like you've said, you can handle things like "a failed CAS can be a read". Similarly, you can have negative senses for functions on your model too, like "cant-cas" which compares and sets UNLESS the values match; I think those are semantically equivalent and keep the knossos code small.