stateright / stateright

A model checker for implementing distributed systems.

Home Page: https://docs.rs/stateright

DFS and/or DPOR

graydon opened this issue

A fair number of explicit-state model checkers use DFS not BFS. I realize this would be a fairly big revision to the structure (at least of the checker module) but I think there are good reasons for it:

  • We can split the sources (visited-state-tracking) hashtable in two: one piece that is strictly consulted and unbounded (but probably quite small), containing the current DFS path and used for detecting revisits on a single path; and a second piece that is a size-bounded (randomly evicting on overflow) set of previously visited states not on the current DFS path. A false negative on the second piece (due to an eviction) causes accidental revisiting of a subspace of the state space but doesn't affect termination of the algorithm. This allows trading time for space: specifically, avoiding memory exhaustion at the cost of a longer runtime.
  • I think it might better-enable distributed operation in the future (on a cluster of machines) by delegating entire subtrees to separate nodes. Not sure about this.
  • More significantly, I think it's a path towards doing dynamic partial-order reduction (as described here: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.421.1291). Specifically in an actor system we could classify the choice-points in a given DFS path as splitting or not-splitting the future state space into disjoint sets of communicating actors and only backtrack and reorder within a given disjoint set.
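
To make the first bullet concrete, here is a rough sketch of a DFS with the two-part visited set. It is not Stateright's API: `BoundedVisited`, `dfs`, and the `successors` callback are hypothetical names made up for illustration. To stay dependency-free it also swaps random eviction for a direct-mapped cache (a new entry simply overwrites whatever was in its slot), which has the same effect of possibly forgetting a state.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Size-bounded set of fully explored states (hypothetical helper).
/// Direct-mapped: inserting overwrites the slot's previous occupant,
/// so lookups can give false negatives but never false positives.
struct BoundedVisited<S> {
    slots: Vec<Option<S>>,
}

impl<S: Hash + Eq> BoundedVisited<S> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { slots: (0..capacity).map(|_| None).collect() }
    }

    fn slot(&self, state: &S) -> usize {
        let mut hasher = DefaultHasher::new();
        state.hash(&mut hasher);
        (hasher.finish() % self.slots.len() as u64) as usize
    }

    fn contains(&self, state: &S) -> bool {
        // Compare the full state, not just the hash, to avoid false positives.
        self.slots[self.slot(state)].as_ref() == Some(state)
    }

    fn insert(&mut self, state: S) {
        let i = self.slot(&state);
        self.slots[i] = Some(state); // evicts any previous occupant
    }
}

/// Explores everything reachable from `init` and returns the number of
/// state expansions (which can exceed the number of distinct states
/// when evictions force revisits).
fn dfs<S, F>(init: S, mut successors: F, cache_capacity: usize) -> usize
where
    S: Clone + Hash + Eq,
    F: FnMut(&S) -> Vec<S>,
{
    // Piece 1: exact and unbounded, but only holds the current DFS path.
    let mut on_path: HashSet<S> = HashSet::new();
    // Piece 2: bounded and lossy, holds states already fully explored.
    let mut done = BoundedVisited::new(cache_capacity);
    // Stack of choice points: each state with its not-yet-tried successors.
    let mut stack: Vec<(S, Vec<S>)> = Vec::new();

    on_path.insert(init.clone());
    let pending = successors(&init);
    stack.push((init, pending));
    let mut expansions = 1;

    loop {
        // Take the next untried successor of the state on top of the stack.
        let next = match stack.last_mut() {
            None => break,
            Some((_, pending)) => pending.pop(),
        };
        match next {
            Some(state) => {
                if on_path.contains(&state) || done.contains(&state) {
                    continue; // cycle on this path, or already explored
                }
                on_path.insert(state.clone());
                let pending = successors(&state);
                expansions += 1;
                stack.push((state, pending));
            }
            None => {
                // All successors tried: backtrack and remember the state.
                let (state, _) = stack.pop().expect("stack is non-empty here");
                on_path.remove(&state);
                done.insert(state);
            }
        }
    }
    expansions
}
```

Because `on_path` is exact, a path can never loop back on itself, so the search terminates on a finite state space even with a cache capacity of 1; a smaller cache only means more repeated expansions. The explicit stack of pending successors is roughly the "stack of choice points" that a DPOR-style search would need to revisit and reorder.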

I'm not sure about drawbacks. Maybe a slightly more complicated stack-of-choice-points structure to track progress? What do you think?

I'm reading the paper now, and DPOR looks promising. I completely support adding a DFS implementation as a stepping stone.

@graydon I started working on DFS (see above) and need to make some decisions regarding both parallel checking and DPOR. Let me know if you're interested in discussing.

Resolving, as the checker now supports DFS. DPOR required more invasive changes, which can be found at https://github.com/jonnadal/fibril (currently more experimental).