Support serializing internal state
Kixiron opened this issue · comments
I'm working on a user-facing application with the goal of fast response times, and I've really been wanting a way to cache the internal state of dataflows so that it can be quickly recreated across restarts. That would make restarts as fast as possible while also skipping work that was already done in previous lifetimes of the program.
@Kixiron have you discovered any way to make this possible? I'm also interested—I would really love a DB that's like SQLite, but differential 😻
I wandered around the code base a bit, and I'm not sure it's possible without patching, or wrapping all the objects to log state, because the subgraph fields are private. These are the areas of interest I saw from digging around:
timely-dataflow/timely/src/worker.rs, line 642 in 3671a3b
timely-dataflow/timely/src/progress/subgraph.rs, lines 39 to 70 in 79ff074
timely-dataflow/timely/src/worker.rs, line 214 in 3671a3b
Worker.paths is also looking very interesting!
Unfortunately not. My hopes are mostly on disk-backed differential arrangements, but I don't think there's much progress towards that.
disk backed differential arrangements
Sameee, I would love that please — even just applying simple maps would be fine for me right now as well — not that the incremental is hard to write, but I would like to not, if I don't have to. Have you been exploring what it would take for what you're imagining?
I'm wondering what would happen if I just started applying Serialize & Deserialize to things until something interesting happens 🤣
By and large, it's a significantly more complex problem than just adding Serialize to things. Dataflow construction isn't the expensive part of reviving a dataflow; the expense lies in rebuilding indices over massive amounts of data.
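To make that cost concrete, here is a rough, std-only sketch of the difference between rebuilding an index from raw updates and reloading one that was persisted already sorted. Everything in it (ToyIndex, build, to_bytes, from_bytes, get) is a hypothetical name made up for illustration, not part of timely or differential-dataflow, and it leaves out timestamps, which real arrangements also track. It only shows the shape of the problem: build has to sort and consolidate, while from_bytes is a single linear pass.

```rust
/// One update in a toy collection: (key, value, diff).
/// Assumption: real differential updates also carry a timestamp; omitted here.
type Update = (u64, u64, i64);

/// Hypothetical stand-in for an arrangement: updates consolidated and sorted by (key, value).
#[derive(Debug, PartialEq)]
struct ToyIndex {
    sorted: Vec<Update>,
}

/// Copy 8 bytes out of a slice into a fixed-size array.
fn read8(bytes: &[u8]) -> [u8; 8] {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    buf
}

impl ToyIndex {
    /// The expensive path: sort all raw updates, then merge duplicates
    /// and drop entries whose diffs cancel out. O(n log n) in the input size.
    fn build(mut updates: Vec<Update>) -> Self {
        updates.sort_by_key(|&(k, v, _)| (k, v));
        let mut sorted: Vec<Update> = Vec::new();
        for (k, v, d) in updates {
            match sorted.last_mut() {
                Some(last) if last.0 == k && last.1 == v => last.2 += d,
                _ => sorted.push((k, v, d)),
            }
        }
        sorted.retain(|u| u.2 != 0);
        ToyIndex { sorted }
    }

    /// Persist the already-sorted index as fixed-width little-endian records.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(self.sorted.len() * 24);
        for &(k, v, d) in &self.sorted {
            out.extend_from_slice(&k.to_le_bytes());
            out.extend_from_slice(&v.to_le_bytes());
            out.extend_from_slice(&d.to_le_bytes());
        }
        out
    }

    /// The cheap path: one linear pass, no sorting or consolidation.
    /// Trusts that the bytes were produced by `to_bytes`; returns None on a bad length.
    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() % 24 != 0 {
            return None;
        }
        let mut sorted = Vec::with_capacity(bytes.len() / 24);
        for chunk in bytes.chunks_exact(24) {
            let k = u64::from_le_bytes(read8(&chunk[0..8]));
            let v = u64::from_le_bytes(read8(&chunk[8..16]));
            let d = i64::from_le_bytes(read8(&chunk[16..24]));
            sorted.push((k, v, d));
        }
        Some(ToyIndex { sorted })
    }

    /// Look up all updates for a key by binary search over the sorted records.
    fn get(&self, key: u64) -> &[Update] {
        let start = self.sorted.partition_point(|u| u.0 < key);
        let end = self.sorted.partition_point(|u| u.0 <= key);
        &self.sorted[start..end]
    }
}
```

Calling ToyIndex::from_bytes(&index.to_bytes()) gives back an equal index without re-sorting anything, which is roughly the saving a disk-backed arrangement would be after. Deriving Serialize on the dataflow graph itself would not capture this, since the graph is cheap to rebuild and the indexed data is not.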