nteract / scrapbook

A library for recording and reading data in notebooks.

Home Page: https://nteract-scrapbook.readthedocs.io

Capture lineage / sourcing of data so that repeated calculations can be avoided

MSeal opened this issue · comments

Capturing the source SHA, or whatever else is required to recompute or read scraps, at the time the data is calculated would be helpful.

Could you explain a bit what use case you have in mind? Something like telling the user the scrap they just retrieved from a notebook needs recomputing?

For caching results during computations, we should check out https://joblib.readthedocs.io/en/latest/memory.html, which is widely used and maintained by someone else (yay!).

So the core intention here is to let the glue action against a particular ref skip pushing any data when the contents are identical. I don't think it's necessary at first, but having a path to success for a user who wants to avoid expensive computation / pushes might be helpful. Another pattern would be to provide a wrapper that lets the user compute_and_glue data: it glues a reference without recomputing when the source data is considered equivalent by some registered function.