lenskit / binpickle

Binary pickling library

Home Page: https://binpickle.lenskit.org

Support buffer de-duplication

mdekstrand opened this issue

An object may contain multiple NumPy arrays with identical contents (this will arise in some LensKit use cases). We can support de-duplication by recording more robust checksums (MD5 or SHA) of buffers and making the buffer store effectively content-addressed: each buffer is keyed by its checksum, so identical buffers are stored only once.

Format version 2 has file checksums, which is one of the prerequisites for this.
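
A minimal sketch of the content-addressing idea, for illustration only. The `DedupBufferStore` class and its methods are hypothetical names and are not part of binpickle's API or file format; SHA-256 is assumed as the checksum, and buffers are held in memory rather than written to a file.

```python
import hashlib


class DedupBufferStore:
    """Hypothetical content-addressed buffer store (not binpickle's API)."""

    def __init__(self):
        self._index = {}   # SHA-256 hex digest -> position in self.buffers
        self.buffers = []  # unique buffer contents, in first-seen order

    def add(self, data):
        """Store a contiguous bytes-like object; return the position of its unique copy."""
        view = memoryview(data).cast("B")  # flat byte view of the buffer
        digest = hashlib.sha256(view).hexdigest()
        pos = self._index.get(digest)
        if pos is None:
            # First time this content is seen: keep one copy and index it.
            pos = len(self.buffers)
            self.buffers.append(bytes(view))
            self._index[digest] = pos
        return pos

    def get(self, pos):
        """Return the stored bytes at the given position."""
        return self.buffers[pos]


store = DedupBufferStore()
a = store.add(b"\x00" * 1024)
b = store.add(bytes(1024))     # same contents, different object
c = store.add(b"\x01" * 1024)  # different contents
```

Here `a` and `b` refer to the same stored buffer, so only two buffers are kept for three `add` calls. A C-contiguous NumPy array could be passed to `add` the same way, since it supports the buffer protocol. In a file format, the returned position would presumably be replaced by a reference into the file's buffer index.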