Persist Full InferenceData object as JSON
ColtAllen opened this issue
This is a fairly straightforward task that will go a long way towards improving model functionality and maintainability of the code base.
Modules: the `lifetimes.models.__init__.BaseModel` class

Issue: An ArviZ `InferenceData` object is created as a model attribute whenever `model.fit()` is called. Currently, model persistence entails extracting model parameters from this attribute and dumping them into a memory-optimized JSON file. However, once this JSON file is loaded back into a model, ArviZ plotting and statistical functions are no longer supported. The pre/post-processing code to format this JSON also adds unnecessary complexity to the `BaseModel` class and could make future maintenance more difficult. Plus, let's be honest: this isn't a 350 GB NLP model; reducing a <10 MB `InferenceData` object down to a <4 MB JSON file is not worth the hassle.
Work Summary: Replace the JSON formatting code in `_unload_params()`, `fit()`, `save_params()`, and `load_params()` with ArviZ methods like `arviz.InferenceData.to_json()` and `arviz.from_json()`:

https://arviz-devs.github.io/arviz/api/data.html
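For illustration, here's a minimal sketch of what the persistence methods could collapse to once the custom JSON formatting is gone. The `BaseModel` slice and the attribute name `idata` are assumptions for the sketch, not the library's actual internals:

```python
import arviz as az


class BaseModel:
    """Hypothetical slice of lifetimes.models.BaseModel (attribute names assumed)."""

    def save_params(self, path: str) -> None:
        # Persist the *entire* InferenceData object, so ArviZ plotting
        # and statistical functions still work after reloading.
        self.idata.to_json(path)

    def load_params(self, path: str) -> None:
        # Restores all groups (posterior, sample_stats, ...) in one call,
        # with no custom pre/post-processing code to maintain.
        self.idata = az.from_json(path)
```

Since `arviz.from_json()` returns a full `InferenceData`, the loaded model can go straight back into `az.plot_trace()`, `az.summary()`, etc.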
`remove_hypers` can also be removed as a model class attribute, and I'm not opposed to renaming `save_params()` and `load_params()` to `save_model()` and `load_model()` either.
Other Comments: JSON is the preferred format for model persistence. Pickle files have their place for the fast reads/writes demanded by online learning and for passing objects between CPU threads, but the added implementation complexity just isn't worth it for a model that is saved and loaded only once. Pickles are also a security risk, since malware can be hidden in the format: I could easily see an attacker with prior system access overwriting a `.pkl` model file with a payload that exfiltrates customer IDs whenever the model is run.
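The pickle risk is easy to demonstrate: unpickling invokes whatever callable a (possibly tampered) payload names, while `json.loads` can only ever produce plain data. A small self-contained illustration (the `Evil` class is purely hypothetical):

```python
import json
import pickle


class Evil:
    # __reduce__ tells pickle "to rebuild me, call this function with these args".
    # A tampered .pkl file can name any importable callable here.
    def __reduce__(self):
        return (print, ("arbitrary code ran during unpickling!",))


payload = pickle.dumps(Evil())  # stands in for a tampered .pkl model file
pickle.loads(payload)           # calls print() -- any callable would run here

json.loads('{"customer_id": 42}')  # worst case for JSON is a parse error, never code execution
```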