Storing oseries to models relationship

Question

dbrakenhoff opened this issue 3 years ago

The latest PR #49 adds functionality to keep track of models per oseries. This is useful, e.g. for getting a list of models for a certain location. The downside of the current implementation is that it has to run through all stored models to build this dictionary, which can take a few seconds when creating a Connector object linking to an existing database.
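
For illustration, the rebuild step described above might look roughly like the sketch below. This is not the pastastore implementation: `build_oseries_models` is a hypothetical helper, and the layout of the stored models (a dictionary whose `oseries` entry carries a `name`) is an assumption made for the example only.

```python
from collections import defaultdict


def build_oseries_models(models):
    """Group model names by the oseries they were built on.

    `models` maps model name -> model dictionary (assumed layout).
    Every stored model has to be visited, so the cost grows with the
    number of models in the database.
    """
    links = defaultdict(list)
    for model_name, model_dict in models.items():
        oseries_name = model_dict["oseries"]["name"]
        links[oseries_name].append(model_name)
    return dict(links)


# Hypothetical stored models, standing in for a real database.
stored_models = {
    "model_a": {"oseries": {"name": "well_1"}},
    "model_b": {"oseries": {"name": "well_1"}},
    "model_c": {"oseries": {"name": "well_2"}},
}

oseries_models = build_oseries_models(stored_models)
print(oseries_models)
```

In a real database each lookup of a model dictionary would involve a read from storage, which is where the few seconds mentioned above would come from.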

This issue is a reminder to think about a faster, more efficient way to keep track of this, e.g. storing this relationship in a separate library that is updated with each add_model() and del_model() call. That would avoid having to rebuild the dictionary each time you connect to the database. Or perhaps another solution...?
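
A minimal sketch of that idea, assuming a toy in-memory store: the class `LinkedStore` and its attributes are hypothetical and only borrow the method names add_model() and del_model() from the text. The point is that the oseries-to-models mapping is updated incrementally, so nothing has to be rebuilt on connect.

```python
class LinkedStore:
    """Toy in-memory store; hypothetical, not the pastastore API."""

    def __init__(self):
        self.models = {}          # model name -> model dictionary
        self.oseries_models = {}  # oseries name -> list of model names

    def add_model(self, name, model_dict):
        # Overwriting an existing model: drop its old link first so the
        # index never points at a stale oseries.
        if name in self.models:
            self.del_model(name)
        self.models[name] = model_dict
        oseries_name = model_dict["oseries"]["name"]
        self.oseries_models.setdefault(oseries_name, []).append(name)

    def del_model(self, name):
        model_dict = self.models.pop(name)
        oseries_name = model_dict["oseries"]["name"]
        linked = self.oseries_models[oseries_name]
        linked.remove(name)
        if not linked:
            # Do not keep empty entries for oseries without models.
            del self.oseries_models[oseries_name]


store = LinkedStore()
store.add_model("model_a", {"oseries": {"name": "well_1"}})
store.add_model("model_b", {"oseries": {"name": "well_1"}})
store.add_model("model_c", {"oseries": {"name": "well_2"}})
store.del_model("model_c")
print(store.oseries_models)
```

Each call touches only the entry for one oseries, so the bookkeeping cost per model is small and independent of the total number of stored models.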

Davíd Brakenhoff · Answer 1 · Tue Apr 26 2022 17:55:08 GMT+0800 (China Standard Time)

New proposed solution to this problem. Still a bit of a work in progress and I'm not sure whether this is the way to go yet.

Upsides:

The oseries-to-models link is stored, so there is no need to reconstruct this relation on load.
The list of linked models is visible directly in the oseries DataFrame.
The list of models is stored directly in the oseries metadata.
Simple implementation using existing libraries.
Get a dictionary of {oseries: [model_names_list]} through the pstore.oseries_models property.

There are some performance downsides to this implementation, but I'm not sure if they're really noticeable in practice...

Downsides:

The oseries cache is cleared every time a model is added or deleted and will have to be reconstructed after that.
Updating the oseries metadata to add a model link requires reading and then writing the oseries timeseries+metadata each time. This means 1 extra read/write for each model added/deleted.
Deleting and then adding an oseries again will remove the model_links entry. Currently there is no logic to automatically rebuild this model_links entry in this case.
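
To make these downsides concrete, here is a rough sketch of the metadata-based approach using a toy in-memory store. Everything in it is an assumption for illustration: the class `MetadataLinkStore`, its counters, and the storage layout are hypothetical; only the `model_links` key is taken from the description above.

```python
class MetadataLinkStore:
    """Toy store keeping model links inside the oseries metadata (hypothetical)."""

    def __init__(self):
        self.oseries = {}           # oseries name -> (series, metadata)
        self.models = {}            # model name -> oseries name
        self.oseries_cache = None   # stands in for the cached oseries table
        self.extra_reads = 0        # oseries reads caused by model changes
        self.extra_writes = 0       # oseries writes caused by model changes

    def add_oseries(self, name, series, metadata=None):
        self.oseries[name] = (series, dict(metadata or {}))
        self.oseries_cache = None

    def add_model(self, name, oseries_name):
        self.models[name] = oseries_name
        # Read the full oseries entry (series + metadata) ...
        series, metadata = self.oseries[oseries_name]
        self.extra_reads += 1
        links = list(metadata.get("model_links", []))
        if name not in links:
            links.append(name)
        # ... and write it back with the updated link list.
        self.oseries[oseries_name] = (series, {**metadata, "model_links": links})
        self.extra_writes += 1
        # The metadata changed, so the cached oseries table is stale.
        self.oseries_cache = None


store = MetadataLinkStore()
store.add_oseries("well_1", [1.0, 1.2, 1.1])
store.add_model("model_a", "well_1")
store.add_model("model_b", "well_1")
links_before = store.oseries["well_1"][1]["model_links"]

# Deleting and re-adding the oseries drops the links, although both
# models still exist in the store.
del store.oseries["well_1"]
store.add_oseries("well_1", [1.0, 1.2, 1.1])
links_after = store.oseries["well_1"][1].get("model_links", [])
print(links_before, links_after, store.extra_reads, store.extra_writes)
```

In this sketch, two added models cost two extra oseries reads and two extra writes, and the re-added oseries ends up with an empty link list unless something rebuilds it.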

Davíd Brakenhoff · Answer 2 · Thu Apr 28 2022 19:40:01 GMT+0800 (China Standard Time)

A different proposed solution is presented in #68. This implementation creates a new library oseries_models where the relationship between models and oseries will be stored.

Upsides:

Names of models for a certain oseries are stored, making it easy to obtain models for a specific point without having to recalculate that relationship every time.
Much more efficient than the previous implementation: no need to read/write timeseries/metadata. Only a single list of model names has to be stored every time a model is added/deleted, and the cached oseries DataFrame no longer has to be cleared after every model add/delete.
Get a dictionary of {oseries: [model_names_list]} through the pstore.oseries_models property.

Downsides:

Links between oseries and models are not stored in the oseries metadata DataFrame (but they can easily be obtained from pstore.oseries_models).
The additional library adds some complexity, but not all that much.
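
As a sketch of how the first downside could be worked around, the links can be merged into a metadata table on demand. The example below uses plain dictionaries as a stand-in for the oseries metadata DataFrame; the helper `attach_model_links` and the sample data are hypothetical, and `oseries_models` is assumed to have the {oseries: [model_names_list]} shape described above.

```python
def attach_model_links(oseries_metadata, oseries_models):
    """Return a copy of the metadata table with a 'models' entry per oseries.

    Oseries without any model get an empty list. The inputs are not modified.
    """
    return {
        name: {**metadata, "models": list(oseries_models.get(name, []))}
        for name, metadata in oseries_metadata.items()
    }


# Hypothetical metadata table, keyed by oseries name.
oseries_metadata = {
    "well_1": {"x": 100.0, "y": 200.0},
    "well_2": {"x": 150.0, "y": 250.0},
    "well_3": {"x": 180.0, "y": 300.0},
}

# Assumed shape of what pstore.oseries_models returns.
oseries_models = {
    "well_1": ["model_a", "model_b"],
    "well_2": ["model_c"],
}

table = attach_model_links(oseries_metadata, oseries_models)
for name, row in table.items():
    print(name, row["models"])
```

Because the merge happens at read time, the stored metadata never has to be rewritten when a model is added or deleted.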

Davíd Brakenhoff · Answer 3 · Tue May 03 2022 18:55:12 GMT+0800 (China Standard Time)

Added in #62, closing issue.