Storing oseries to models relationship
dbrakenhoff opened this issue
The latest PR #49 adds functionality to keep track of the models per oseries. This is useful, e.g. for getting a list of models for a certain location. The downside of the current implementation is that it has to run through all stored models to build this dictionary, which can take a few seconds when creating a Connector object that links to an existing database.
This issue is a reminder to think about a faster, more efficient way to keep track of this, i.e. store this relationship in a separate library that is updated with each add_model() and del_model() call. This avoids having to rebuild the dictionary each time you connect to the database. Or perhaps another solution...?
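To make the cost concrete, here is a minimal sketch of what rebuilding the dictionary on connect amounts to. It is not the actual implementation from PR #49: the function name build_oseries_models, the "oseries_name" key and the example model names are assumptions made for illustration only.

```python
from collections import defaultdict


def build_oseries_models(models):
    """Scan every stored model and group model names by oseries name.

    `models` is a hypothetical {model_name: model_dict} mapping; in a real
    database each lookup would be a read from storage, which is why this
    scan gets slower as the number of stored models grows.
    """
    links = defaultdict(list)
    for model_name, model in models.items():
        links[model["oseries_name"]].append(model_name)
    return dict(links)


# Hypothetical stored models (assumed layout, for illustration only).
stored_models = {
    "well1_linear": {"oseries_name": "well1"},
    "well1_nonlinear": {"oseries_name": "well1"},
    "well2_linear": {"oseries_name": "well2"},
}

oseries_models = build_oseries_models(stored_models)
print(oseries_models)
```

The scan touches every model once, so the work on connect scales with the total number of stored models, regardless of how many oseries you are interested in.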
New proposed solution to this problem. Still a bit of a work in progress and I'm not sure whether this is the way to go yet.
Upsides:
- Oseries to models link is stored, so no need to reconstruct this relation on load.
- See list of linked models directly in oseries DataFrame.
- List of models stored directly in oseries metadata.
- Simple implementation using existing libraries
- Get dictionary of {oseries: [model_names_list]} through the pstore.oseries_models property.
There are some performance downsides to this implementation, but I'm not sure if they're really noticeable in practice...
Downsides:
- The oseries cache is cleared every time a model is added or deleted and will have to be reconstructed after that.
- Updating the oseries metadata to add a model link requires reading and then writing the oseries timeseries+metadata each time. This means 1 extra read/write for each model added/deleted.
- Deleting and then adding an oseries again will remove the model_links entry. Currently there is no logic to automatically rebuild this model_links entry in this case.
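The trade-offs listed above can be illustrated with a toy in-memory store. This is only a sketch under assumptions, not the proposed code: the class MetadataLinkStore, its read/write counters and the example names are hypothetical. Only the model_links key and the add/delete behaviour follow the description above.

```python
class MetadataLinkStore:
    """Toy store that keeps model links inside each oseries' metadata."""

    def __init__(self):
        self._oseries = {}  # oseries name -> (series, metadata)
        self._models = {}   # model name -> oseries name
        self.reads = 0      # counts oseries reads (hypothetical bookkeeping)
        self.writes = 0     # counts oseries writes (hypothetical bookkeeping)

    def add_oseries(self, name, series, metadata=None):
        self._oseries[name] = (list(series), dict(metadata or {}))
        self.writes += 1

    def del_oseries(self, name):
        del self._oseries[name]

    def _update_links(self, oseries_name, update):
        # Read the full series + metadata, change the link list, write it back.
        series, meta = self._oseries[oseries_name]
        self.reads += 1
        links = list(meta.get("model_links", []))
        update(links)
        self._oseries[oseries_name] = (series, {**meta, "model_links": links})
        self.writes += 1

    def add_model(self, model_name, oseries_name):
        self._models[model_name] = oseries_name
        self._update_links(oseries_name, lambda links: links.append(model_name))

    def del_model(self, model_name):
        oseries_name = self._models.pop(model_name)
        self._update_links(oseries_name, lambda links: links.remove(model_name))

    @property
    def oseries_models(self):
        return {
            name: list(meta.get("model_links", []))
            for name, (_, meta) in self._oseries.items()
        }


store = MetadataLinkStore()
store.add_oseries("well1", [1.0, 1.2, 0.9])
store.add_model("well1_linear", "well1")
links_before = store.oseries_models  # link is present in the metadata

# Deleting and re-adding the oseries drops the model_links entry.
store.del_oseries("well1")
store.add_oseries("well1", [1.0, 1.2, 0.9])
links_after = store.oseries_models
```

In this sketch every add_model() or del_model() call costs one extra oseries read and write, and the link list is lost after the delete/re-add even though the model itself is still stored.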
A different proposed solution is presented in #68. This implementation creates a new library, oseries_models, where the relationship between models and oseries will be stored.
Upsides:
- Names of models for a certain oseries are stored, making it easy to obtain models for a specific point without having to recalculate that relationship every time.
- Much more efficient than the previous implementation. No need to read/write timeseries/metadata: only a single list of model names has to be stored each time a model is added/deleted. No need to clear the cached oseries DataFrame after every model add/delete.
- Get dictionary of {oseries: [model_names_list]} through the pstore.oseries_models property.
Downsides:
- Links between oseries and models are not stored in the oseries metadata DataFrame (but they can easily be obtained from pstore.oseries_models).
- The added library adds some complexity, but not all that much.
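A rough sketch of the separate-library idea, again as a toy in-memory store rather than the code in #68: the class LinkLibraryStore and the example names are hypothetical, and the real library would live in the database rather than in a Python dict.

```python
class LinkLibraryStore:
    """Toy store with a separate oseries_models 'library' of name lists."""

    def __init__(self):
        self._oseries = {}         # oseries name -> series
        self._models = {}          # model name -> oseries name
        self._oseries_models = {}  # oseries name -> list of model names

    def add_oseries(self, name, series):
        self._oseries[name] = list(series)

    def del_oseries(self, name):
        del self._oseries[name]

    def add_model(self, model_name, oseries_name):
        self._models[model_name] = oseries_name
        # Only the small list of names is touched, never the time series.
        names = self._oseries_models.setdefault(oseries_name, [])
        if model_name not in names:
            names.append(model_name)

    def del_model(self, model_name):
        oseries_name = self._models.pop(model_name)
        names = self._oseries_models[oseries_name]
        names.remove(model_name)
        if not names:
            # Drop empty entries so the dictionary only lists oseries with models.
            del self._oseries_models[oseries_name]

    @property
    def oseries_models(self):
        # Return copies so callers cannot modify the stored lists by accident.
        return {name: list(models) for name, models in self._oseries_models.items()}


pstore = LinkLibraryStore()
pstore.add_oseries("well1", [1.0, 1.2, 0.9])
pstore.add_model("well1_linear", "well1")
pstore.add_model("well1_nonlinear", "well1")
pstore.del_model("well1_linear")

# The links live outside the oseries entry, so replacing the series keeps them.
pstore.del_oseries("well1")
pstore.add_oseries("well1", [1.0, 1.2, 0.9])
links = pstore.oseries_models
```

Because the links are kept apart from the oseries data, adding or deleting a model never rewrites a time series. The price is one more structure that has to stay in sync with the stored models.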
Added in #62, closing issue.