pastas / pastastore

:spaghetti: :convenience_store: Tools for managing timeseries and Pastas models

Home Page: https://pastastore.readthedocs.io

Storing oseries to models relationship

dbrakenhoff opened this issue

The latest PR, #49, adds functionality to keep track of the models built for each oseries. This is useful, for example, for getting a list of all models for a certain location. The downside of the current implementation is that it has to run through all stored models to build this dictionary, which can take a few seconds when creating a Connector object that links to an existing database.
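The rebuild described above amounts to a full scan over the stored models. A minimal sketch, assuming each stored model is a dict whose `"oseries"` entry carries the series name (the dict layout and the function name `build_oseries_models` are illustrative assumptions, not pastastore's actual code):

```python
from collections import defaultdict


def build_oseries_models(models):
    """Scan every stored model once and group model names by oseries name.

    `models` is a hypothetical mapping {model_name: model_dict}; the layout
    model_dict["oseries"]["name"] is an assumption made for this sketch.
    """
    links = defaultdict(list)
    for model_name, model_dict in models.items():
        links[model_dict["oseries"]["name"]].append(model_name)
    return dict(links)


# Three toy models, two of which share the same oseries.
stored_models = {
    "well1_linear": {"oseries": {"name": "well1"}},
    "well1_nonlinear": {"oseries": {"name": "well1"}},
    "well2_linear": {"oseries": {"name": "well2"}},
}
oseries_models = build_oseries_models(stored_models)
```

The cost grows linearly with the number of stored models, and in a real database each of those models has to be read first, which is where the few seconds on connect would come from.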

This issue is a reminder to think about a faster, more efficient way to keep track of this, e.g. storing the relationship in a separate library that is updated with each add_model() and del_model() call. That would avoid having to rebuild the dictionary each time you connect to the database. Or perhaps there is another solution...?

Here is a new proposed solution to this problem, which stores the list of linked models in the oseries metadata. It is still a bit of a work in progress, and I'm not sure yet whether this is the way to go.

Upsides:

  • Oseries to models link is stored, so no need to reconstruct this relation on load.
  • See list of linked models directly in oseries DataFrame.
  • List of models stored directly in oseries metadata.
  • Simple implementation using existing libraries
  • Get dictionary of {oseries: [model_names_list]} through pstore.oseries_models property

There are some performance downsides to this implementation, but I'm not sure if they're really noticeable in practice...

Downsides:

  • The oseries cache is cleared every time a model is added or deleted and will have to be reconstructed after that.
  • Updating the oseries metadata to add a model link requires reading and then writing the oseries timeseries+metadata each time. This means 1 extra read/write for each model added/deleted.
  • Deleting and then adding an oseries again will remove the model_links entry. Currently there is no logic to automatically rebuild this model_links entry in this case.
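To make the trade-offs listed above concrete, here is a toy in-memory sketch of keeping the links inside the oseries metadata. The class `MetadataLinkStore`, its methods, and the read/write counters are hypothetical stand-ins, not the pastastore implementation; only the `model_links` entry name comes from the description above.

```python
class MetadataLinkStore:
    """Toy store that keeps model links in the oseries metadata (hypothetical)."""

    def __init__(self):
        self.oseries = {}  # oseries name -> (series, metadata dict)
        self.models = {}   # model name -> oseries name
        self.reads = 0     # counts oseries reads triggered by model updates
        self.writes = 0    # counts oseries writes

    def add_oseries(self, name, series, metadata=None):
        self.oseries[name] = (series, dict(metadata or {}))
        self.writes += 1

    def del_oseries(self, name):
        del self.oseries[name]

    def _update_links(self, oseries_name, model_name, add):
        # Read series + metadata, change one list, write everything back.
        series, meta = self.oseries[oseries_name]
        self.reads += 1
        links = [m for m in meta.get("model_links", []) if m != model_name]
        if add:
            links.append(model_name)
        self.oseries[oseries_name] = (series, {**meta, "model_links": links})
        self.writes += 1

    def add_model(self, model_name, oseries_name):
        self.models[model_name] = oseries_name
        self._update_links(oseries_name, model_name, add=True)

    def del_model(self, model_name):
        oseries_name = self.models.pop(model_name)
        self._update_links(oseries_name, model_name, add=False)


store = MetadataLinkStore()
store.add_oseries("well1", [1.0, 1.2, 0.9])
store.add_model("well1_linear", "well1")
links_before = store.oseries["well1"][1]["model_links"]

# Deleting and re-adding the oseries drops the stored links, even though
# the model itself is still in the store.
store.del_oseries("well1")
store.add_oseries("well1", [1.0, 1.2, 0.9])
links_after = store.oseries["well1"][1].get("model_links", [])
```

In this sketch every model add or delete costs one extra read and one extra write of the full oseries record, and the links are lost when the oseries is replaced.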

A different proposed solution is presented in #68. This implementation creates a new library oseries_models where the relationship between models and oseries will be stored.

Upsides:

  • Names of models for a certain oseries are stored, making it easy to obtain models for a specific point without having to recalculate that relationship every time.
  • Much more efficient than the previous implementation: there is no need to read/write the timeseries and metadata, only a single list of model names has to be stored each time a model is added or deleted, and the cached oseries DataFrame no longer has to be cleared after every model add/delete.
  • Get dictionary of {oseries: [model_names_list]} through pstore.oseries_models property

Downsides:

  • Links between oseries and models are not stored in the oseries metadata DataFrame (but they can easily be obtained from pstore.oseries_models).
  • The extra library adds some complexity, but not all that much.
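The separate-library idea can be sketched as a small index that is touched only on add_model() and del_model(). This is a toy in-memory version under stated assumptions: the class `LinkedStore` and its plain-dict "libraries" are hypothetical, and only the names `add_model`, `del_model`, and `oseries_models` are taken from the discussion above.

```python
class LinkedStore:
    """Toy store whose 'libraries' are plain dicts (hypothetical sketch)."""

    def __init__(self):
        self.models = {}           # model name -> oseries name
        self._oseries_models = {}  # oseries name -> list of model names

    def add_model(self, model_name, oseries_name):
        self.models[model_name] = oseries_name
        # Only the short list of model names is updated; the oseries
        # timeseries and metadata are never read or rewritten.
        names = self._oseries_models.setdefault(oseries_name, [])
        if model_name not in names:
            names.append(model_name)

    def del_model(self, model_name):
        oseries_name = self.models.pop(model_name)
        names = self._oseries_models[oseries_name]
        names.remove(model_name)
        if not names:
            # Drop empty entries so the index only lists oseries with models.
            del self._oseries_models[oseries_name]

    @property
    def oseries_models(self):
        # Return a copy so callers cannot modify the index by accident.
        return {k: list(v) for k, v in self._oseries_models.items()}


pstore = LinkedStore()
pstore.add_model("well1_linear", "well1")
pstore.add_model("well1_nonlinear", "well1")
pstore.add_model("well2_linear", "well2")
pstore.del_model("well2_linear")
links = pstore.oseries_models
```

Because the index is persisted alongside the models, connecting to an existing database would only need to load it, not rebuild it from every stored model.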

Added in #62, closing issue.