Move time series aggregation to an external module

Question

sjpfenninger opened this issue 3 years ago

Problem description

To reduce the complexity of Calliope's core code, we only want to provide a hook for time series aggregation and resampling, rather than doing the aggregation ourselves.

The external module could be:

Our own current code moved out of the Calliope core
tsam

TODO:

Remove all complex clustering algorithms from core (including masking).
Move time resampling to model.resample_time.
Make it possible to cluster the timeseries using a user-defined set of cluster IDs (the functionality already exists, we just need to move the definition to model.cluster_time).
Keep a config switch to enable inter-cluster storage when using clustering (e.g. model.include_inter_cluster_storage, default is True).
Update docs to tell people to prepare cluster IDs themselves using e.g. tsam.
Make hardcoded sum/mean of data on resampling explicit for every input parameter.
Move hardcoded sum/mean of data on resampling (calliope/time/funcs.py:294 ea89a66) to a model_data variable attribute (ideally, this would be encoded in the typedconfig rules).
Document justification for sum/mean of input parameters on resampling.
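
To illustrate the "explicit sum/mean for every input parameter" items above, here is a minimal sketch in plain Python. It is not Calliope code: the parameter names (demand_energy, capacity_factor), the RESAMPLE_METHODS mapping and the resample_values helper are all hypothetical, and the only assumption is that per-timestep quantities are summed while rates or fractions are averaged.

```python
from statistics import mean

# Hypothetical mapping: parameter name -> aggregation used when timesteps are merged.
# Quantities per timestep (e.g. energy) add up; rates or fractions are averaged.
RESAMPLE_METHODS = {
    "demand_energy": "sum",     # hypothetical parameter: energy per timestep
    "capacity_factor": "mean",  # hypothetical parameter: dimensionless fraction
}

AGGREGATORS = {"sum": sum, "mean": mean}


def resample_values(name, values, factor):
    """Aggregate consecutive blocks of `factor` values using the method declared for `name`."""
    if name not in RESAMPLE_METHODS:
        # Fail loudly instead of silently falling back to a hardcoded default.
        raise KeyError(f"No resampling method declared for parameter '{name}'")
    if len(values) % factor != 0:
        raise ValueError("Number of timesteps must be a multiple of the resampling factor")
    aggregate = AGGREGATORS[RESAMPLE_METHODS[name]]
    return [aggregate(values[i:i + factor]) for i in range(0, len(values), factor)]


hourly_demand = [1.0, 2.0, 3.0, 4.0]
hourly_cf = [0.25, 0.75, 0.5, 1.0]
print(resample_values("demand_energy", hourly_demand, 2))   # two-hourly totals
print(resample_values("capacity_factor", hourly_cf, 2))     # two-hourly averages
```

Raising an error for an undeclared parameter is one way to force every input parameter to state its method. Storing the method as a variable attribute, as proposed above, would serve the same purpose.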

Bryn Pickering · Answer 1 · Thu Oct 26 2023 19:19:39 GMT+0800 (China Standard Time)

In the context of #452, we could now have config.init.time_resample alongside config.init.time_subset.

We could also move these two configuration items to config.build and allow a user to resample/slice data only when they build the optimisation problem?

As I see it, advantages:

Quicker initialisation of the model, as we aren't doing any timeseries manipulation
Ability to test different extents of resampling / time subsetting on-the-fly
Can save the initialised model to file and load it later to do different timeseries operations

Disadvantages:

Larger model when input data is long, although time_resample would have no impact here, since resampling currently keeps a copy of the original timeseries in-memory anyway.
Odd output timeseries / possible clashes in output. If resampling, one would get gaps between timesteps. If subsetting, one would get gaps either side of the subset.
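
To make the "gaps" disadvantage concrete, here is a small standard-library sketch. It is not Calliope code: hourly_index, subset and resample are hypothetical helpers, and resampling is simplified to keeping the first timestamp of each block. It compares the timesteps a built model would cover against the full input index kept from initialisation.

```python
from datetime import datetime, timedelta


def hourly_index(start, hours):
    """Build a list of hourly timestamps beginning at `start`."""
    return [start + timedelta(hours=h) for h in range(hours)]


def subset(index, first, last):
    """Keep only timestamps within [first, last], inclusive."""
    return [t for t in index if first <= t <= last]


def resample(index, step_hours):
    """Simplified resampling: keep the first timestamp of every block of `step_hours`."""
    return index[::step_hours]


# Full input data held by the initialised model: two days at hourly resolution.
full = hourly_index(datetime(2005, 1, 1), 48)

# Timesteps used at build time: a 24-hour subset, resampled to 6-hourly.
built = resample(subset(full, datetime(2005, 1, 1, 12), datetime(2005, 1, 2, 11)), 6)

# Input timesteps that have no counterpart in the results.
built_set = set(built)
missing = [t for t in full if t not in built_set]

print(len(full), len(built), len(missing))
```

Results indexed on the built timesteps would line up with only a few of the input timestamps. The rest are the gaps between resampled timesteps and either side of the subset.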

Stefan Pfenninger · Answer 2 · Thu Jan 25 2024 02:19:53 GMT+0800 (China Standard Time)

We have decided not to provide clustering code for now, and to leave it up to users to do clustering as per their requirements. As of 0.7, it's possible to supply a user-defined clustering, e.g. config.init.time_cluster: cluster_days.csv
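
As a rough sketch of how a user might prepare such a file themselves, the following standard-library example maps each day to one of a few hand-picked representative days and writes the result as CSV. The layout is an assumption (one row per date, with the date of its representative day in a second column), as are the header names; check the Calliope documentation for the exact format expected by config.init.time_cluster. The nearest-daily-total assignment is a deliberately naive stand-in for a real clustering tool such as tsam, and all function names are hypothetical.

```python
import csv
import io
from datetime import date, timedelta


def assign_representative_days(daily_totals, representatives):
    """Map each date to the representative date whose daily total is closest."""
    return {
        day: min(representatives, key=lambda rep: abs(daily_totals[rep] - total))
        for day, total in daily_totals.items()
    }


def write_cluster_csv(mapping, handle):
    """Write one row per date: the date and its representative date (assumed layout)."""
    writer = csv.writer(handle)
    writer.writerow(["timesteps", "cluster"])  # assumed header names
    for day in sorted(mapping):
        writer.writerow([day.isoformat(), mapping[day].isoformat()])


# Made-up daily demand totals for five days.
start = date(2005, 1, 1)
daily_totals = {
    start + timedelta(days=i): total
    for i, total in enumerate([10.0, 11.0, 30.0, 29.0, 12.0])
}

# Hand-picked representative days: one low-demand, one high-demand.
representatives = [date(2005, 1, 1), date(2005, 1, 3)]

mapping = assign_representative_days(daily_totals, representatives)

# Written to an in-memory buffer here; in practice, open("cluster_days.csv", "w", newline="").
buffer = io.StringIO()
write_cluster_csv(mapping, buffer)
print(buffer.getvalue())
```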