Switch to tall format data instead of wide

Question

andnp opened this issue a year ago · comments

This will take some work and will require extensive testing.

I need to save a mapping from hyper_id -> hyper_settings to disk. Each row of the results dataframe will then store only the hyper_id instead of the full hyper_settings.

This will allow saving multiple metrics per row and should radically reduce storage costs. Also should be easier on the cluster.
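
A minimal sketch of the idea, to make the layout concrete. Everything here is an assumption for illustration: the helper names (`hyper_id`, `register`, `make_row`), the use of a content hash as the id (an incrementing integer would work too), and plain dicts standing in for the results dataframe.

```python
import hashlib
import json

def hyper_id(settings):
    # Serialize with sorted keys so that key order does not change the id.
    canonical = json.dumps(settings, sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:12]

# Hypothetical in-memory mapping hyper_id -> hyper_settings.
hyper_table = {}

def register(settings):
    # Record the settings once and return the id that rows will reference.
    hid = hyper_id(settings)
    hyper_table[hid] = settings
    return hid

def make_row(settings, seed, step, metrics):
    # One row per (hyper_id, seed, step), with one column per metric.
    row = {"hyper_id": register(settings), "seed": seed, "step": step}
    row.update(metrics)
    return row

rows = [
    make_row({"alpha": 0.1, "epsilon": 0.05}, seed=0, step=100,
             metrics={"return": 12.5, "loss": 0.3}),
    make_row({"epsilon": 0.05, "alpha": 0.1}, seed=1, step=100,
             metrics={"return": 11.0, "loss": 0.4}),
]

# The mapping is what gets written to disk next to the results.
hyper_json = json.dumps(hyper_table, sort_keys=True)
```

Both rows above share one hyper_id, so the settings are stored once rather than once per row. With a content hash, adding or removing a hyper produces a new id rather than silently reusing an old one, which may or may not be the behavior we want.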

Need to think about:

- What happens when the number of hypers changes?
- How do we save throughout the experiment? All at once at the end, as in the current approach?
- When we downsample, we need to keep track of the current step number. With subsampling, this is easy: each kept point retains its original step. With window averaging, this might be harder, since each averaged value covers a range of steps.
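
On the downsampling question, here is a rough sketch of how a step number could be carried through both strategies. The function names are hypothetical, steps are assumed to be 0-indexed, and labeling each averaged window with its last step is just one possible convention (the first step or the midpoint would also be defensible).

```python
def subsample(values, every):
    # Keep every `every`-th point; the step is simply the original index.
    return [(step, v) for step, v in enumerate(values) if step % every == 0]

def window_average(values, window):
    # Average non-overlapping windows. Each output is labeled with the
    # last step covered by its window (an assumed convention).
    out = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        last_step = start + len(chunk) - 1
        out.append((last_step, sum(chunk) / len(chunk)))
    return out

data = [1.0, 2.0, 3.0, 4.0, 5.0]
sub = subsample(data, every=2)       # [(0, 1.0), (2, 3.0), (4, 5.0)]
avg = window_average(data, window=2)  # [(1, 1.5), (3, 3.5), (4, 5.0)]
```

Note that the final window can be shorter than the others when the run length is not a multiple of the window size, so its average is over fewer points.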