amazon-science / unconditional-time-series-diffusion

Official PyTorch implementation of TSDiff models presented in the NeurIPS 2023 paper "Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting"

Train on custom dataset

tomyjara opened this issue · comments

Hi! How are you?

I think TSDiff could be a great tool for generating EEG data. I have a dataset with the channel measurements from an EEG recorded during an experiment, and I would like to train your model on it. How should I go about training the model on a custom dataset?

Thanks!

Hi @tomyjara!

You can use something like this to build a custom dataset.

  1. Create a JSON lines file with your time series data. Each line holds one time series as a JSON object with two keys: start (the start timestamp) and target (the values of the series). I have attached an example file. Note that the time series do not need to share the same start or length.

  2. Use this function to load the file as a GluonTS dataset.

from pathlib import Path

from gluonts.dataset.split import split
from gluonts.dataset.common import (
    MetaData,
    TrainDatasets,
    FileDataset,
)


def get_custom_dataset(
    jsonl_path: Path,
    freq: str,
    prediction_length: int,
    split_offset: int = None,
):
    """Creates a custom GluonTS dataset from a JSONLines file and
    the given parameters.

    Parameters
    ----------
    jsonl_path
        Path to a JSONLines file with time series
    freq
        Frequency in pandas format
        (e.g., `H` for hourly, `D` for daily)
    prediction_length
        Prediction length
    split_offset, optional
        Offset to split data into train and test sets, by default None

    Returns
    -------
        A gluonts dataset
    """
    if split_offset is None:
        split_offset = -prediction_length

    metadata = MetaData(freq=freq, prediction_length=prediction_length)
    test_ts = FileDataset(jsonl_path, freq)
    train_ts, _ = split(test_ts, offset=split_offset)
    dataset = TrainDatasets(metadata=metadata, train=train_ts, test=test_ts)
    return dataset
  3. Call get_custom_dataset wherever the code currently calls
    dataset = get_gts_dataset(dataset_name)
  4. Adjust the default config to suit your data, especially the context length, the lags, etc.
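
As a rough illustration of step 1 (building the JSON lines file), here is a minimal sketch that uses only the Python standard library. The helper name write_jsonl, the sample values, the timestamps, and the output file name are all made up for the example, and it assumes each EEG channel is stored as its own univariate series.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(series, starts, out_path):
    """Write one JSON object per line, each with `start` and `target` keys.

    `write_jsonl` is a hypothetical helper, not part of TSDiff or GluonTS.
    """
    out_path = Path(out_path)
    with out_path.open("w", encoding="utf-8") as f:
        for start, target in zip(starts, series):
            record = {"start": start, "target": [float(v) for v in target]}
            f.write(json.dumps(record) + "\n")
    return out_path


# Made-up EEG-like data: two channels, deliberately of different lengths.
channels = [
    [0.12, 0.15, 0.11, 0.09],
    [1.02, 0.98, 1.05],
]
# Made-up start timestamps, one per series.
starts = ["2023-01-01 00:00:00", "2023-01-01 00:00:00"]

# Written to a temporary directory here; any writable path works.
out_file = Path(tempfile.mkdtemp()) / "custom_dataset.jsonl"
path = write_jsonl(channels, starts, out_file)
lines = path.read_text(encoding="utf-8").splitlines()
```

The resulting path is what you would pass as jsonl_path to get_custom_dataset, together with a freq string that matches the spacing of your samples.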

Thanks @marcelkollovieh for helping with the response!