ngruver / llmtime

Hi, may I check how the baseline results for the Monash benchmark (Figure 4, e.g. Wavenet, Transform., DeepAR, etc.) were obtained? From my understanding of the codebase, it uses the Hugging Face monash_tsf dataset repository to obtain the Monash time series, and the prediction length is derived as follows:

llmtime/data/monash.py, line 43 (commit 37d0a33):

pred_len = len(val_example) - len(train_example)

My concern is that the prediction lengths in the Hugging Face dataset differ from the default prediction lengths in the Monash benchmark. For example, solar 10 minutes has a prediction length of 60 in the Hugging Face dataset, while the Monash baseline results use a prediction length of 1008. Please correct me if I am mistaken about anything here. Thank you!
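To make the mismatch concrete, here is a minimal sketch of the check I have in mind. It uses synthetic lists rather than the real dataset, and the helper names `implied_pred_len` and `horizon_mismatch` are my own, not part of llmtime. The values 60 and 1008 are the ones quoted above for solar 10 minutes.

```python
def implied_pred_len(train_example, val_example):
    """Horizon implied by the split: the extra points in the validation series."""
    return len(val_example) - len(train_example)


def horizon_mismatch(train_example, val_example, reference_horizon):
    """Return the implied horizon and whether it differs from the reference one."""
    implied = implied_pred_len(train_example, val_example)
    return implied, implied != reference_horizon


# Synthetic stand-ins for one series (assumption: not real solar data).
train_example = [0.0] * 500
val_example = train_example + [1.0] * 60

# 1008 is the horizon used by the Monash baselines for solar 10 minutes.
implied, mismatch = horizon_mismatch(train_example, val_example, reference_horizon=1008)
print(implied, mismatch)
```

If the implied horizon differs from the reference one, the baseline numbers and the evaluated forecasts are not scored on the same task.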

Hi Gerald,

Thanks so much for bringing this to our attention. The Monash baseline numbers are from the original paper, and it is possible there is a mismatch in our evaluation. I will be on vacation this upcoming week, but I will take a close look the day I get back.

Nate

Hi @ngruver, are there any updates on this, or plans to release an updated results figure/table?

Hi Gerald, thanks for following up. We've updated the results in the NeurIPS camera-ready version (https://openreview.net/forum?id=md68e8iZK1). The Monash numbers now include 19 datasets:

covid deaths,
solar weekly,
tourism yearly,
tourism quarterly,
tourism monthly,
australian electricity demand,
pedestrian counts,
hospital,
fred md,
us births,
nn5 weekly,
nn5 daily,
traffic weekly,
traffic hourly,
saugeenday,
cif 2016,
bitcoin,
weather,
sunspot

As you pointed out, solar 10 minutes has a much longer prediction horizon in the original benchmark than is represented in the Hugging Face dataset, so we dropped that one from consideration. We corrected the horizons in the other datasets that were inconsistent.
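For anyone reproducing this, one way a horizon correction could look is to re-split each full series using the reference horizon instead of the one implied by the dataset. This is only a sketch under that assumption, not necessarily what the llmtime code does. `split_with_horizon` is a hypothetical helper and the series below is synthetic.

```python
def split_with_horizon(series, horizon):
    """Split a series into (history, target), holding out the last `horizon` points."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be positive and shorter than the series")
    return series[:-horizon], series[-horizon:]


# Synthetic series of 100 points (assumption: stands in for one benchmark series).
series = list(range(100))

# Hold out the last 8 points as the forecast target.
history, target = split_with_horizon(series, horizon=8)
print(len(history), len(target))
```

A series shorter than the reference horizon raises an error here, which is one simple way to flag datasets that cannot be evaluated at the benchmark's horizon.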

I'm in the process of expanding the analysis to 29 datasets by adding the following:

kdd cup,
electricity hourly,
electricity weekly,
m1 yearly,
m1 quarterly,
m1 monthly,
m3 yearly,
m3 quarterly,
m3 monthly,
m3 other

After I finalize those results, I will update the arXiv version.

Thanks for the update!

Prediction length for Monash benchmark