sktime / sktime

A unified framework for machine learning with time series

Home Page: https://www.sktime.net

[ENH] Syntetos/Boylan ADI/CV feature extractor for different types of demand (intermittent etc)

fkiraly opened this issue · comments

Derived feature request from discussion with @ggjx22 in #6279.

The request is to implement the Syntetos/Boylan expert classification of time series, from Syntetos/Boylan (2005), "The accuracy of intermittent demand estimates", IJF.

Good first issue, should be simple to implement, so no need to interface from anywhere - recipe is here: https://www.sktime.net/en/latest/developer_guide/add_estimators.html

I would specify the estimator as follows:

Type

Series-to-primitives transformer. Per-instance.

Parameters

  • optional parameters "adi_threshold", "cv_threshold", default values are 1.32 and 0.49 respectively, as in the paper.
  • optional parameter "features", by default (None) all features are computed. If not None, must be a list of str, with each element one of "adi", "cv2", "class".

Behaviour

Computes three features or a subset thereof, as columns of the return of transform:

  • adi - average demand interval. This is the same as last index minus first index, divided by number of non-zero values minus one. For time-like indices, the unit should be in number of periods. Not sure what to do for non-periodic indices - if freq is unavailable, I would just drop the index.
    • there are some random references on the internet, which give adi as simply the fraction of non-zero values. Afaik that is not accurate in comparison to the original reference, the "minus one" does not cancel.
  • cv2 - this is just variance/(mean squared), but taken on the sample of values that are non-zero, in the series. The reference uses the biased estimator for variance, i.e., divide by number of values (not minus one).
  • class - derived class, string column, depending on whether adi <= adi_threshold and cv2 <= cv_threshold. Yes/yes is called "smooth", yes/no "erratic", no/yes "intermittent", no/no "lumpy", by the authors.
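
For illustration, the three features above could be computed for a single series roughly as follows. This is only a sketch, not sktime code: the function name adi_cv2_class is made up, integer positions stand in for the index (the "drop the index" fallback), and returning NaN/None when there are too few non-zero values is an assumption that the spec above does not cover.

```python
import math
from statistics import fmean, pvariance


def adi_cv2_class(y, adi_threshold=1.32, cv_threshold=0.49):
    """Hypothetical helper (not sktime API): adi, cv2 and class for one series."""
    values = [float(v) for v in y]
    nonzero = [v for v in values if v != 0.0]

    # adi: (last index - first index) / (number of non-zero values - 1).
    # Assumption: integer positions replace the index, so the numerator is len - 1.
    if len(nonzero) >= 2:
        adi = (len(values) - 1) / (len(nonzero) - 1)
    else:
        adi = math.nan  # assumption: undefined with fewer than two non-zero values

    # cv2: biased variance (divide by n) over squared mean, non-zero values only.
    if nonzero:
        cv2 = pvariance(nonzero) / fmean(nonzero) ** 2
    else:
        cv2 = math.nan  # assumption: undefined for an all-zero series

    # class: yes/yes smooth, yes/no erratic, no/yes intermittent, no/no lumpy.
    if math.isnan(adi) or math.isnan(cv2):
        label = None  # assumption: no class when a feature is undefined
    else:
        labels = {
            (True, True): "smooth",
            (True, False): "erratic",
            (False, True): "intermittent",
            (False, False): "lumpy",
        }
        label = labels[(adi <= adi_threshold, cv2 <= cv_threshold)]

    return {"adi": adi, "cv2": cv2, "class": label}


print(adi_cv2_class([0, 2, 0, 0, 3, 0, 1, 0, 0, 4]))
```

For the example series there are 10 periods and 4 non-zero values, so adi = 9/3 = 3.0 (not 10/4 = 2.5, which is where the "minus one" matters), cv2 = 1.25/6.25 = 0.2, and the class is "intermittent".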

@fkiraly, I am trying to get a deeper understanding of time series, and I would love to work on this enhancement. If it is okay, could I take a crack at this? Thank you so much!

Absolutely! That's what good first issues are for!

Let us know if you need any help with the "new estimator" guide, or if you have suggestions for improvement.

do you need any help, @shlok191? Happy to review a draft PR if you have partial code

@fkiraly, I am so sorry about the delay! I had midterms this week and last week, which took up all of my time! Would it be okay if I made a PR in a couple of days?

sure, take your time, there's no rush!

Just wanted to make sure you're not stuck somewhere.
I only wanted to check if you need help.

Thank you so much! I'll come back with an update soon and communicate if I run into any road-blocks :)

@fkiraly, I'm sorry about the delay, I just got done with my final exams! I've made a first PR related to this and I'll make sure to complete this by this week. I've got all the free time now! 😄

great! I'm sure @ggjx22 is looking forward to it!