sktime / sktime

A unified framework for machine learning with time series

Home Page: https://www.sktime.net

[ENH] coordination discussion on foundation models, deep learning, backends

fkiraly opened this issue

Opening this issue to coordinate the various summer projects relating to foundation models, deep learning, backends, and interfaces.

Below is a list of related umbrella issues and individual issues - for now, focusing primarily on forecasting.

FYI @fnhirwa, @geetu040, @julian-fong, @pranavvp16, @Xinyu-Wu-0000.
FYI mentors @benHeid, @kirilral, @marrov, @onyekaugochukwu, @yarnabrina.

  • umbrella issue - foundation model zoo - #6177
    • issue subtype one: interfacing individual models and frameworks, e.g., hugging face, chronos, moirai
    • issue subtype two: API extension for foundation models: handling "global model" train/predict, pretrained
  • umbrella issue - data containers - #6091
    • from this, my opinion on highest priority: polars (due to sklearn and nixtla-verse adoption)
    • parallel/distributed functionality should also be added to broadcasting (prio), and compositors (e.g., pipeline)
  • framework and "favourite model" interfaces:
  • deep learning backend API and architecture considerations
    • refactoring DL backend interface - priority on pytorch
    • model persistence and serialization - including nested serialization

Another darts PR: #5043

This one was generic. The regression-focused one is #5997 (and the related #5447), but these two are not deep learning based.

If we go by the topics proposed in GSoC proposals and previous work, we get - for the start:

  • @fnhirwa -> darts, pytorch, pytorch-forecasting
  • @geetu040 -> pytorch, pytorch-forecasting; hugging face, peft/lora
  • @pranavvp16 -> polars, sklearn/polars, nixtla/polars; hugging face, peft/lora
  • @julian-fong -> pre-trained models, fine-tuning, fastAI like API
  • @Xinyu-Wu-0000 -> global API, pre-training and fine-tuning examples

There are three intersections here:

One possible assignment that avoids conditionalities and duplications for the 1st month would be:

  • @fnhirwa -> darts, pytorch-forecasting ("framework interfacing" topic, pytorch backend topic)
  • @geetu040 -> hugging face, peft/lora, scattered foundation models ("foundation models & API" topic)
  • @pranavvp16 -> polars, sklearn/polars, nixtla/polars ("polars & distributed" topic)
  • @Xinyu-Wu-0000 -> leads on global API design, works with @benHeid, based on #6228
  • @julian-fong -> one of: pre-training/fine-tuning (fastAI like), or contribute to polars upgrade

Please add any corrections, suggestions for improvement, comments, etc. If preferences lie elsewhere, we can of course switch things around. This is for discussion until the 1st tech meeting, where we'll plan.

A possible further work item could be the integration of GluonTS.

Yes - gluonts was actually part of the very original forecaster feature wishlist, here: #220
It's funny to see how long that wishlist was and how almost everything on it is now available - most recently the bagging ensemble which I upgraded due to tsbootstrap (FYI @astrogilda). In a similar vein, the newer "auto-gluon" is a "one size fits all" approach similar to autots.

gluonts has its own data container, there are already some converters in sktime (possibly incomplete): #2860

from May 10 meeting:

  • pytorch-forecasting is connected both to global API project of @Xinyu-Wu-0000 and @fnhirwa
    • @Xinyu-Wu-0000 PR #6228 will already provide full framework integration -> duplication with @fnhirwa subtopic pytorch-forecasting
    • so @fnhirwa focus should be on darts in the first two weeks
  • @yarnabrina comment: darts converts everything to internal container - container item; existing PR that could/should be picked up #5043

As discussed in today's mentoring meeting with @benHeid, and as mentioned here, I will start working on adding support for the polars scitype for the first month. Commenting here to coordinate with other mentees and mentors; please feel free to reply if any other mentee is also interested in, or working on, adding polars support.

I am not too familiar with polars or with parallel and distributed functionality yet, but would like the opportunity to learn and contribute to adding polars support.

@pranavvp16, @julian-fong, a high-level outline is in this issue here: #5423 (comment)

There are two things one could work on in parallel:

  • sktime needs mtypes implemented, we could start with Series, then Panel. Since polars has no multi-index (is this correct?) I would suggest the same conventions as in dask, for Panel.
  • in skpro, a polars container format is already implemented, although this is not full support as internally it just converts back/forth.

(and imo these are currently the only two fully parallel items)

The "battle plan" for support is - in both packages - first mtype support, then enable support in a few estimators, see if we can support eager and even lazy.

I think skpro estimators are simpler, so if in parallel to working on sktime mtypes, we try enabling native polars support for a number of estimators, we will learn of the challenges and solutions ahead of time, and at a lower cognitive cost.

What are your thoughts? Any preferences?

Related to polars, there is also this refactor of the datatypes module:

#6033

This would make it easier to add mtypes with soft dependencies, and someone could pick it up and review - or complete - it. I was also going to look at it soon.
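
As a rough illustration of what "mtypes with soft dependencies" could mean in practice, here is a minimal sketch of a registry in which each mtype declares the package it needs, and checkers import that package lazily. This is not the design in #6033; `MTYPE_REGISTRY`, `register_mtype`, `is_available`, `available_mtypes` and the mtype name strings are all hypothetical names made up for this example.

```python
from importlib.util import find_spec

# Hypothetical registry: mtype name -> required package and checker function.
MTYPE_REGISTRY = {}


def register_mtype(name, requires=None):
    """Register a checker for an mtype, optionally tied to a soft dependency."""

    def decorator(check_fn):
        MTYPE_REGISTRY[name] = {"requires": requires, "check": check_fn}
        return check_fn

    return decorator


def is_available(name):
    """Return True if the package required by the mtype can be imported."""
    requires = MTYPE_REGISTRY[name]["requires"]
    return requires is None or find_spec(requires) is not None


def available_mtypes():
    """List the registered mtypes whose soft dependency is installed."""
    return [name for name in MTYPE_REGISTRY if is_available(name)]


@register_mtype("pd.DataFrame", requires="pandas")
def check_pandas(obj):
    import pandas as pd  # imported lazily, only when the checker is called

    return isinstance(obj, pd.DataFrame)


@register_mtype("pl.DataFrame", requires="polars")
def check_polars(obj):
    import polars as pl  # imported lazily, only when the checker is called

    return isinstance(obj, pl.DataFrame)
```

With something like this, registering the polars checker never fails on an environment without polars; the mtype is simply absent from `available_mtypes()`.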

Yes, the plan looks good to me for now, but polars doesn't support an index at all, let alone a multi-index. Also, if I'm not wrong, we have to get this refactor merged before we can start adding polars mtype support in sktime?

Yes, the plan looks good to me for now, but polars doesn't support an index at all, let alone a multi-index.

That could be handled the same way, no? Have a column called __index or similar. That seems only a minor adaptation to how we have dealt with dask.

Also if I'm not wrong we have to get this refactor merged before we can start adding polars mtype support in sktime ?

No, there is no such conditionality, the refactor would just make it more convenient to add new data container types.

As per the last conversation with @fkiraly I'll pick up this [ENH] darts adapter #5043 and liaise with @yarnabrina

Commenting here to coordinate, so that it doesn't collide with any other ongoing task.

Re polars, to get back to coordinating tasks, current discussion sounds like:

  • @pranavvp16 will work on type checkers and converters for sktime, that is issue #5423. Can you kindly comment there so that I can assign you?
  • @julian-fong will work on skpro functionality to extend polars coverage using the existing data type. I will open an issue there.

Does this make sense, and does this align with your preferences?

As per the last conversation with @fkiraly I'll pick up this [ENH] darts adapter #5043 and liaise with @yarnabrina

Excellent - the issue is #1624, could you kindly comment there so I can assign you?

@julian-fong, polars issue in skpro: sktime/skpro#342

Quick note: fine-tuning for future and current interfaces seems contingent on the global forecasting interface in #6228.

So, we ought to ensure:

  • we consider fine-tuning as a second-stage work item once #6228 or an equivalent global forecasting API extension is merged, for every foundation model interfaced
  • #6228, or the API extension parts of it, get merged soon-ish

FYI @benHeid, @Xinyu-Wu-0000, thoughts?