Welcome to the sktime tutorial at PyData Amsterdam 2023

This tutorial is about making probabilistic predictions, namely probabilistic forecasts with sktime and probabilistic supervised regression with skpro:

sktime is a unified framework for various time series related machine learning tasks, including forecasting, classification, and regression.
skpro is a unified framework for tabular probabilistic predictions and tabular probability distributions.

sktime and skpro are designed to be interoperable, and both contain algorithms and tools for building, applying, and evaluating modular pipelines and composites. Both packages are easily extensible by anyone, and interoperable with the python data science stack including sklearn, skbase, and pandas.

The presentation will showcase probabilistic prediction in skpro and sktime:

probabilistic prediction interfaces
probabilistic prediction metrics, e.g., quantile loss, or CRPS and log-loss for distribution forecasts
tuning using probabilistic metrics
conformal probabilistic intervals for any pipeline
compositors to make any point prediction estimator probabilistic
for time series: hierarchical and global probabilistic forecasts, reduction to regression
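
To make the metrics item above concrete, here is a minimal sketch of two common evaluation quantities: the quantile (pinball) loss and the empirical coverage of prediction intervals. It is plain Python and not the sktime or skpro implementation; the function names and all numbers are hypothetical and chosen only for illustration.

```python
def pinball_loss(y_true, y_quantile, alpha):
    """Average pinball (quantile) loss of quantile predictions at level alpha."""
    total = 0.0
    for y, q in zip(y_true, y_quantile):
        diff = y - q
        # under-prediction is weighted by alpha, over-prediction by (1 - alpha)
        total += alpha * diff if diff >= 0 else (alpha - 1) * diff
    return total / len(y_true)


def empirical_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside their predicted interval."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)


# hypothetical observations and predictions, for illustration only
y_true = [10.0, 12.0, 9.0, 11.0]
q90 = [12.0, 12.5, 11.0, 10.0]    # assumed 0.9-quantile predictions
lower = [8.0, 10.0, 9.5, 9.0]     # assumed lower interval bounds
upper = [12.0, 13.0, 11.0, 12.0]  # assumed upper interval bounds

loss = pinball_loss(y_true, q90, alpha=0.9)
coverage = empirical_coverage(y_true, lower, upper)
print(f"pinball loss: {loss:.4f}, empirical coverage: {coverage:.2f}")
```

A well-calibrated interval forecast has empirical coverage close to its nominal coverage on held-out data, and a lower pinball loss indicates a better quantile forecast.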

🚀 How to get started

In the tutorial, we will move through notebooks section by section.

There are several ways to run the tutorial notebooks:

Run the notebooks in the cloud on Binder - for this you don't have to install anything!
Run the notebooks on your machine. Clone this repository, get conda, install the required packages (sktime, seaborn, jupyter) in an environment, and open the notebooks with that environment. For detailed instructions, see below. For troubleshooting, see sktime's more detailed installation instructions.
Alternatively, use python venv and/or an editable install of this repo as a package. Instructions are below.

Please let us know on the sktime discord if you have any issues during the conference, or join to ask for help anytime.

💡 Description

Probabilistic predictions make statements about the uncertainty or likely variation of the forecast, e.g., intervals at nominal coverage or conditional distributions. They appear in probabilistic forecasting as well as in probabilistic supervised (tabular) regression. This tutorial presents probabilistic forecasting capability in the skpro and sktime packages, combined with a methodological overview.

sktime is a widely used package for time series; skpro covers probabilistic (tabular) regression. Both are based on skbase, and designed for interoperability with each other and sklearn.

This tutorial presents the joint designs for probabilistic predictions and modular estimator interfaces. It also gives an overview of pipelines, tuning using probabilistic metrics, and compositors that can be used to turn any point forecaster into a probabilistic forecaster, such as conformal or empirical interval estimators.

The presentation will showcase skpro and sktime, for tabular and time series tasks:

probabilistic prediction interfaces
metrics, e.g., quantile loss, or CRPS and log-loss for distribution forecasts
tuning using probabilistic metrics
conformal probabilistic intervals for any pipeline
compositors to make any point prediction estimator probabilistic
for time series: hierarchical and global probabilistic forecasts, reduction to regression
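
The idea of wrapping a point predictor to obtain intervals can be sketched with split conformal prediction, which is one of several conformal variants. The sketch below is plain Python and is not how sktime or skpro implement their compositors; `conformal_interval`, the toy predictor, and the calibration data are all hypothetical.

```python
import math


def conformal_interval(point_predict, X_calib, y_calib, X_new, coverage=0.9):
    """Split-conformal intervals around an arbitrary point predictor."""
    # absolute residuals of the point predictor on held-out calibration data
    scores = sorted(abs(y - point_predict(x)) for x, y in zip(X_calib, y_calib))
    n = len(scores)
    # finite-sample corrected rank of the residual used as interval half-width
    rank = math.ceil((n + 1) * coverage)
    if rank > n:
        raise ValueError("not enough calibration points for this coverage")
    width = scores[rank - 1]
    # symmetric interval around each new point prediction
    return [(point_predict(x) - width, point_predict(x) + width) for x in X_new]


def toy_predictor(x):
    """Hypothetical point predictor, standing in for any fitted model."""
    return 2 * x


# hypothetical calibration data: y is roughly 2 * x plus some error
X_calib = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y_calib = [3, 1, 8, 13, 6, 18, 12, 23, 27]

intervals = conformal_interval(toy_predictor, X_calib, y_calib, [10], coverage=0.8)
print(intervals)
```

The predictor is only called, never inspected, so the same wrapper applies to any point prediction model. This is the property that makes such compositors generic.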

From a methodological perspective, we will cover:

interval forecasts: producing intervals that contain the observation with a given nominal probability
quantile forecasts: specifying one or multiple quantiles of a predictive forecast distribution
fully probabilistic forecasts: producing a symbolic representation of a predictive forecast distribution
simulators or samplers from probabilistic forecasting models
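
The four kinds of output listed above are closely related, which a single predictive distribution can illustrate. The following sketch uses only the Python standard library and a hypothetical normal predictive distribution for one future time point. In sktime, the corresponding forecaster methods are `predict_interval`, `predict_quantiles`, and `predict_proba`; the code below does not call them and only shows the underlying relationships.

```python
from statistics import NormalDist

# hypothetical predictive distribution for a single future time point
pred_dist = NormalDist(mu=100.0, sigma=10.0)

# interval forecast: a symmetric interval at nominal coverage c is bounded
# by the (1 - c) / 2 and (1 + c) / 2 quantiles of the distribution
coverage = 0.9
alpha_lower = (1 - coverage) / 2
alpha_upper = (1 + coverage) / 2
interval = (pred_dist.inv_cdf(alpha_lower), pred_dist.inv_cdf(alpha_upper))

# quantile forecast: one or multiple quantiles of the same distribution
quantiles = {alpha: pred_dist.inv_cdf(alpha) for alpha in (0.1, 0.5, 0.9)}

# fully probabilistic forecast: the distribution object itself,
# which can be queried for densities, probabilities, or moments
prob_below_90 = pred_dist.cdf(90.0)

# sampler: draw simulated future values from the predictive distribution
samples = pred_dist.samples(1000, seed=42)

print("90% interval:", interval)
print("quantiles:", quantiles)
print("P(y <= 90):", prob_below_90)
```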

As research on software interfaces and mathematical conceptualization in this area is still an ongoing endeavour, challenges will also be discussed, with invitations to contribute.

sktime and skpro are developed by an open community, with aims of ecosystem integration in a neutral, charitable space. We welcome contributions and seek to provide opportunities for anyone worldwide. We invite anyone to get involved as a developer, user, supporter (or any combination of these).

🎥 Other Tutorials:

👋 How to contribute

If you're interested in contributing to sktime, you can find out more about how to get involved here.

Any contributions are welcome, not just code!

Installation instructions for local use

To run the notebooks locally, you will need:

a local repository clone
a python environment with required packages installed

Cloning the repository

To clone the repository locally:

git clone https://github.com/sktime/sktime-tutorial-pydata-Amsterdam-2023.git

Using conda env

Create a python virtual environment: conda create -y -n tutorial_env python=3.9
Install required packages: conda install -y -n tutorial_env pip skpro sktime seaborn jupyter pmdarima statsmodels
Activate your environment: conda activate tutorial_env
If using jupyter: make the environment available in jupyter: python -m ipykernel install --user --name=tutorial_env

Using python venv

Create a python virtual environment: python -m venv tutorial_env
Activate your environment: source tutorial_env/bin/activate
Install the requirements: pip install skpro sktime seaborn jupyter pmdarima statsmodels
If using jupyter: make the environment available in jupyter: python -m ipykernel install --user --name=tutorial_env
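
After either installation route, a quick check that the packages are importable can save time before opening the notebooks. The script below is a minimal sketch and not part of the tutorial repository; `missing_packages` is a hypothetical helper, and `ipykernel` is checked because the kernel registration step above relies on it.

```python
import importlib.util

# packages from the install commands above, plus ipykernel for the jupyter kernel
REQUIRED = ["skpro", "sktime", "seaborn", "pmdarima", "statsmodels", "ipykernel"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All tutorial packages are importable.")
```

Run it with the python interpreter of the activated tutorial_env environment, so that it inspects the same environment the notebooks will use.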
