romilbert / samformer

Official implementation of SAMformer, a transformer leveraging Sharpness-Aware Minimization and Channel-Wise Attention for Time Series Forecasting.

SAMformer

This is the official implementation of SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention.

Authors

Romain Ilbert and Ambroise Odonnat contributed equally. This is joint work with Vasilii Feofanov, Aladin Virmaux, Giuseppe Paolo, Themis Palpanas and Ievgen Redko.

Overview

SAMformer is a novel lightweight transformer architecture designed for time series forecasting. It uniquely integrates Sharpness-Aware Minimization (SAM) with a Channel-Wise Attention mechanism. This method achieves state-of-the-art performance in multivariate long-term forecasting across a range of benchmarks. In particular, SAMformer surpasses the current state-of-the-art model, TSMixer, by $\mathbf{14.33}$% on average while having $\mathbf{\sim4}$ times fewer parameters.

Architecture

SAMformer takes as input a $D$-dimensional time series of length $L$ (look-back window), arranged in a matrix $\mathbf{X}\in\mathbb{R}^{D\times L}$ and predicts its next $H$ values (prediction horizon), denoted by $\mathbf{Y}\in\mathbb{R}^{D\times H}$. The main components of the architecture are the following.
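As a minimal illustration of these shapes, the NumPy sketch below slices a series of $T$ time steps with $D$ channels into look-back/horizon pairs, each stored channel-first as in $\mathbf{X}\in\mathbb{R}^{D\times L}$ and $\mathbf{Y}\in\mathbb{R}^{D\times H}$. The function name `make_windows` and the stride of one time step are assumptions made for illustration; this is not the repository's data pipeline.

```python
import numpy as np

def make_windows(series: np.ndarray, L: int, H: int):
    """Slice a (T, D) multivariate series into look-back/horizon pairs.

    Hypothetical helper (stride 1). Returns X with shape (N, D, L) and
    Y with shape (N, D, H), where N = T - L - H + 1.
    """
    T, D = series.shape
    n = T - L - H + 1
    if n <= 0:
        raise ValueError("series is too short for the requested L and H")
    # Transpose each window so that channels come first, matching X in R^{D x L}.
    X = np.stack([series[t:t + L].T for t in range(n)])          # (N, D, L)
    Y = np.stack([series[t + L:t + L + H].T for t in range(n)])  # (N, D, H)
    return X, Y

# Toy series: T = 20 time steps, D = 3 channels.
series = np.arange(60, dtype=float).reshape(20, 3)
X, Y = make_windows(series, L=8, H=4)
print(X.shape, Y.shape)  # (9, 3, 8) (9, 3, 4)
```

Each target window starts right after its look-back window, so `Y[0][:, 0]` equals `series[8]` when $L = 8$.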

πŸ’‘ Shallow transformer encoder. The neural network at the core of SAMformer is a shallow encoder of a simplified Transformer. Channel-wise attention is applied to the input, followed by a residual connection. Instead of the usual feedforward block, a linear layer is directly applied on top of the residual connection to output the prediction.

πŸ’‘ Channel-Wise Attention. Contrary to the usual temporal attention in $\mathbb{R}^{L \times L}$, the channel-wise self-attention is represented by a matrix in $\mathbb{R}^{D \times D}$ and consists of the pairwise correlations between the input's features. This brings two important benefits:

  • Feature permutation invariance, eliminating the need for positional encoding, commonly applied before the attention layer;
  • Reduced time and memory complexity, since $D \leq L$ in most real-world datasets.
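Both benefits can be checked numerically with a small NumPy sketch, assuming a single-head softmax attention with random projections $W_Q, W_K \in \mathbb{R}^{L\times d_m}$ (illustrative names and sizes, not the repository's code). The attention matrix has $D \times D$ entries instead of $L \times L$. Reordering the channels of the input only reorders the rows and columns of that matrix, so the mechanism needs no positional information about channel order.

```python
import numpy as np

def channel_attention(X, W_q, W_k):
    """Return the (D, D) channel-wise attention matrix for X of shape (D, L)."""
    scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(W_q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # row-wise stabilization before exp
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
D, L, d_m = 7, 512, 16                            # illustrative sizes with D <= L
X = rng.standard_normal((D, L))
W_q = rng.standard_normal((L, d_m)) / np.sqrt(L)
W_k = rng.standard_normal((L, d_m)) / np.sqrt(L)

A = channel_attention(X, W_q, W_k)
print(A.shape, "vs temporal", (L, L))  # (7, 7) vs temporal (512, 512)

# Permuting the channels permutes rows and columns of A in the same way.
perm = rng.permutation(D)
A_perm = channel_attention(X[perm], W_q, W_k)
print(np.allclose(A_perm, A[np.ix_(perm, perm)]))  # True
```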

πŸ’‘ Reversible Instance Normalization (RevIN). The resulting network is equipped with RevIN, a two-step normalization scheme to handle the shift between the training and testing time series.

πŸ’‘ Sharpness-Aware Minimization (SAM). As suggested by our empirical and theoretical analysis, we optimize the model with SAM to make it converge towards flatter minima, hence improving its generalization capacity.

SAMformer uniquely combines all these components in a lightweight implementation with very few hyperparameters. We display below the resulting architecture.

Results

We conduct our experiments on various multivariate time series forecasting benchmarks.

🥇 Improved performance. SAMformer outperforms its competitors on $\mathbf{7}$ out of $\mathbf{8}$ datasets by a large margin. In particular, it improves over its best competitor, TSMixer+SAM, by $\mathbf{5.25}$%, over the standalone TSMixer by $\mathbf{14.33}$%, and over the best transformer-based model, FEDformer, by $\mathbf{12.36}$%. It also improves over the vanilla Transformer by $\mathbf{16.96}$%. For each dataset and horizon, SAMformer ranks either first or second.

🚀 Computational efficiency and versatility. Unlike most of its competitors, SAMformer has a lightweight implementation with few learnable parameters, which improves its computational efficiency. It significantly outperforms the state of the art in multivariate time series forecasting despite having fewer parameters. In addition, the same architecture is used for all datasets, whereas most other baselines require heavy hyperparameter tuning, which showcases the versatility of our approach.

πŸ“š Qualitative benefits. We display in our paper the benefits of SAMformer in terms of smoothness of the loss landscape, robustness to the prediction horizons, and signal propagation in the attention layer.

Installation

To get started with SAMformer, clone this repository and install the required packages.

git clone https://github.com/romilbert/samformer.git
cd samformer
pip install -r requirements.txt

Make sure you have Python 3.8 or a newer version installed.

Modules

SAMformer consists of several key modules:

  • models/: Contains the SAMformer architecture along with necessary components for normalization and optimization.
  • utils/: Contains the utilities for data processing, training, callbacks, and to save the results.
  • dataset/: Directory for storing the datasets used in experiments. For illustration purposes, this directory only contains the ETTh1 dataset in .csv format. You can download all the datasets used in our experiments (ETTh1, ETTh2, ETTm1, ETTm2, electricity, weather, traffic, exchange_rate) here.

Usage

To launch the training and evaluation process, use the run_script.sh script with the appropriate arguments:

sh run_script.sh -m [model_name] -d [dataset_name] -s [sequence_length] -u -a

Script Arguments

  • -m: Model name.
  • -d: Dataset name.
  • -s: Sequence length. The default is 512.
  • -u: Activate Sharpness-Aware Minimization (SAM). Optional.
  • -a: Activate additional results saving. Optional.

Example

sh run_script.sh -m transformer -d ETTh1 -u -a
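To run the same configuration on several datasets, a small Python wrapper can assemble the command from the documented flags only (`-m`, `-d`, `-s`, `-u`, `-a`). This is a sketch under assumptions: the helper names are hypothetical, and the dataset names are taken from the Modules section on the assumption that they match what `-d` accepts. Call `run_sweep()` from the repository root.

```python
import subprocess

# Dataset names as listed in the Modules section (assumed to be valid values for -d).
DATASETS = ["ETTh1", "ETTh2", "ETTm1", "ETTm2", "electricity", "weather", "traffic", "exchange_rate"]

def build_command(model, dataset, seq_len=None, use_sam=False, save_extra=False):
    """Hypothetical helper: build the argument list for run_script.sh."""
    cmd = ["sh", "run_script.sh", "-m", model, "-d", dataset]
    if seq_len is not None:
        cmd += ["-s", str(seq_len)]  # omit to keep the script's default sequence length (512)
    if use_sam:
        cmd.append("-u")             # activate Sharpness-Aware Minimization
    if save_extra:
        cmd.append("-a")             # activate additional results saving
    return cmd

def run_sweep(model="transformer", datasets=DATASETS):
    """Hypothetical helper: run each dataset in turn and stop at the first failure."""
    for name in datasets:
        subprocess.run(build_command(model, name, use_sam=True, save_extra=True), check=True)
```

For example, `build_command("transformer", "ETTh1", use_sam=True, save_extra=True)` reproduces the example command above as an argument list.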

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Open-source Participation

Do not hesitate to contribute to this project by submitting pull requests or issues; we would be happy to receive feedback and integrate your suggestions.

Contact

The joint first authors of SAMformer are Romain Ilbert and Ambroise Odonnat. If you have questions, feel free to contact them at the following addresses: Romain Ilbert (romain.ilbert@hotmail.fr) and Ambroise Odonnat (ambroiseodonnattechnologie@gmail.com).

Acknowledgements

We would like to express our gratitude to all the researchers and developers whose work and open-source software have contributed to the development of SAMformer. Special thanks to the authors of SAM, TSMixer, RevIN and $\sigma$Reparam for their instructive works, which have enabled our approach. We provide below a non-exhaustive list of GitHub repositories that helped with valuable code base and datasets:
