Sparse Modular Activation for Efficient Sequence Modeling

Liliang Ren¹*, Yang Liu², Shuohang Wang², Yichong Xu², Chenguang Zhu², ChengXiang Zhai¹

¹University of Illinois at Urbana-Champaign, ²Microsoft Azure Cognitive Services Research, *Work done during a Microsoft internship and at UIUC.

Introduction

This is the PyTorch implementation of SeqBoat 🚤 proposed in our paper. This repository is based on MEGA and the fairseq package v0.9.0.

Updates

[Nov. 26] Added a standalone CIFAR-10 training script for SeqBoat as a quickstart!
[Nov. 5] Released training scripts for enwik8 and added a standalone implementation of SeqBoat!
[Sep. 21] Our paper is accepted by NeurIPS 2023!
[July 18] Released training scripts for LRA and Speech Commands.

Code Overview

The compress and extract operators for Sparse Modular Activation (SMA) are implemented in fairseq/modules/seqboat_utils.py as the functions compress_seq and extract, respectively.
The SeqBoat layer is implemented in fairseq/modules/seqboat_unit.py.
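To illustrate the idea behind the two SMA operators, here is a minimal, list-based sketch. It assumes that "compress" gathers the activated elements of a sequence (according to a boolean activation mask) into a shorter sequence, that a module runs only on that shorter sequence, and that "extract" scatters the results back to their original positions. The names compress_sketch, extract_sketch and apply_module_sparsely, the zero fill value, and the single unbatched sequence are all hypothetical simplifications; they do not reproduce the signatures or batching of compress_seq and extract in seqboat_utils.py.

```python
from typing import Callable, List, Sequence, Tuple


def compress_sketch(seq: Sequence[float], mask: Sequence[bool]) -> Tuple[List[float], List[int]]:
    """Hypothetical compress: keep only the activated elements of `seq`.

    Also returns their original positions, which are needed to undo the compression.
    """
    if len(seq) != len(mask):
        raise ValueError("seq and mask must have the same length")
    positions = [i for i, active in enumerate(mask) if active]
    return [seq[i] for i in positions], positions


def extract_sketch(compressed: Sequence[float], positions: Sequence[int],
                   length: int, fill: float = 0.0) -> List[float]:
    """Hypothetical extract: scatter `compressed` back into a sequence of `length`.

    Non-activated positions receive `fill` (zero here, an assumption of this sketch).
    """
    out = [fill] * length
    for value, pos in zip(compressed, positions):
        out[pos] = value
    return out


def apply_module_sparsely(seq: Sequence[float], mask: Sequence[bool],
                          module: Callable[[List[float]], List[float]]) -> List[float]:
    """Run `module` on the activated elements only, then restore the original length.

    `module` must return one output per input element of the compressed sequence.
    """
    compressed, positions = compress_sketch(seq, mask)
    return extract_sketch(module(compressed), positions, len(seq))


if __name__ == "__main__":
    def double(xs: List[float]) -> List[float]:
        # Stand-in for the sparsely activated module (e.g. an attention sublayer).
        return [2 * v for v in xs]

    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    mask = [True, False, True, False, True]
    print(apply_module_sparsely(x, mask, double))  # [2.0, 0.0, 6.0, 0.0, 10.0]
```

The point of the pattern is that the module's cost scales with the number of activated elements rather than with the full sequence length, while the extract step keeps the output aligned with the input positions.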

Setup

This repository requires Python 3.8+ and PyTorch 1.11+.

# Install from this repo
pip install -e .
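Before installing, you may want to confirm that your environment meets the stated minimum versions. The following is a small helper sketch, not part of the repository; check_environment and parse_major_minor are hypothetical names, and it assumes PyTorch is installed under the distribution name "torch".

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import List, Tuple

MIN_PYTHON = (3, 8)   # Python 3.8+ as stated in this README
MIN_TORCH = (1, 11)   # PyTorch 1.11+ as stated in this README


def parse_major_minor(version_string: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.11.0+cu113' or '2.0.1'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment() -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        found = f"{sys.version_info[0]}.{sys.version_info[1]}"
        problems.append(f"Python {found} found; 3.8+ required")
    try:
        torch_version = version("torch")
    except PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if parse_major_minor(torch_version) < MIN_TORCH:
            problems.append(f"PyTorch {torch_version} found; 1.11+ required")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment meets the stated requirements.")
```

Comparing (major, minor) tuples avoids the classic string-comparison pitfall where "1.9" would sort after "1.11".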

For faster training, install NVIDIA's apex library following the instructions in fairseq.

Quickstart

The easiest way to get started is to run the standalone_cifar.py script. This script trains a simple SeqBoat model on CIFAR-10:

python standalone_cifar.py --prenorm
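We have not traced how standalone_cifar.py consumes --prenorm; judging from the flag name only, we assume it switches the residual blocks from post-normalization to pre-normalization. The sketch below contrasts the two orderings on plain Python lists. layer_norm, pre_norm_block and post_norm_block are hypothetical helpers, and the layer norm omits the learnable gain and bias used in practice.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize to zero mean and unit variance (no learnable gain/bias in this sketch)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_norm_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    # Post-norm: add the residual first, then normalize the sum.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def pre_norm_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    # Pre-norm: normalize only the sublayer input; the residual path is left untouched.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


if __name__ == "__main__":
    def zero(v: Vector) -> Vector:
        return [0.0] * len(v)

    x = [1.0, 2.0, 3.0]
    print(pre_norm_block(x, zero))   # the input passes through unchanged
    print(post_norm_block(x, zero))  # the input is re-normalized
```

With pre-normalization the identity path from input to output is never normalized, which is commonly credited with making deep stacks easier to optimize.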

Experiments

We also provide the training and testing scripts for each of the tasks in the experiment directory.

Citation

If you find our work useful, please consider citing:

@inproceedings{ren2023sparse,
  title={Sparse Modular Activation for Efficient Sequence Modeling},
  author={Liliang Ren and Yang Liu and Shuohang Wang and Yichong Xu and Chenguang Zhu and ChengXiang Zhai},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023},
  url={https://openreview.net/forum?id=TfbzX6I14i}
}

License

SeqBoat is released under the MIT license. The license also applies to the model checkpoints.

Contact

Liliang Ren (liliang3@illinois.edu)

About

[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling

https://arxiv.org/abs/2306.11197

MIT License
