Attention Accelerations

This repo provides code for the course project #7 of Numerical Analysis, THU-CST, 2023 Spring.

On the Long Range Arena (LRA) benchmark, we attempted to reproduce the results reported in Skyformer, CosFormer, LARA, and MEGA, and measured the training and inference speed of these models.

Data Preparation

  • Download the preprocessed data from TsinghuaCloud

  • Unzip lra_data_mega.zip and lra_data_skyformer.zip so that the directory structure looks as follows (a quick loadability check is sketched after the listing):

data/skyformer
├── lra-image.dev.pickle
├── lra-image.test.pickle
├── lra-image.train.pickle
├── ...
├── lra-text.dev.pickle
├── lra-text.test.pickle
└── lra-text.train.pickle
data/mega
├── aan
│   ├── dict-bin
│   ├── label-bin
│   ├── src-bin
│   └── src1-bin
├── cifar10
│   ├── input
│   └── label
├── imdb-4000
│   ├── label-bin
│   └── src-bin
├── listops
│   ├── label-bin
│   └── src-bin
├── path-x
│   ├── input
│   └── label
└── pathfinder
    ├── input
    └── label
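
As a quick sanity check after unpacking, the Skyformer-style pickles can be loaded directly. The snippet below is only an illustration, not part of the repo; it assumes each `*.pickle` file is a standard Python pickle and makes no assumption about the record layout inside.

```python
import pickle
from pathlib import Path

# Illustrative sanity check (not part of the repo): confirm the Skyformer-style
# pickles are in place and loadable. The exact record layout is not assumed here.
data_dir = Path("data/skyformer")

for split in ("train", "dev", "test"):
    path = data_dir / f"lra-image.{split}.pickle"
    if not path.exists():
        print(f"missing: {path}")
        continue
    with path.open("rb") as f:
        data = pickle.load(f)
    print(f"{path.name}: loaded object of type {type(data).__name__}")
```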

Installation

Prepare the environment by

conda create -n acce python=3.8
conda activate acce

# install `torch==1.8.0` following your CUDA version, e.g.
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge

# install skyformer dependencies
pip install -r skyformer/requirements.txt

# install mega and its dependencies
pip install -e mega
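
As an optional check (not part of the repo), the following snippet verifies that the pinned PyTorch build is importable and that CUDA is visible:

```python
# Optional post-installation sanity check: print the torch version pinned above
# and confirm that a CUDA device is visible.
import torch

print("torch version:", torch.__version__)      # expected: 1.8.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```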

Run

Training & Inference

  • CosFormer, LARA and Skyformer

    cd skyformer
    python main.py --mode train --attn <attention-name> --task <task-name>
    • <attention-name>:

      • softmax: baseline attention
      • skyformer
      • cosformer
      • lara
    • <task-name>:

      • lra-listops
      • lra-pathfinder
      • lra-retrieval
      • lra-text
      • lra-image
  • MEGA

    cd mega
    bash training_scripts/run_<task-name>.sh
    • <task-name>:
      • listops
      • pathfinder
      • retrieval
      • text
      • image
  • The scripts select the best checkpoint on the validation set and evaluate it on the test set at the end of training. (A sketch that loops over all Skyformer-side attention/task combinations follows below.)
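
To sweep every Skyformer-side configuration in one go, a simple wrapper around the main.py invocation above could look like the following. This is only a sketch, not a script shipped with the repo; run it from the skyformer/ directory.

```python
# Sketch: launch every (attention, task) combination on the Skyformer side by
# shelling out to `python main.py` with the flags documented above.
import itertools
import subprocess

ATTENTIONS = ["softmax", "skyformer", "cosformer", "lara"]
TASKS = ["lra-listops", "lra-pathfinder", "lra-retrieval", "lra-text", "lra-image"]

for attn, task in itertools.product(ATTENTIONS, TASKS):
    cmd = ["python", "main.py", "--mode", "train", "--attn", attn, "--task", task]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```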

Speed Test

  • CosFormer, LARA and Skyformer

    cd skyformer
    bash speed_tests.sh

    It runs speed tests for softmax, skyformer, cosformer, and lara on all five tasks.

  • MEGA

    cd mega
    bash timing_scripts/speed_tests.sh

    It runs speed tests for both MEGA-∞ and MEGA-128 on all five tasks. (An illustrative timing loop is sketched below.)
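
The bundled scripts handle timing themselves. Purely as an illustration of what an inference-speed measurement involves, the sketch below times repeated forward passes with CUDA synchronization; the model and input shape are placeholders, not the repo's actual benchmark setup.

```python
# Illustrative inference-timing loop, not the repo's speed_tests.sh.
# `model` and the input shape are stand-ins; synchronize around the timer so
# that GPU kernels are fully accounted for.
import time
import torch

def time_forward(model, batch, n_warmup=10, n_iters=50):
    model.eval()
    with torch.no_grad():
        for _ in range(n_warmup):          # warm-up to exclude one-off costs
            model(batch)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_iters):
            model(batch)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters  # avg seconds per forward pass

# Example with a stand-in model on a length-1024 sequence:
model = torch.nn.Linear(64, 64)
batch = torch.randn(8, 1024, 64)
print(f"{time_forward(model, batch) * 1e3:.2f} ms / iteration")
```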

Acknowledgement and References

This repo is derived from Skyformer and MEGA, with implementation references from CosFormer and LARA. We thank the authors for their great work.
