cscribano / DCT-Former-Public

Public repository for "DCT-Former: Efficient Self-Attention with Discrete Cosine Transform"


DCT-Former: Efficient Self-Attention with Discrete Cosine Transform PAPER
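In short, the paper reduces the cost of self-attention by working on a truncated Discrete Cosine Transform of the sequence, so that attention is evaluated over a small number of DCT coefficients instead of over all tokens. The sketch below only illustrates that general idea in PyTorch; the function names, shapes, and the choice of compressing keys and values are assumptions made for the example, not the authors' implementation (see the model code in this repository and the paper for the actual formulation).

# Illustrative sketch: attention over DCT-compressed keys/values.
# Every name below is a placeholder; this is not the repository's model code.
import math
import torch

def dct_matrix(n, m):
    """First m rows of an orthonormal DCT-II matrix of size n."""
    k = torch.arange(m).unsqueeze(1)                    # (m, 1)
    i = torch.arange(n).unsqueeze(0)                    # (1, n)
    d = torch.cos(math.pi * (2 * i + 1) * k / (2 * n))  # (m, n)
    d[0] *= 1.0 / math.sqrt(2)
    return d * math.sqrt(2.0 / n)

def dct_compressed_attention(q, k, v, m):
    """q, k, v: (batch, seq_len, dim); keep only the first m DCT coefficients."""
    n, d = k.shape[1], k.shape[2]
    c = dct_matrix(n, m).to(k.dtype)                    # (m, n)
    k_c = torch.einsum("mn,bnd->bmd", c, k)             # compressed keys
    v_c = torch.einsum("mn,bnd->bmd", c, v)             # compressed values
    attn = torch.softmax(q @ k_c.transpose(1, 2) / math.sqrt(d), dim=-1)
    return attn @ v_c                                   # (batch, seq_len, dim)

if __name__ == "__main__":
    q, k, v = (torch.randn(2, 128, 64) for _ in range(3))
    out = dct_compressed_attention(q, k, v, m=32)       # O(n·m) instead of O(n²)
    print(out.shape)                                    # torch.Size([2, 128, 64])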

Requirements

  • Create a conda environment using the provided environment.yml in docs as described HERE
conda env create -f environment.yml
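After creating and activating the environment, a quick check that PyTorch sees the GPUs can save time before launching a distributed run (this snippet is just a convenience, not part of the repository):

# Sanity check of the freshly created environment.
import torch

print("torch", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("visible GPUs:", torch.cuda.device_count())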

Dataset

Pretraining Dataset

The pre-processing stages are taken from academic-budget-bert; additional information is available in data/README.md

  • Sharding:
python shard_data.py \
    --dir <path_to_text_files> \
    -o <output_dir> \
    --num_train_shards 256 \
    --num_test_shards 128 \
    --frac_test 0.1
  • Samples Generation:
python generate_samples.py \
    --dir <path_to_shards> \
    -o <output_path> \
    --dup_factor 10 \
    --seed 42 \
    --do_lower_case 1 \
    --masked_lm_prob 0.15 \
    --max_seq_length 128 \
    --model_name bert-base-uncased \
    --max_predictions_per_seq 20 \
    --n_processes 4
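Upstream academic-budget-bert writes the generated samples as hdf5 shards; assuming the same holds here, a quick inspection of one shard (requires h5py) can confirm the generation step succeeded. The output path is a placeholder.

# Inspect one generated shard; assumes hdf5 output as in academic-budget-bert.
import glob
import h5py

shards = sorted(glob.glob("<output_path>/*.hdf5") + glob.glob("<output_path>/*.h5"))
with h5py.File(shards[0], "r") as f:
    for name, dset in f.items():
        print(name, dset.shape, dset.dtype)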

Finetuning Dataset

For finetuning, the "Large Movie Review" (IMDb) dataset is used, which is freely available HERE
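The archive unpacks to an aclImdb/ directory whose train/ and test/ splits each contain pos/ and neg/ subfolders of plain-text reviews. A minimal reader for that layout is sketched below purely as an illustration; the finetuning code in this repository uses its own dataloader.

# Illustrative reader for the aclImdb folder layout (not the repository's dataloader).
import pathlib

def load_imdb_split(root, split):
    """Yield (text, label) pairs from <root>/<split>/{pos,neg}/*.txt."""
    for label_name, label in (("pos", 1), ("neg", 0)):
        for path in sorted(pathlib.Path(root, split, label_name).glob("*.txt")):
            yield path.read_text(encoding="utf-8"), label

if __name__ == "__main__":
    train = list(load_imdb_split("aclImdb", "train"))
    print(len(train), "training reviews")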

Training

Pretraining (English Wikipedia)

  • Adjust the .json file in experiments/paper_pretrain according to the experiment you want to run.
  • Change data_root to point to the output directory of generate_samples.py (see the snippet after this list).
  • Change /data/logs to the desired logging directory.
  • To train with <num_gpus> GPUs on the same machine: python -m torch.distributed.launch --nproc_per_node=<num_gpus> --master_addr="127.0.0.1" --master_port=1234 main.py --exp_name=paper_pretrain/<experiment_name> --seed=6969
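The data_root edit above can also be scripted; the sketch below is only an illustration (the config file name is a placeholder, and it assumes data_root is a top-level key — check the provided .json files for the exact schema). The finetuning configs described below can be patched in the same way.

# Illustrative helper to point data_root at the generate_samples.py output.
# Placeholder file name; assumes data_root is a top-level key in the config.
import json

cfg_path = "experiments/paper_pretrain/<experiment_name>.json"
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["data_root"] = "/path/to/generate_samples_output"  # placeholder path
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)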

When the training is complete, run the following command to compute the pretraining metrics (accuracy) on the validation set:

python -m torch.distributed.launch --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=1234 main.py --exp_name=<exp_log_path> --conf_file_path=<log_dir> --mode=test


Finetuning (IMDb)

  • Adjust the .json file in experiments/paper_finetune according to the experiment you want to run.
  • Change data_root to point to the extracted aclImdb directory.
  • Change pretrain_ck to point to the intended pretraining checkpoint.
  • Change /data/logs to the desired logging directory.
  • To train with <num_gpus> GPUs on the same machine: python -m torch.distributed.launch --nproc_per_node=<num_gpus> --master_addr="127.0.0.1" --master_port=1234 main.py --exp_name=paper_finetune/<experiment_name> --seed=6969

Acknowledgments

Reference (Published in Journal of Scientific Computing)

@article{scribano2023dct,
  title={DCT-Former: Efficient Self-Attention with Discrete Cosine Transform},
  author={Scribano, Carmelo and Franchini, Giorgia and Prato, Marco and Bertogna, Marko},
  journal={Journal of Scientific Computing},
  volume={94},
  number={3},
  pages={67},
  year={2023},
  publisher={Springer}
}
