OhadRubin/lightning-transformers

Flexible interface for high performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra.

What is Lightning Transformers • Using Lightning Transformers • Docs • Community • License

Installation

Option 1: from PyPI

pip install lightning-transformers
# instead of: `python train.py ...`, run with:
pl-transformers-train ...

Option 2: from source

git clone https://github.com/PyTorchLightning/lightning-transformers.git
cd lightning-transformers
pip install .
python train.py ...
# the `pl-transformers-train` endpoint is also available!

What is Lightning-Transformers

Lightning Transformers offers a flexible interface for training and fine-tuning SOTA Transformer models using the PyTorch Lightning Trainer.

Train using HuggingFace Transformers models and datasets with Lightning custom Callbacks, Loggers, Accelerators and high performance scaling.
Seamless Memory and Speed Optimizations such as DeepSpeed ZeRO or FairScale Sharded Training with no code changes.
Powerful config composition backed by Hydra - Easily swap out models, optimizers, schedulers and many more configurations without touching the code.
Transformer Task Abstraction for Rapid Research & Experimentation - Built from the ground up to be task agnostic, the library supports creating transformer tasks across all modalities with little friction.

Lightning Transformers tasks allow you to train models using HuggingFace Transformer models and datasets, use Hydra to hotswap models, optimizers or schedulers and leverage all the advances features that Lightning has to offer, including custom Callbacks, Loggers, Accelerators and high performance scaling with minimal changes.

Using Lightning-Transformers

Grid is our platform for training models at scale on the cloud! Sign up here.

Task	Quick Commands	Run
Language Modeling	`python train.py task=nlp/language_modeling dataset=nlp/language_modeling/wikitext trainer.gpus=1 training.batch_size=8`
Multiple Choice	`python train.py task=nlp/multiple_choice dataset=nlp/multiple_choice/race trainer.gpus=1`
Question Answering	`python train.py task=nlp/question_answering dataset=nlp/question_answering/squad trainer.gpus=1`
Summarization	`python train.py task=nlp/summarization dataset=nlp/summarization/xsum trainer.gpus=1`
Text Classification	`python train.py task=nlp/text_classification dataset=nlp/text_classification/emotion trainer.gpus=1`
Token Classification	`python train.py task=nlp/token_classification dataset=nlp/token_classification/conll trainer.gpus=1`
Translation	`python train.py task=nlp/translation dataset=nlp/translation/wmt16 trainer.gpus=1`

Quick recipes

Train bert-base-cased on the CARER emotion dataset using the Text Classification task.

python train.py \
    task=nlp/text_classification \
    dataset=nlp/text_classification/emotion

See the composed Hydra config used under-the-hood

optimizer:
  _target_: torch.optim.AdamW
  lr: ${training.lr}
  weight_decay: 0.001
scheduler:
  _target_: transformers.get_linear_schedule_with_warmup
  num_training_steps: -1
  num_warmup_steps: 0.1
training:
  run_test_after_fit: true
  lr: 5.0e-05
  output_dir: .
  batch_size: 16
  num_workers: 16
trainer:
  _target_: pytorch_lightning.Trainer
  logger: true
  checkpoint_callback: true
  callbacks: null
  default_root_dir: null
  gradient_clip_val: 0.0
  process_position: 0
  num_nodes: 1
  num_processes: 1
  gpus: null
  auto_select_gpus: false
  tpu_cores: null
  log_gpu_memory: null
  progress_bar_refresh_rate: 1
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 1
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 1
  min_epochs: 1
  max_steps: null
  min_steps: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  val_check_interval: 1.0
  flush_logs_every_n_steps: 100
  log_every_n_steps: 50
  accelerator: null
  sync_batchnorm: false
  precision: 32
  weights_summary: top
  weights_save_path: null
  num_sanity_val_steps: 2
  resume_from_checkpoint: null
  profiler: null
  benchmark: false
  deterministic: false
  auto_lr_find: false
  replace_sampler_ddp: true
  terminate_on_nan: false
  auto_scale_batch_size: false
  prepare_data_per_node: true
  plugins: null
  amp_backend: native
  amp_level: O2
  move_metrics_to_cpu: false
task:
  _recursive_: false
  backbone: ${backbone}
  optimizer: ${optimizer}
  scheduler: ${scheduler}
  _target_: lightning_transformers.task.nlp..text_classification.TextClassificationTransformer
  downstream_model_type: transformers.AutoModelForSequenceClassification
dataset:
  cfg:
    batch_size: ${training.batch_size}
    num_workers: ${training.num_workers}
    dataset_name: emotion
    dataset_config_name: null
    train_file: null
    validation_file: null
    test_file: null
    train_val_split: null
    max_samples: null
    cache_dir: null
    padding: max_length
    truncation: only_first
    preprocessing_num_workers: 1
    load_from_cache_file: true
    max_length: 128
    limit_train_samples: null
    limit_val_samples: null
    limit_test_samples: null
  _target_: lightning_transformers.task.nlp.text_classification.TextClassificationDataModule
experiment_name: ${now:%Y-%m-%d}_${now:%H-%M-%S}
log: false
ignore_warnings: true
tokenizer:
  _target_: transformers.AutoTokenizer.from_pretrained
  pretrained_model_name_or_path: ${backbone.pretrained_model_name_or_path}
  use_fast: true
backbone:
  pretrained_model_name_or_path: bert-base-cased

Swap the backbone to RoBERTa and the optimizer to RMSprop:

python train.py \
    task=nlp/text_classification \
    dataset=nlp/text_classification/emotion
    backbone.pretrained_model_name_or_path=roberta-base
    optimizer=rmsprop

See the changed Hydra config under-the-hood

 optimizer:
-  _target_: torch.optim.AdamW
+  _target_: torch.optim.RMSprop
   lr: ${training.lr}
-  weight_decay: 0.001
 scheduler:
   _target_: transformers.get_linear_schedule_with_warmup
   num_training_steps: -1
....
tokenizer:
   pretrained_model_name_or_path: ${backbone.pretrained_model_name_or_path}
   use_fast: true
 backbone:
-  pretrained_model_name_or_path: bert-base-cased
+  pretrained_model_name_or_path: roberta-base

Enable Sharded Training.

python train.py \
    task=nlp/text_classification \
    dataset=nlp/text_classification/emotion \
    trainer=sharded

See the changed Hydra config under-the-hood

Without the need to modify any code, the config updated automatically for sharded training:

optimizer:
   _target_: torch.optim.AdamW
   lr: ${training.lr}
trainer:
   process_position: 0
   num_nodes: 1
   num_processes: 1
-  gpus: null
+  gpus: 1
   auto_select_gpus: false
   tpu_cores: null
   log_gpu_memory: null
   ...
   log_every_n_steps: 50
-  accelerator: null
+  accelerator: ddp
   sync_batchnorm: false
-  precision: 32
+  precision: 16
   weights_summary: top
   ....
   terminate_on_nan: false
   auto_scale_batch_size: false
   prepare_data_per_node: true
-  plugins: null
+  plugins:
+    _target_: pytorch_lightning.plugins.DDPShardedPlugin
   amp_backend: native
   amp_level: O2
   move_metrics_to_cpu: false
tokenizer:
   pretrained_model_name_or_path: ${backbone.pretrained_model_name_or_path}
   use_fast: true
 backbone:
   pretrained_model_name_or_path: bert-base-cased

Enable DeepSpeed ZeRO Training.

python train.py \
    task=nlp/text_classification \
    dataset=nlp/text_classification/emotion \
    trainer=deepspeed

See the changed Hydra config under-the-hood

Without the need to modify any code, the config updated automatically for DeepSpeed:

optimizer:
   _target_: torch.optim.AdamW
   lr: ${training.lr}
trainer:
   process_position: 0
   num_nodes: 1
   num_processes: 1
-  gpus: null
+  gpus: 1
   auto_select_gpus: false
   tpu_cores: null
   log_gpu_memory: null
   ...
   val_check_interval: 1.0
   flush_logs_every_n_steps: 100
   log_every_n_steps: 50
-  accelerator: null
+  accelerator: ddp
   sync_batchnorm: false
-  precision: 32
+  precision: 16
   ...
-  plugins: null
+  plugins:
+    _target_: pytorch_lightning.plugins.DeepSpeedPlugin
+    stage: 2
+    cpu_offload: true
   amp_backend: native
   amp_level: O2
   move_metrics_to_cpu: false
...

Train with a pre-trained t5-base backbone, on the XSUM dataset using the Summarization task.

python train.py \
    task=nlp/summarization \
    dataset=nlp/summarization/xsum \
    backbone.pretrained_model_name_or_path=t5-base

Train with a pre-trained mt5-base backbone, on the WMT16 dataset using the Translation task with 2 GPUs.

python train.py \
    task=nlp/translation \
    dataset=nlp/translation/wmt16 \
    backbone.pretrained_model_name_or_path=google/mt5-base \
    trainer.gpus=2

Custom Files & Datasets

You can train, validate and test Lightning transformers tasks on your own data files, and you can extend datasets for custom processing and your own tasks.

How to train, validate and test on custom files

How to extend datasets

Custom Tasks

Extending the Language Modeling Task

Contribute

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Community

For help or questions, join our huge community on Slack!

License

Please observe the Apache 2.0 license that is listed in this repository. In addition, the Lightning framework is Patent Pending.

OhadRubin / lightning-transformers

Installation

Option 1: from PyPI

Option 2: from source

What is Lightning-Transformers

Using Lightning-Transformers

Quick recipes

Train bert-base-cased on the CARER emotion dataset using the Text Classification task.

Swap the backbone to RoBERTa and the optimizer to RMSprop:

Enable Sharded Training.

Enable DeepSpeed ZeRO Training.

Train with a pre-trained t5-base backbone, on the XSUM dataset using the Summarization task.

Train with a pre-trained mt5-base backbone, on the WMT16 dataset using the Translation task with 2 GPUs.

Custom Files & Datasets

How to train, validate and test on custom files

How to extend datasets

Custom Tasks

Extending the Language Modeling Task

Contribute

Community

License

About

Languages