
NEWLM

Build a language model and test it on the GLUE benchmark.

Setup

We recommend creating a new conda environment before running this experiment.

pip install -r requirements.txt
wandb login
source .env
  • Copy .env.template to .env and edit it if necessary.
  • We use wandb to track our experiments. The program is still runnable without it.
  • At the time we ran these experiments, we used python=3.8.
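For reference, a minimal conda setup matching the notes above (the environment name newlm is only an example):

conda create -n newlm python=3.8
conda activate newlm
# then run the pip install / wandb login / source .env steps above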

Run Experiments

We provide several script options for LM pretraining. Every script requires a config file in .yaml format; please check the corresponding example files.

Pretrain LM

Pretrain an LM from scratch on a given dataset.

python -m newlm run_pretrain --config_file="examples/configs_gcloud/run-100-percent.bert-causal.yaml"

Pretrain LM with a Given Tokenizer

Pretrain an LM with a prebuilt tokenizer.

python -m newlm run_pretrain_model --config_file="examples/configs_gcloud/run-100-percent.bert-causal-r2l.yaml"

Finetune Downstream GLUE

Run finetuning on the GLUE dataset.

python -m newlm run_glue --config_file="examples/configs_gcloud/run-ft.bert.yaml"

Run All (Pretrain and Downstream)

Pretrain the LM, followed by finetuning on the GLUE dataset.

python -m newlm run_all --config_file="examples/configs/run.yaml"

Run Prediction and Save Precomputed Proba

python -m newlm run_glue_predict --config_file="examples/configs/run-predict-ensemble.yaml"

Run Prediction (L2R:R2L) and Ensemble (1:1)

python -m newlm run_predict_ensemble --config_file="examples/configs/run-predict-ensemble.yaml"

Run Ensemble from Saved Precomputed Proba

python -m newlm run_ensemble --config_file="examples/configs/run-predict-ensemble.yaml" --l2r_r2l_ratio=[1,1]
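The --l2r_r2l_ratio flag sets the ensemble weights for the L2R and R2L predictions. Assuming other integer ratios are accepted (as the grid-search scripts below suggest), weighting the L2R model twice as heavily would look like:

python -m newlm run_ensemble --config_file="examples/configs/run-predict-ensemble.yaml" --l2r_r2l_ratio=[2,1]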

Config .yaml

We use Hugging Face's transformers library as our base, so most of the config follows its conventions. We also provide some additional config options to ease the training process.

Here are some further details and explanation for the config file.

Batch Size (Training)

lm.hf_trainer.total_batch_size

Set this to the desired total batch size. We automatically calculate num_device * accum_step * batch_per_device to match total_batch_size.
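For example, in the config (the value 256 and its factorization are only illustrative):

lm:
  hf_trainer:
    total_batch_size: 256   # e.g. resolved as 2 devices * 16 accum steps * 8 per device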

Resume Training

We prevent a pretrained model from being saved into a non-empty directory. To resume training from the latest checkpoint instead, set the following config:

lm.model.create_params.train_params.resume_from_checkpoint: latest
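Expressed as nested YAML, the dotted path above corresponds to:

lm:
  model:
    create_params:
      train_params:
        resume_from_checkpoint: latest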

Grid Search

We also provide several scripts for performing grid search. Please adjust them as needed.

Grid search finetune GLUE

./run_grid-bert-all.sh

Grid search ratio of BERT (L2R:R2L)

./run_grid_ensemble.sh

Run Ensemble

TBD

Run Predict

TBD

Run Predict Ensemble

TBD

Available Scripts

./run_predict_glue_test.sh

  • Run prediction on the test dataset
  • Prepare and create the file to be submitted to gluebenchmark

Steps

  1. Prepare the dir that contains the vocab.txt file.
    • Can use the one from the pretrained model
    • or use one of the best_model's checkpoints
  2. Prepare your best models and put them under one directory. Expected structure:
best_model_dir
├── cola
│   └── checkpoint-123
├── mrpc
│   ├── checkpoint-456
│   └── checkpoint-789
└── ...
  3. Open the script run_predict_glue_test.sh and modify the necessary fields
  4. Run the script and wait until it finishes
  5. It generates submission.zip under output_dir/glue-predict
  6. Submit the file to the gluebenchmark web

./run_grid_ensemble_v2.sh

  • Run grid ensemble from pre-predicted files
  • Can be used for both dev and test sets

Steps

  1. Create a folder with the following format:
ensemble_dir
├── model_type (e.g. bert-causal)
│   ├── 0
│   │   └── glue-predict
│   │       ├── cola
│   │       ├── mnli
│   │       ├── mrpc
│   │       └── ...
│   └── 1
│       └── glue-predict
└── model_type-2 (e.g. bert-causal-r2l)
    ├── 0
    └── 1
# This example would ensemble 4 models
  2. Open examples/configs/run-predict-ensemble-v2.yaml. Modify ONLY output_dir: ensemble_dir.
  3. Run ./run_grid_ensemble_v2.sh 100 ensemble_dir

For GLUE Submission (test set only)

  1. Open script_glue_submission_ens.py
  2. Modify the necessary fields
  3. Run: python script_glue_submission_ens.py

Run Predict DEV + Run Grid Ensemble

  • Run prediction on the dev dataset
  • Run the ensemble with multiple weight ratios
  • Summarize the ensemble results

Steps A: Run Prediction on the Dev Set

For each model that you want to ensemble:

  1. Prepare the dir that contains the vocab.txt file.
    • Can use the one from the pretrained model
    • or use one of the best_model's checkpoints
  2. Prepare your best models and put them under one directory. Expected structure:
best_model_dir
├── cola
│   └── checkpoint-123
├── mrpc
│   ├── checkpoint-456
│   └── checkpoint-789
└── ...
  3. Open the script run_predict_glue_dev.sh and modify the necessary fields
  4. Run the script and wait until it finishes

Steps B: Run Ensemble for a combination you want to test

  1. Open run_grid_ensemble_from_prediction.sh
  2. Modify the necessary fields
  3. Run ./run_grid_ensemble_from_prediction.sh

Run Elmo V1 / V4

  1. Open examples/configs_gcloud/run-ft.elmo-bert-causal-l2r-r2l-v1.yaml
  2. Modify the following fields (see the sketch after this list):
    • output_dir
    • tokenizer.pretrained
    • lm.pretrained_l2r
    • lm.pretrained_r2l
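A minimal sketch of those fields in the .yaml config; all paths below are placeholders, not real checkpoints:

output_dir: outputs/elmo-v1-finetune      # placeholder
tokenizer:
  pretrained: path/to/tokenizer           # dir containing vocab.txt
lm:
  pretrained_l2r: path/to/l2r-checkpoint  # placeholder
  pretrained_r2l: path/to/r2l-checkpoint  # placeholder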

If you want to run grid finetune:

  1. Open ./run-grid-elmo-v1.sh
    • Modify this line: python summarize_tuning.py outputs/en.100-percent.elmo-bert-causal-v1-finetune
    • Change outputs/en.100-percent.elmo-bert-causal-v1-finetune to your output dir
  2. To run all tasks, check ./run-grid-elmo-v1-all.sh

If you want to run only some tasks / some hyperparameters:

  1. Run:
    CUDA_VISIBLE_DEVICES=_gpu_id_ python -m newlm run_glue \
       --config_file="examples/configs_gcloud/run-ft.elmo-bert-causal-l2r-r2l-v1.yaml" \
       --bs=_batch_size_ \
       --lr=_learning_rate_ \
       --seed=_seed_ \
       --tasks=[_task_name_]
    
  2. After it finishes, you can run this for a summary:
    python summarize_tuning.py _output_dir_ _task_name_
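For instance, a filled-in invocation (the GPU id, hyperparameters, task, and output dir below are illustrative placeholders):

    CUDA_VISIBLE_DEVICES=0 python -m newlm run_glue \
       --config_file="examples/configs_gcloud/run-ft.elmo-bert-causal-l2r-r2l-v1.yaml" \
       --bs=32 \
       --lr=5e-5 \
       --seed=42 \
       --tasks=[cola]
    python summarize_tuning.py outputs/elmo-v1-finetune cola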
    

For v4, change all -v1 to -v4
