ML Project 2: Human-centered Commonsense Benchmark

EPFL Machine Learning course, project 2, in association with the NLP Lab: a commonsense reasoning benchmark and probing suite for large language models.

Baseline Models

We employed T5-based models as baselines, alongside large language models such as OPT, BLOOM, and GPT-style models (the MODEL_TYPE options in the Quickstart section), as sketched below.
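For reference, a minimal sketch of loading the two model families with Hugging Face transformers (the checkpoint names here are illustrative, not the repo's fixed choices):

```python
# Illustrative only: load a T5 baseline (encoder-decoder) and an OPT
# large language model (decoder-only) with Hugging Face transformers.
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    AutoModelForCausalLM,
)

# T5-style baseline (seq2seq).
t5_tokenizer = AutoTokenizer.from_pretrained("t5-large")
t5_model = AutoModelForSeq2SeqLM.from_pretrained("t5-large")

# Decoder-only LLM (OPT shown here; BLOOM and GPT-style models load the same way).
opt_tokenizer = AutoTokenizer.from_pretrained("facebook/opt-66b")
opt_model = AutoModelForCausalLM.from_pretrained("facebook/opt-66b")
```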

Human-centered Commonsense Benchmark

We employed five commonsense benchmarks covering human-centered scenarios, from social interaction to ethical judgment, that people face in everyday life; an example instance is shown below.
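For illustration, a SocialIQA-style instance (one of the benchmarks used in this project) looks roughly like this; field names follow the public dataset release and may differ from the repo's internal format:

```python
# A SocialIQA-style multiple-choice instance (illustrative; the repo's
# internal format may differ). The model must pick the answer that best
# reflects social commonsense.
example = {
    "context": "Tracy didn't go home that evening and resisted Riley's attacks.",
    "question": "What does Tracy need to do before this?",
    "answerA": "make a new plan",
    "answerB": "go home and see Riley",
    "answerC": "find somewhere to go",
    "label": "3",  # the correct choice (answerC)
}
```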

Installation


```
pip install -r requirement.txt
```

We tested our Python code in the interactive mode of RunAI on the EPFL cluster. Please look through the RunAI documentation if you are a new user.

WANDB dataset/model versioning and loading

This repo is designed to work with wandb for dataset and model versioning, experiment visualization, and related tooling. Assuming you have a wandb account, first set your WANDB_API_KEY:

```
export WANDB_API_KEY=XXXXXXXXXXXXXXXX
```

In the run commands described below, you can then specify: --wandb_entity, --wandb_project (the target project), --wandb_name (the name of the experiment), --wandb_data (for automatic loading of data), and --wandb_model (for automatic loading of models). On RunAI, wandb can be used by adding WANDB_API_KEY to the environment variables. A sketch of what these flags automate is shown below.
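For example, a minimal sketch of these steps with the wandb Python API (entity, project, and artifact names are placeholders, not the repo's actual values):

```python
# Sketch of wandb-based tracking and artifact loading; all names below
# are placeholders.
import wandb

run = wandb.init(
    entity="my-entity",        # --wandb_entity
    project="common-bench",    # --wandb_project
    name="opt-66b-socialiqa",  # --wandb_name
)

# Fetch a versioned dataset artifact (what --wandb_data automates).
data_dir = run.use_artifact("socialiqa:latest", type="dataset").download()

# Fetch a versioned model artifact (what --wandb_model automates).
model_dir = run.use_artifact("t5-large-socialiqa:latest", type="model").download()
```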

Quickstart

To run the code, simply execute the main bash script:


```
bash run.sh
```

To configure a run, edit the variables below in run.sh (a sketch of the command-line interface they map to follows the block):


DATASET="socialiqa"

TASK="socialiqa"

MODEL_TYPE="opt" <-- select from ["t5", "opt", "bloom", "gpt"]

MODEL_NAME_OR_PATH="facebook/opt-66b" <-- volume directory with model checkpoints (.bin) or hugginface download ('facebook/opt-66b').

TRAIN_BATCH_SIZE=4   <-- training batch size

PREDICT_BATCH_SIZE=1 <-- prediction batch size

N_GPU=8 <-- number of GPUs to use
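For orientation, these variables map onto command-line flags of main.py; a sketch of that interface (the actual argument set in main.py may differ):

```python
# Sketch of the command-line interface implied by the variables above;
# defaults mirror the run.sh example.
import argparse

parser = argparse.ArgumentParser(description="common-bench runner (sketch)")
parser.add_argument("--dataset", type=str, default="socialiqa")
parser.add_argument("--task", type=str, default="socialiqa")
parser.add_argument("--model_type", type=str, default="opt",
                    choices=["t5", "opt", "bloom", "gpt"])
parser.add_argument("--model_name_or_path", type=str, default="facebook/opt-66b")
parser.add_argument("--train_batch_size", type=int, default=4)
parser.add_argument("--predict_batch_size", type=int, default=1)
parser.add_argument("--n_gpu", type=int, default=8)
args = parser.parse_args()
print(args)
```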

In-context Learning

To run vanilla in-context learning, first modify the running command in run.sh:


```
accelerate launch main.py \
    --do_inference \
    --dataset ${DATASET} \
    --task ${TASK} \
    --model_type ${MODEL_TYPE} \
    --model_name_or_path ${MODEL_NAME_OR_PATH} \
    --predict_batch_size ${PREDICT_BATCH_SIZE} \
    --wandb_name ${MODEL_NAME_OR_PATH}-${DATASET}-icl-4-rand \
    --n_gpu ${N_GPU} \
    --max_data 0 \
    --do_icl \          <-- add this flag
    --num_examples 2    <-- number of demonstrations used
```
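Under the hood, vanilla in-context learning prepends num_examples demonstrations sampled at random from the training set to each test query. A minimal sketch (helper and field names are hypothetical, not the repo's):

```python
# Minimal sketch of vanilla in-context learning: sample num_examples
# demonstrations at random and prepend them to the test query.
import random

def build_icl_prompt(test_example, train_set, num_examples=2, seed=42):
    rng = random.Random(seed)
    demos = rng.sample(train_set, num_examples)
    blocks = [f"{d['question']}\nAnswer: {d['answer']}" for d in demos]
    blocks.append(f"{test_example['question']}\nAnswer:")
    return "\n\n".join(blocks)
```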

Then, execute the script.

To instead use demonstrations pre-selected by the KNN method (see KNN Example Selection below), modify the running command:


```
accelerate launch main.py \
    --do_inference \
    --dataset ${DATASET} \
    --task ${TASK} \
    --model_type ${MODEL_TYPE} \
    --model_name_or_path ${MODEL_NAME_OR_PATH} \
    --predict_batch_size ${PREDICT_BATCH_SIZE} \
    --wandb_name ${MODEL_NAME_OR_PATH}-${DATASET}-icl-4-rand \
    --n_gpu ${N_GPU} \
    --max_data 0 \
    --do_icl \
    --num_examples 2 \
    --search \          <-- add this flag
    --encoder simcse    <-- name of the sentence encoder for embedding
```

Then, execute the script.
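With --search, the run is expected to read the nearest-neighbor demonstrations produced by dynamic_icl.py (next section) instead of sampling at random; a hypothetical sketch of that lookup:

```python
# Hypothetical sketch of how --search could swap random sampling for the
# neighbors pre-computed by dynamic_icl.py; the file layout follows the
# output path documented in the next section.
import json

def load_knn_demos(data_dir, dataset, encoder_name, num_examples=2):
    path = f"{data_dir}/{dataset}/train_{encoder_name}.json"
    with open(path) as f:
        neighbors = json.load(f)  # e.g. {test_id: [train ids by similarity]}
    return {tid: ids[:num_examples] for tid, ids in neighbors.items()}
```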

KNN Example Selection


```
python dynamic_icl.py \
    --dataset $DATASET_NAME \
    --task $TASK_NAME \
    --encoder_name simcse \  <-- "nli_mean" or "simcse"
    --metric cosine \        <-- "cosine" or "euclidean"
    --num_neighbors 16
```

The output file will be written to $DATA_DIR/$DATASET/train_$ENCODER_NAME.json; a sketch of the selection step is shown below.
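For intuition, the selection step amounts to embedding every training and test instance with the chosen encoder and keeping each test instance's top-k training neighbors under the chosen metric. A self-contained sketch, assuming a public SimCSE checkpoint via sentence-transformers (the repo's simcse/nli_mean encoders and output format may be wrapped differently):

```python
# Sketch of KNN demonstration selection: embed train and test instances,
# then keep each test instance's top-k training neighbors by cosine
# similarity. Checkpoint name and output format are illustrative.
import json
import numpy as np
from sentence_transformers import SentenceTransformer

def select_neighbors(train_texts, test_texts, out_path, k=16):
    encoder = SentenceTransformer("princeton-nlp/sup-simcse-bert-base-uncased")
    train_emb = encoder.encode(train_texts, normalize_embeddings=True)
    test_emb = encoder.encode(test_texts, normalize_embeddings=True)
    # With L2-normalized embeddings, cosine similarity is a dot product.
    sims = test_emb @ train_emb.T
    top_k = np.argsort(-sims, axis=1)[:, :k]
    with open(out_path, "w") as f:
        json.dump({str(i): row.tolist() for i, row in enumerate(top_k)}, f)
```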

License

MIT License