MCQ-Robustness

Repository for the paper:

Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think

We have released our MCQ classifiers on Hugging Face: mainlp/MCQ-Classifier-MMLU-XYZ and mainlp/MCQ-Classifier-MMLU-EFG.

Please refer to their model cards for the details.

How to use

You should construct your input in the following format: model_response + "\nReferences:" + references + "\nAnswer:"

For example:

inputs = " Sure, I'm happy to help! The correct answer is:\n\nB. retraction of the stoma. \nReferences: \nA. high output stomas. \nB. retraction of the stoma. \nC. prolapsed stomas. \nD. herniation around the stoma. \nAnswer:"
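The format above can also be assembled programmatically. The following is a minimal sketch, not part of the released code: the helper name build_classifier_input, the A/B/C/D lettering, and the exact spacing around each option are assumptions inferred from the example string above.

```python
from string import ascii_uppercase


def build_classifier_input(model_response, options):
    """Assemble the classifier input from a model response and answer options.

    Hypothetical helper: the spacing (" \n" before each option and a trailing
    space before "\nAnswer:") is inferred from the README example, not from a
    documented specification.
    """
    # Label the options A, B, C, ... and join them in the example's layout.
    references = "".join(
        f" \n{letter}. {text}" for letter, text in zip(ascii_uppercase, options)
    ) + " "
    return model_response + "\nReferences:" + references + "\nAnswer:"


# Rebuild the example input shown above.
response = " Sure, I'm happy to help! The correct answer is:\n\nB. retraction of the stoma. "
options = [
    "high output stomas.",
    "retraction of the stoma.",
    "prolapsed stomas.",
    "herniation around the stoma.",
]
inputs = build_classifier_input(response, options)
```

If your responses or options use different whitespace, compare the result against the example string before classifying, since the classifier was presumably trained on inputs laid out like the example.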

Then feed this string to the classifier:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the Mistral-7B base model and attach the released classifier adapter.
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(base_model, "mainlp/MCQ-Classifier-MMLU-XYZ")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Wrap the input in the instruction template used by the classifier.
to_classify = f"""<s>[INST] Classify the response.{inputs} [/INST]"""
model_input = tokenizer(to_classify, return_tensors="pt")

# Greedily generate a single new token, which is the classifier's prediction.
output = model.generate(**model_input, max_new_tokens=1, do_sample=False)
# The decoded text contains the prompt followed by the generated token.
print(tokenizer.decode(output[0], skip_special_tokens=True))

Cite

@article{wang2024look,
  title={Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think},
  author={Wang, Xinpeng and Hu, Chengzhi and Ma, Bolei and R{\"o}ttger, Paul and Plank, Barbara},
  journal={arXiv preprint arXiv:2404.08382},
  year={2024}
}
