stanford-crfm / BioMedLM

Evaluate MedQA_USMLE on a saved model

manusikka opened this issue

Hello,

We followed your steps using DeepSpeed and were able to create a fine-tuned model, which the run saved as a checkpoint. We saved this checkpoint and later loaded it with something like this:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("/content/drive/MyDrive/Colab Notebooks/SavedModel")
model = GPT2LMHeadModel.from_pretrained("/content/drive/MyDrive/Colab Notebooks/SavedModel")

Now we wanted to run inference on a sample question with this model, following this guide:
https://huggingface.co/docs/transformers/tasks/multiple_choice#inference

Here is our code:
prompt = ("A 20-year-old woman presents with menorrhagia for the past several years. "
          "She says that her menses “have always been heavy”, and she has experienced easy bruising for as long as she can remember. "
          "Family history is significant for her mother, who had similar problems with bruising easily. "
          "The patient's vital signs include: heart rate 98/min, respiratory rate 14/min, temperature 36.1°C (96.9°F), "
          "and blood pressure 110/87 mm Hg. Physical examination is unremarkable. "
          "Laboratory tests show the following: platelet count 200,000/mm3, PT 12 seconds, "
          "and PTT 43 seconds. Which of the following is the most likely cause of this patient’s symptoms?")
candidate1 = "Factor V Leiden"
candidate2 = "Hemophilia A"
candidate3 = "Lupus anticoagulant"
candidate4 = "Protein C deficiency"
candidate5 = "Von Willebrand disease"

inputs = tokenizer([[prompt, candidate1], [prompt, candidate2],[prompt, candidate3],[prompt, candidate4],[prompt, candidate5]], return_tensors="pt", padding=True)
labels = torch.tensor(0).unsqueeze(0)

outputs = model(**{k: v.unsqueeze(0) for k, v in inputs.items()}, labels=labels)
logits = outputs.logits

However, we get this error:

ValueError                                Traceback (most recent call last)
<ipython-input> in <module>
      3 #model = AutoModelForMultipleChoice.from_pretrained("my_awesome_swag_model")
----> 4 outputs = model(**{k: v.unsqueeze(0) for k, v in inputs.items()}, labels=labels)
      5 logits = outputs.logits

/usr/local/lib/python3.9/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   2844     if size_average is not None or reduce is not None:
   2845         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2846     return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)

ValueError: Expected input batch_size (840) to match target batch_size (0).

Do you have a recommendation on how to run inference on a sample question with this model?

We've got it figured out: instead of GPT2LMHeadModel, we had to use GPT2ForMultipleChoice, the custom class from the finetuning code. GPT2LMHeadModel treats labels as token-level language-modeling targets shaped like input_ids, not as a single answer index, which is what caused the batch-size mismatch above.

import sys
import torch
from transformers import GPT2Tokenizer

# Make the custom multiple-choice model from the BioMedLM finetuning code importable
sys.path.insert(0, '/content/BioMedLM/finetune')
from utils.custom_modeling_gpt2 import GPT2ForMultipleChoice

tokenizer = GPT2Tokenizer.from_pretrained("/content/drive/MyDrive/Colab Notebooks/SavedModel100")
model = GPT2ForMultipleChoice.from_pretrained("/content/drive/MyDrive/Colab Notebooks/SavedModel100")

We are good now.
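
For anyone landing here later, here is a minimal inference sketch for scoring the five candidates with the loaded model. It assumes the custom GPT2ForMultipleChoice follows the standard Hugging Face multiple-choice convention (inputs reshaped to [batch_size, num_choices, seq_len], logits of shape [batch_size, num_choices]); check utils/custom_modeling_gpt2.py for the exact forward signature, since the class is specific to this repo.

import torch

candidates = [candidate1, candidate2, candidate3, candidate4, candidate5]

# GPT-2 tokenizers have no pad token by default; fall back to EOS if the saved tokenizer has none.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# One (question, candidate) pair per answer option, as in the HF multiple-choice tutorial.
inputs = tokenizer([[prompt, c] for c in candidates], return_tensors="pt", padding=True)

model.eval()
with torch.no_grad():
    # Assumed interface: batch dimension of 1, with the choices stacked along dim 1.
    outputs = model(**{k: v.unsqueeze(0) for k, v in inputs.items()})

# Assumed output: logits of shape (1, num_choices); the highest-scoring choice is the prediction.
predicted = outputs.logits.argmax(dim=-1).item()
print(candidates[predicted])

With the multiple-choice class, no labels tensor is needed at inference time; the logits alone give the ranking over the answer options.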