Evaluate MedQA_USMLE on a saved model
manusikka opened this issue · comments
Hello,
We followed your steps using DeepSpeed and were able to produce a fine-tuned model, which the run saved as a checkpoint. We then loaded it again like this:
tokenizer = GPT2Tokenizer.from_pretrained("/content/drive/MyDrive/Colab Notebooks/SavedModel")
model = GPT2LMHeadModel.from_pretrained("/content/drive/MyDrive/Colab Notebooks/SavedModel")
Now we wanted to run inference on a sample question with this model, following the example at this link:
https://huggingface.co/docs/transformers/tasks/multiple_choice#inference
Here is our code:
prompt = ("A 20-year-old woman presents with menorrhagia for the past several years. "
          "She says that her menses “have always been heavy”, and she has experienced easy bruising for as long as she can remember. "
          "Family history is significant for her mother, who had similar problems with bruising easily. "
          "The patient's vital signs include: heart rate 98/min, respiratory rate 14/min, temperature 36.1°C (96.9°F), "
          "and blood pressure 110/87 mm Hg. Physical examination is unremarkable. "
          "Laboratory tests show the following: platelet count 200,000/mm3, PT 12 seconds, "
          "and PTT 43 seconds. Which of the following is the most likely cause of this patient’s symptoms?")
candidate1 = "Factor V Leiden"
candidate2 = "Hemophilia A"
candidate3 = "Lupus anticoagulant"
candidate4 = "Protein C deficiency"
candidate5 = "Von Willebrand disease"
inputs = tokenizer(
    [[prompt, candidate1], [prompt, candidate2], [prompt, candidate3],
     [prompt, candidate4], [prompt, candidate5]],
    return_tensors="pt",
    padding=True,
)
labels = torch.tensor(0).unsqueeze(0)
outputs = model(**{k: v.unsqueeze(0) for k, v in inputs.items()}, labels=labels)
logits = outputs.logits
However, we get this error:
ValueError                                Traceback (most recent call last)
in <module>
      2
      3 # model = AutoModelForMultipleChoice.from_pretrained("my_awesome_swag_model")
----> 4 outputs = model(**{k: v.unsqueeze(0) for k, v in inputs.items()}, labels=labels)
      5 logits = outputs.logits

/usr/local/lib/python3.9/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   2844     if size_average is not None or reduce is not None:
   2845         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2846     return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)

ValueError: Expected input batch_size (840) to match target batch_size (0).
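(For anyone hitting the same error: the mismatch happens because GPT2LMHeadModel treats `labels` as per-token language-modeling targets, not as a multiple-choice class index. The 5 choices' tokens flatten into 840 logit rows, while the single shifted label contributes 0 targets. A minimal sketch reproducing the same shape check with plain `torch`, no model needed — the 840 and 50257 here are just illustrative sizes:)

```python
import torch
import torch.nn.functional as F

# 840 token-level logit rows over a GPT-2-sized vocabulary, as the LM head
# produces after flattening the (choices x tokens) batch.
token_logits = torch.randn(840, 50257)

# An empty target tensor, which is what the single class label reduces to
# after the LM head's shift-by-one -> batch sizes 840 vs 0.
bad_labels = torch.empty(0, dtype=torch.long)

try:
    F.cross_entropy(token_logits, bad_labels)
except (ValueError, RuntimeError) as err:  # ValueError here; some torch versions raise RuntimeError
    print(err)  # Expected input batch_size (840) to match target batch_size (0).
```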
Do you have a recommendation on how to run inference on a sample question with this model?
We've got it figured out: instead of GPT2LMHeadModel, we had to use GPT2ForMultipleChoice from this repo's finetune utilities:
import sys
import torch
from transformers import GPT2Tokenizer

# Make the repo's custom multiple-choice GPT-2 head importable
sys.path.insert(0, '/content/BioMedLM/finetune')
from utils.custom_modeling_gpt2 import GPT2ForMultipleChoice

tokenizer = GPT2Tokenizer.from_pretrained("/content/drive/MyDrive/Colab Notebooks/SavedModel100")
model = GPT2ForMultipleChoice.from_pretrained("/content/drive/MyDrive/Colab Notebooks/SavedModel100")
We are good now.
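(For completeness, once the multiple-choice head returns one logit per candidate, picking the answer is just an argmax over the choices. A sketch, assuming logits of shape `(1, num_choices)` — simulated here with a fixed tensor so the snippet runs without the model; the values are illustrative only:)

```python
import torch

candidates = [
    "Factor V Leiden",
    "Hemophilia A",
    "Lupus anticoagulant",
    "Protein C deficiency",
    "Von Willebrand disease",
]

# In the real run these would come from the model, e.g.:
#   outputs = model(**inputs)
#   logits = outputs.logits   # assumed shape (1, num_choices)
# Simulated here so the snippet is self-contained.
logits = torch.tensor([[0.1, 0.3, 0.2, 0.05, 1.4]])

predicted = candidates[logits.argmax(dim=-1).item()]
print(predicted)  # -> Von Willebrand disease
```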