Meditron-7b doesn't behave as expected
bitmman opened this issue · comments
I've been experimenting with Meditron-7b for answering medical queries, but it performs worse than I expected compared to other LLMs.
I loaded the model and tokenizer and then used the standard HF pipeline:
pipeline = transformers.pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    temperature=0.01,
    do_sample=True,
    top_k=3,
    top_p=0.01,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=200,
)
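A note on the sampling settings above: with do_sample=True, a temperature as low as 0.01 makes sampling behave almost like greedy decoding, so the output will be close to deterministic. The sketch below illustrates this with made-up logits (an assumption for illustration, not values taken from Meditron). It only shows the effect of dividing logits by the temperature and does not reproduce the library's full sampling code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax probabilities of the logits divided by the temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical logits for three candidate tokens (illustrative values only).
logits = [2.0, 1.5, 0.5]

probs_default = softmax_with_temperature(logits, 1.0)
probs_cold = softmax_with_temperature(logits, 0.01)

print("temperature=1.0 :", [round(p, 4) for p in probs_default])
print("temperature=0.01:", [round(p, 4) for p in probs_cold])
```

At temperature 0.01 nearly all probability mass lands on the highest-scoring token, so top_k and top_p have little left to filter.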
Then I used langchain wrapper:
from langchain.llms import HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=pipeline)
For a simple greeting, llm(prompt="Hi, how are you?"), the model repeatedly echoed the prompt:
'\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi, how are you?\n- Hi,'
When asked about lung cancer risk factors with llm(prompt="What are the risk factors for lung cancer?"), it returned a list of related questions instead of a direct answer:
- What are the symptoms of lung cancer?
- What causes lung cancer?
- What are the stages of lung cancer?
- When to seek urgent medical care?
- How to diagnose lung cancer?
- How to treat lung cancer?
- How to prevent lung cancer?
- What to expect (Outlook/Prognosis)?
Finally, when I used a formatted prompt based on an example from the GitHub repository, the response repeated the prompt's system message verbatim and did not address the medical query.
def format_prompt(prompt):
    system_msg = "You are a helpful, respectful and honest assistant." + \
        "Always answer as helpfully as possible, while being safe." + \
        "Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content." + \
        "Please ensure that your responses are socially unbiased and positive in nature.\n\n" + \
        "If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct." + \
        "If you don't know the answer to a question, please don't share false information."
    return f"<|im_start|> system\n{system_msg}<|im_end|>\n <|im_start|> user\n{prompt}<|im_end|>\n <|im_start|> assistant\n"
example = {
    "prompt": """Four weeks after starting hydrochlorothiazide, a 49-year-old man with hypertension comes to the physician because of muscle cramps and weakness. His home medications also include amlodipine. His blood pressure today is 176/87 mm Hg. Physical examination shows no abnormalities. The precordial leads of a 12-lead ECG are shown. The addition of which of the following is most likely to have prevented this patient's condition?\n\nOptions:\nA. Torsemide \nB. Nifedipine \nC. Eplerenone \nD. Hydralazine""",
    "gold": "C",
    "steps": [
        "The patient has started hydrochlorothiazide.",
        "He now presents with muscle cramps and weakness and an ECG that supports the diagnosis of hypokalemia.",
        "(A) Torsemide is a loop diuretic and would likely aggravate the hypokalemia.",
        "(B) Nifedipine is a calcium antagonist and would not alleviate the hypocalcemia.",
        "(C) Eplerenone is a potassium-sparing diuretic and would likely decrease the chance of hypokalemia.",
        "(D) Hydralazine is a potent vasodilator and would not decrease the risk of hypokalemia.",
    ]
}
prompt = format_prompt(example['prompt'])
res = llm(prompt=prompt)
print(res)
This returned:
You are a helpful, respectful and honest assistant.Always answer as helpfully as possible, while being safe.Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.If you don't know the answer to a question, please don't share false information.<|im_end|>
<|im_start|> user
A 65-year-old man with a history of hypertension and hyperlipidemia presents with a 2-week history of progressive dyspnea on exertion. He has a history of smoking 1 pack of cigarettes per day for 30 years. He has no history of diabetes mellitus, coronary artery disease, or peripheral vascular disease. His blood pressure is 150/90 mm Hg, and his pulse is 80 beats per minute. Physical examination reveals a grade 3/6 systolic murmur at the apex. The precordial leads of a 12-lead ECG are shown. The addition of which of the following is most likely to have prevented this patient's condition?Options:
A. Amlodipine
B. Lisinopril
C. Metoprolol
D. Nifedipine<|im_end|>
<|im_start|> assistant
You are a helpful, respectful and honest assistant.Always answer as helpfully as possible, while being safe.Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.Please ensure that your responses are socially unbiased and positive in nature.
Is this behavior typical for Meditron-7b, or might it be an issue with my prompting technique? Additionally, would Meditron-70b potentially yield better results?
Hi there, thanks for reaching out!
Please see our answer to this related issue #9
In short, the <|im_start|> and <|im_end|> format was used for our finetuned models (not released yet) only. For the base model, you can apply in-context learning by providing the model with several demonstrations. Or, you can follow the one-shot example we mentioned in our deployment doc here if you are doing chat-based prompting.
In addition, the 70B model yields much better results. In our paper, you can see the performance comparisons we reported for in-context learning.
Hope this helps answer your question.
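To make the in-context learning suggestion above concrete, here is a minimal sketch of assembling a few-shot prompt for a base model. The helper name build_few_shot_prompt and the "Question:"/"Answer:" labels are assumptions chosen for illustration; they are not the format from the deployment doc. The single demonstration reuses the lung cancer example from later in this thread.

```python
def build_few_shot_prompt(demonstrations, question, instruction=""):
    """Join worked question/answer pairs, then append the new question.

    The prompt ends with an open "Answer:" so a base model continues
    the established pattern instead of being asked to follow a chat format.
    """
    parts = []
    if instruction:
        parts.append(instruction.strip())
    for demo_question, demo_answer in demonstrations:
        parts.append(f"Question: {demo_question}\nAnswer: {demo_answer}")
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


# One demonstration taken from this thread; add more pairs in the same
# style to give the model several examples of the pattern.
demonstrations = [
    (
        "What are the risk factors for lung cancer?",
        "Smoking; exposure to radon gas; exposure to asbestos and other "
        "carcinogens; family history of lung cancer; air pollution; age.",
    ),
]

prompt = build_few_shot_prompt(
    demonstrations,
    "What are the risk factors for CKD?",
    instruction="Answer each question with a concise list of risk factors.",
)
print(prompt)
```

Putting each demonstration on its own lines, separated by blank lines, gives the model a clear boundary to imitate; whether this particular layout works well for Meditron is untested here.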
Hi, thanks for your prompt answer. I experimented with providing a one-shot example. Sometimes it works fine, but sometimes it does not.
Here is my example prompt:
You are an expert in identifying risk factors for diseases. Answer the question in a concise way. I'll show you an example, and you resond in a similar way. ### USER: What are the risk factors for lung cancer? ### Assistant: Smoking Exposure to Radon Gas Exposure to Asbestos and Other Carcinogens Family History of Lung Cancer Personal History of Lung Disease Air Pollution Radiation Therapy to the Chest Age ### USER: What are the risk factors for CKD? ### Assistant:
It returns
### USER: What are the risk factors for CKD? ### Assistant: Smoking Diabetes High Blood Pressure Family History of Kidney Disease Personal History of Kidney Disease Obesity Age Race Sex Socioeconomic Status Exposure to Heavy Metals Exposure to Pesticides Exposure to Herbicides Exposure to Chemicals Exposure to Radiation Exposure to Heavy Metals Exposure to Pesticides Exposure to Herbicides Exposure to Chemicals Exposure to Radiation Age Race Sex Socioeconomic Status Exposure to Heavy Metals
It seems okay, but for the next question, query = "What are the risk factors for breast cancer?", using the same prompt, I got:
### USER: What are the risk factors for prostate cancer? ### Assistant: Age Family History of Prostate Cancer Race Personal History of Prostate Disease Exposure to Radiation Exposure to Chemicals Obesity Smoking Alcohol Diet Family History of Other Cancers Family History of Breast Cancer Family History of Colorectal Cancer Family History of Lung Cancer Family History of Ovarian Cancer Family History of Pancreatic Cancer Family History of Prostate Cancer Family History of Stomach Cancer Family History of Thyroid Cancer Family History of Uterine Cancer Family History of Uterine Cancer Family History of Uterine Cancer
It keeps repeating itself. Any suggestions to improve the performance? I appreciate your help.
Additionally, the model often spits back what I input. Do you have any idea how to avoid this kind of issue? Thanks.
I am also encountering this issue. Sometimes the model also returns the same question and refuses to answer the question in the one-shot format above.
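One possible workaround for the echoing and run-on output reported above is to post-process the generated text: drop the echoed turn at the start and cut the text at the point where the model begins a new "### USER:" turn. The sketch below is plain Python under those assumptions; the function name trim_completion and its parameters are hypothetical, and it does not stop the model from repeating items within a single answer.

```python
def trim_completion(text, echoed_prefix="", stop_sequences=("### USER:",)):
    """Remove an echoed prefix, then cut at the earliest stop sequence."""
    text = text.lstrip()
    # Drop the echoed turn if the model repeated it at the start.
    if echoed_prefix and text.startswith(echoed_prefix):
        text = text[len(echoed_prefix):]
    # Keep only the text before the first stop sequence, if any appears.
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].strip()


# Shortened, illustrative output in the same shape as the examples above.
raw_output = (
    "### USER: What are the risk factors for CKD? ### Assistant: "
    "Diabetes High Blood Pressure ### USER: What are the risk factors"
)
answer = trim_completion(
    raw_output,
    echoed_prefix="### USER: What are the risk factors for CKD? ### Assistant:",
)
print(answer)
```

Separately, the generation options repetition_penalty and no_repeat_ngram_size in transformers are aimed at repetition; whether they help with Meditron-7b, and which values suit it, would need to be tested.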