kbressem / medAlpaca

LLM finetuned for medical question answering


medalpaca 13b outputs "OOO,O,O,O,O,O,O,O,O,O,O,"

2533245542 opened this issue · comments

Hi,

I tried it from huggingface using

from transformers import LlamaTokenizer, AutoModelForCausalLM
tokenizer = LlamaTokenizer.from_pretrained("medalpaca/medalpaca-13b")
model = AutoModelForCausalLM.from_pretrained("medalpaca/medalpaca-13b", device_map='auto')
input = 'SOAP note is a type of clinical note. please expand on that '
input_ids = tokenizer(input, return_tensors="pt").input_ids.to('cuda')
print(tokenizer.decode(model.generate(input_ids, max_length=50)[0]))

then the output is </s>SOAP note is a type of clinical note. please expand on that OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

with the corresponding ids tensor([ 2, 7791, 3301, 4443, 338, 263, 1134, 310, 24899, 936, 4443, 29889, 3113, 7985, 373, 393, 29871, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949], device='cuda:1')
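The repeated id can be checked on its own with the same tokenizer (a quick sanity check; as the aligned output above suggests, it should decode to the letter "O"):

# decode the single repeated id to see which token it corresponds to
print(tokenizer.convert_ids_to_tokens([29949]))
print(tokenizer.decode([29949]))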

Any clues?

kbressem commented

Can you verify the weights are loaded correctly? I would also suggest using the inferer so the prompts are in the right format. During training, the model always sees the data like this:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction: 
...
### Input: SOAP note is a type of clinical note. please expand on that 

### Response: 

I am not sure if this is the optimal way to prompt the model, but that's how Stanford Alpaca was designed.
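As a rough, untested sketch, wrapping your question in this template before tokenizing would look something like the following (the instruction text here is only an example; the inferer builds this prompt for you):

# build an Alpaca-style prompt following the template above (the instruction line is just an example)
prompt = (
    "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nAnswer this question about clinical documentation.\n\n"
    "### Input:\nSOAP note is a type of clinical note. please expand on that\n\n"
    "### Response:\n"
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
print(tokenizer.decode(model.generate(input_ids, max_new_tokens=100)[0]))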

Hi,
Thank you so much for this open-source work!
I'm wondering whether we should 'recover' the weights or load them directly into the model.
The model only repeats my prompt or returns an empty string. Could you provide an example of how to run the model?

kbressem commented

Sure. This is a screenshot of how you can use the medalpaca Inferer class.

[screenshot: example usage of the medalpaca Inferer class]
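In text form, the usage roughly looks like this (an untested sketch; the exact import path, argument names, and template file location may differ, see the repository README):

# assumed import path and constructor arguments; check the repo for the exact signature
from medalpaca.inferer import Inferer

inferer = Inferer(
    model_name="medalpaca/medalpaca-13b",               # assumed argument name
    prompt_template="prompt_templates/medalpaca.json",  # assumed path to the prompt template
)
print(inferer(input="SOAP note is a type of clinical note. please expand on that"))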

Hey @kbressem, what if we want to pass context along with a question? Also, I don't think the 8-bit medalpaca model on Hugging Face is loading properly.

Hi, an update on this: I found the 7b version works fine. Both 7b and 13b are loaded from Hugging Face. I am using two A40 40G GPUs with bitsandbytes==0.37.2.

Regarding the prompt template: I used the same input with the 7b model and it worked, so it is probably a different problem.

I also attached the code I used to test it below; could you try it and see what you get?

from transformers import LlamaTokenizer
from transformers import AutoModelForCausalLM
tokenizer = LlamaTokenizer.from_pretrained("medalpaca/medalpaca-13b")
model = AutoModelForCausalLM.from_pretrained("medalpaca/medalpaca-13b", device_map='auto')
input = 'who is the president of the united states'
input_ids = tokenizer(input, return_tensors="pt").input_ids.to('cuda')
print(tokenizer.decode(model.generate(input_ids, max_length=50)[0]))
kbressem commented

@parthplc if you use the 8-bit model, AutoModelForCausalLM will probably not work, as the decapoda LLaMA weights have an outdated config file. You need to explicitly use LlamaForCausalLM. If this does not solve your problem, please provide more context on what exactly fails.

If you want to pass additional context, you can either adapt the JSON (in case you want to pass the same context multiple times) or pass it to the inferer. Please refer to the docstring of the class; instruction would be your context.

Args:
    input (str):
        The input text to provide to the model.
    instruction (str, optional):
        An optional instruction to guide the model's response.
    output (str, optional):
        Prepended to the model's output, e.g. for 1-shot prompting.
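As a sketch, using the inferer constructed as above (argument names taken from this docstring), passing the question as input and your context as instruction would look roughly like this:

# "instruction" carries the context, "input" carries the question (see the docstring above)
context = "A SOAP note has Subjective, Objective, Assessment and Plan sections."
question = "Which section of a SOAP note contains the physical exam findings?"
print(inferer(input=question, instruction=context))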

Hey @kbressem, I am still running into an issue.

from transformers import LlamaTokenizer
from transformers import LlamaForCausalLM
tokenizer = LlamaTokenizer.from_pretrained("medalpaca/medalpaca-lora-7b-8bit")
model = LlamaForCausalLM.from_pretrained("medalpaca/medalpaca-lora-7b-8bit", device_map='auto')
input = 'who is the president of the united states'
input_ids = tokenizer(input, return_tensors="pt").input_ids.to('cuda')
print(tokenizer.decode(model.generate(input_ids, max_length=50)[0]))

and error is

OSError: Can't load tokenizer for 'medalpaca/medalpaca-lora-7b-8bit'. If you were trying to load it from 
'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make 
sure 'medalpaca/medalpaca-lora-7b-8bit' is the correct path to a directory containing all relevant files for a 
LlamaTokenizer tokenizer.
kbressem commented

The 8-bit model is just the LoRA adapters; you still need to load the full base model first, then the adapter. Please refer to the screenshot above using the inference class I provide, which does all of this for you.
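If you want to do it manually instead, it would look roughly like this (untested sketch; the base model id and the use of peft for the adapter are assumptions):

from transformers import LlamaTokenizer, LlamaForCausalLM
from peft import PeftModel  # assumption: the 8-bit repo contains LoRA adapters to be applied with peft

base_model_id = "decapoda-research/llama-7b-hf"  # assumed base model id
tokenizer = LlamaTokenizer.from_pretrained(base_model_id)
model = LlamaForCausalLM.from_pretrained(base_model_id, load_in_8bit=True, device_map="auto")
model = PeftModel.from_pretrained(model, "medalpaca/medalpaca-lora-7b-8bit")  # attach the adapter

input_ids = tokenizer("who is the president of the united states", return_tensors="pt").input_ids.to("cuda")
print(tokenizer.decode(model.generate(input_ids=input_ids, max_new_tokens=50)[0]))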