MDK8888 / GPTFast

Accelerate your Hugging Face Transformers 7.6-9x. Native to Hugging Face and PyTorch.

Help to run GPTFast on Mixtral-8x7B-Instruct-v0.1

davideuler opened this issue · comments

Could you give example code to run GPTFast on Mixtral-8x7B-Instruct-v0.1?

I load the model with GPTFast with an empty draft_model_name. The following error shows up when loading the model.

import torch
from transformers import AutoTokenizer

# device was defined elsewhere in my script; standard CUDA selection assumed here
device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "./Mixtral-8x7B-v0.1"
draft_model_name = ""

tokenizer = AutoTokenizer.from_pretrained(model_name)
initial_string = "Write me a short story."
input_tokens = tokenizer.encode(initial_string, return_tensors="pt").to(device)

# ....
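
For context, the call that then fails looks roughly like this (reconstructed from the traceback below; the import path comes from the file names in the traceback, while the sampling function and cache_config contents here are placeholders rather than the exact code):

# Sketch reconstructed from the traceback; argmax and cache_config are placeholders.
from GPTFast.Core.GPTFast import gpt_fast

def argmax(probabilities):
    # Greedy sampling: take the highest-probability token.
    return torch.argmax(probabilities, dim=-1)

cache_config = {}  # model-specific KV-cache configuration, contents omitted

gpt_fast_model = gpt_fast(
    model_name,
    sample_function=argmax,
    max_length=60,
    cache_config=cache_config,
    draft_model_name=draft_model_name,
)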

Traceback (most recent call last):
  File "/data/gptfast.py", line 77, in <module>
    gpt_fast_model = gpt_fast(model_name, sample_function=argmax, max_length=60, cache_config=cache_config, draft_model_name=draft_model_name)
  File "/root/anaconda3/envs/llm/lib/python3.10/site-packages/GPTFast/Core/GPTFast.py", line 11, in gpt_fast
    model = add_kv_cache(model, sample_function, max_length, cache_config, dtype=torch.float16)
  File "/root/anaconda3/envs/llm/lib/python3.10/site-packages/GPTFast/Core/KVCache/KVCacheModel.py", line 208, in add_kv_cache
    model = KVCacheModel(transformer, sampling_fn, max_length, cache_config, dtype)
  File "/root/anaconda3/envs/llm/lib/python3.10/site-packages/GPTFast/Core/KVCache/KVCacheModel.py", line 21, in __init__
    self._model = self.add_static_cache_to_model(model, cache_config, max_length, dtype, self.device)
  File "/root/anaconda3/envs/llm/lib/python3.10/site-packages/GPTFast/Core/KVCache/KVCacheModel.py", line 48, in add_static_cache_to_model
    module_forward_str_kv_cache = add_input_pos_to_func_str(module_forward_str, forward_prop_ref, "input_pos=input_pos")
  File "/root/anaconda3/envs/llm/lib/python3.10/site-packages/GPTFast/Helpers/String/add_input_pos_to_func_str.py", line 18, in add_input_pos_to_func_str
    raise ValueError("Submodule forward pass not found.")
ValueError: Submodule forward pass not found.

Hey David, apologies for the late response. Mixtral should support static caching natively, and a new branch should be up this weekend or early next week with the fixes.
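
For reference, a minimal sketch of what native static caching looks like through stock transformers (assuming a recent transformers release with static-cache support for Mixtral and enough GPU memory; the prompt and generation settings are placeholders, independent of GPTFast):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Write me a short story.", return_tensors="pt").to(model.device)

# cache_implementation="static" preallocates the KV cache at a fixed size,
# which is the prerequisite for torch.compile-style decoding speedups.
output = model.generate(**inputs, max_new_tokens=60, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))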

Thanks, looking forward to the new branch.