ITREX needs to be updated for the new Llama 3 prompt format

Question


redhairerINTEL opened this issue 5 months ago

redhairer@intel commented 5 months ago

Llama 3 uses a new prompt format:
https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/
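For reference, the layout described on that page can be sketched in plain Python. This is a minimal sketch assuming the special tokens and blank-line separators listed in Meta's documentation; `format_llama3_prompt` is a hypothetical helper, not part of ITREX or transformers. When the tokenizer ships a chat template, `tokenizer.apply_chat_template` is the safer choice.

```python
from typing import Dict, List

# Special tokens as listed on Meta's Llama 3 prompt-format page.
BEGIN_OF_TEXT = "<|begin_of_text|>"
START_HEADER = "<|start_header_id|>"
END_HEADER = "<|end_header_id|>"
EOT = "<|eot_id|>"


def format_llama3_prompt(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render chat messages into a Llama 3 Instruct prompt string (hypothetical helper)."""
    parts = [BEGIN_OF_TEXT]
    for message in messages:
        # Each turn: role header, blank line, stripped content, end-of-turn token.
        parts.append(f"{START_HEADER}{message['role']}{END_HEADER}\n\n{message['content'].strip()}{EOT}")
    if add_generation_prompt:
        # Open an assistant header so the model writes the reply next.
        parts.append(f"{START_HEADER}assistant{END_HEADER}\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
print(format_llama3_prompt(messages))
```

The trailing assistant header plays the same role as `add_generation_prompt=True` in `apply_chat_template`: it tells the model that the next turn is its own.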

Kevin Ta · Answer 1 · Wed Apr 24 2024 00:53:26 GMT+0800 (China Standard Time)

@kevinintel

Dong, Bo · Answer 2 · Thu Apr 25 2024 17:03:45 GMT+0800 (China Standard Time)

Here is sample code if you want to use the Llama 3 template.
All you need to do is apply the chat template to get the input_ids.

```python
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
streamer = TextStreamer(tokenizer)  # print tokens as they are generated
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)  # 4-bit weight-only quantization

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Render the messages with the tokenizer's Llama 3 chat template and tokenize them;
# add_generation_prompt=True appends the assistant header so the model answers next.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, streamer=streamer)
```

We will also add it to the documentation soon.
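Llama 3 Instruct closes each turn with `<|eot_id|>`, while `<|end_of_text|>` ends the whole sequence. If the generated text runs past the assistant's reply, a small post-processing step can cut it at the first stop token. This is a minimal sketch assuming you decode only the newly generated tokens with special tokens kept; `truncate_at_stop` is a hypothetical helper, not an ITREX or transformers API.

```python
from typing import Iterable

# <|eot_id|> ends a turn; <|end_of_text|> ends the whole sequence.
DEFAULT_STOP_TOKENS = ("<|eot_id|>", "<|end_of_text|>")


def truncate_at_stop(generated: str, stop_tokens: Iterable[str] = DEFAULT_STOP_TOKENS) -> str:
    """Cut decoded model output at the earliest stop token (hypothetical helper)."""
    cut = len(generated)
    for token in stop_tokens:
        idx = generated.find(token)
        if idx != -1:
            cut = min(cut, idx)
    return generated[:cut].strip()


generated = "Arrr, I be a pirate chatbot!<|eot_id|><|start_header_id|>user<|end_header_id|>"
print(truncate_at_stop(generated))  # Arrr, I be a pirate chatbot!
```

Stopping generation at `<|eot_id|>` in the first place is preferable; this helper only cleans up output after the fact.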

Answer 3 · Tue Apr 30 2024 14:13:17 GMT+0800 (China Standard Time)

Running the sample code from Answer 2 gives me:

AssertionError: Fail to convert pytorch model