unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Home Page: https://unsloth.ai


ValueError: Unsloth: Untrained tokens found, but embed_tokens & lm_head not trainable, causing NaNs. when finetuning llama3 on a custom dataset

liwd190019 opened this issue

I want to apply Llama 3 to a multi-turn dialogue task, so I was trying to finetune it on a custom dataset of mine, made by simply extracting all the dialogue contents from SODA and reformatting them into Llama 3's chat template.

I think the resulting dataset is quite similar to the original one. But when I tried to train the model, I got the following error:

ValueError: Unsloth: Untrained tokens found, but embed_tokens & lm_head not trainable, causing NaNs. Restart then add `embed_tokens` & `lm_head` to `FastLanguageModel.get_peft_model(target_modules = [..., "embed_tokens", "lm_head",])`

As the message indicates, the problem is that some tokens are untrained. The error can be fixed by following the hint, but I just can't figure out how I introduced those untrained tokens in the first place. After all, the two datasets (my custom one and the default one in the notebook) look very similar.

Here is a link to the Colab; feel free to comment and give advice!

Also, can anyone suggest a good way to run inference to test the finetuned multi-turn chatbot? Currently, the chatbot inference code in the notebook only supports a single turn (see the sketch after the snippet below):

from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# `model` and `tokenizer` come from FastLanguageModel.from_pretrained(...) earlier in the notebook
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
)

FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"from": "human", "value": "Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(input_ids = inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)
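
For multi-turn testing, here is a minimal sketch that keeps the whole history in `messages` and re-applies the chat template each round, decoding only the newly generated tokens. It assumes `model` and `tokenizer` are set up exactly as above; the loop structure and the `max_new_tokens` value are illustrative, not an official Unsloth recipe.

messages = []
for _ in range(3):  # three rounds, just for illustration
    user_text = input("You: ")
    # ShareGPT-style "from"/"value" keys, matching the mapping passed to get_chat_template above
    messages.append({"from": "human", "value": user_text})
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize = True,
        add_generation_prompt = True,
        return_tensors = "pt",
    ).to("cuda")
    outputs = model.generate(input_ids = inputs, max_new_tokens = 256, use_cache = True)
    # Decode only the tokens generated this round, then feed the reply back in as the next turn
    reply = tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens = True)
    print("Bot:", reply)
    messages.append({"from": "gpt", "value": reply})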

I started to get this error today too, strange...

Try using the Instruct model ("unsloth/llama-3-8b-Instruct-bnb-4bit").

@sumukshashidhar @liwd190019 Apologies! As @dmitrii-palisaderesearch mentioned, please use the Instruct version. The base version will error out, since Unsloth auto-checks whether some token embeddings are all zeros. If you still want to use the base model, either do not use the llama-3 chat template (just use Alpaca), or train lm_head and embed_tokens, as sketched below.
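
For reference, a sketch of that last option: add embed_tokens and lm_head to target_modules when creating the PEFT model, as the error message suggests. The other module names and hyperparameters below are the usual defaults from the Unsloth notebooks, shown only as placeholders:

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",
                      # Make the embedding matrices trainable so tokens the
                      # base model never learned (e.g. the llama-3 chat
                      # special tokens) get real weights instead of zeros:
                      "embed_tokens", "lm_head"],
    lora_alpha = 16,
)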

@danielhanchen Do you know why it started to throw this error, when it was training successfully with the base model a couple of months ago?

I added a check in Unsloth that detects whether your embeddings are untrained; I might have to change the logic, actually.
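
For anyone curious, the idea behind the check can be sketched like this (a rough illustration, not Unsloth's actual implementation): a token whose embedding row is still all zeros was never trained, and using it at train time pushes zeros/NaNs through the loss.

import torch

def find_untrained_token_ids(model):
    # A row of the input embedding matrix that is exactly zero has never
    # been trained; such tokens are the ones the error complains about.
    embed = model.get_input_embeddings().weight          # [vocab_size, hidden_dim]
    zero_rows = embed.abs().max(dim = 1).values == 0     # True where the whole row is zero
    return torch.nonzero(zero_rows).flatten().tolist()

bad_ids = find_untrained_token_ids(model)
print(tokenizer.convert_ids_to_tokens(bad_ids))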