PromtEngineer / localGPT

Chat with your documents on your local device using GPT models. No data leaves your device, and it is 100% private.

Support llama-3

boixu opened this issue

Hi

Please add support for llama-3

Currently the prompt template is not compatible, since llama-3 uses a different prompt style.
Ref: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3
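
For reference, the format described in that model card wraps each turn in header tokens, roughly like this (a sketch of the raw prompt string; the exact whitespace around the headers may differ from the official template):

# Rough shape of a single-turn llama-3 prompt (whitespace approximate).
llama3_prompt = (
    "<|begin_of_text|>"
    "<|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n{question}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)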

As it is, I was unable to use the llama-3 model.

Thanks in advance!

Hi, I tried llama-3 and maybe you can use this setup. The code is a little dirty.

First, add a template for llama3 in prompt_template_utils.py:

def get_prompt_template(system_prompt=system_prompt, promptTemplate_type=None, history=False):
    if promptTemplate_type == "llama3":
        if history:
            prompt = PromptTemplate(
                template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a helpful assistant, you will use the provided context to answer user questions.
Read the given context before answering questions and think step by step. If you can not answer a user question based on
the provided context, inform the user. Do not use any other information for answering user. Provide a detailed answer to the question. <|eot_id|><|start_header_id|>user<|end_header_id|>
                Context: {history} \n {context}
                User: {question}
                Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
                input_variables=["history", "context", "question"],
            )
        else:
            prompt = PromptTemplate(
                template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a helpful assistant, you will use the provided context to answer user questions.
Read the given context before answering questions and think step by step. If you can not answer a user question based on
the provided context, inform the user. Do not use any other information for answering user. Provide a detailed answer to the question. <|eot_id|><|start_header_id|>user<|end_header_id|>
                Context: {context}
                User: {question}
                Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
                input_variables=["context", "question"],
            )
    elif promptTemplate_type == "llama":
        B_INST, E_INST = "[INST]", "[/INST]"
        B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
        SYSTEM_PROMPT = B_SYS + system_prompt + E_SYS

Then add an option for choosing llama3 in run_localGPT.py:

@click.option(
    "--model_type",
    default="llama",
    type=click.Choice(
        ["llama", "mistral", "non_llama", "llama3"],
    ),
    help="model type: llama, llama3, mistral or non_llama",
)

Now you can run it with python run_localGPT.py --model_type llama3

Here is the model I used for testing.

constants.py

# LLAMA 3
MODEL_ID = "unsloth/llama-3-8b-bnb-4bit"
MODEL_BASENAME = None
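
If you want to sanity-check that this checkpoint loads at all before wiring it into localGPT, a minimal standalone test with plain transformers looks roughly like this (a sketch, assuming bitsandbytes and accelerate are installed, since the unsloth repo is a pre-quantized 4-bit checkpoint):

# Hypothetical smoke test, independent of localGPT: load the MODEL_ID and generate a few tokens.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/llama-3-8b-bnb-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))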

@toomy0toons did you upgrade the llama cpp or transformers version to make this work with llama-3?

I installed llama-cpp by following the readme docs.

I have a CUDA GPU, so I installed the cuBLAS version.

# Example: cuBLAS
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir

I did not install or upgrade anything beyond the official instructions. It works out of the box. But since requirements.txt does not specify a version, and I installed yesterday, my versions might be more recent ones. My transformers is transformers==4.38.2 now.
@KerenK-EXRM is there a problem running llama3?

I think that since llama2 is probably not going to be used anymore, I will update the prompt template to make llama3 the default.
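
Until that change lands, anyone who wants the same behavior locally can simply flip the default of the click option shown above in run_localGPT.py (a sketch, assuming the llama3 branch from prompt_template_utils.py is already in place):

# run_localGPT.py -- sketch: make llama3 the default model type
@click.option(
    "--model_type",
    default="llama3",  # was "llama"
    type=click.Choice(
        ["llama", "llama3", "mistral", "non_llama"],
    ),
    help="model type: llama, llama3, mistral or non_llama",
)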

@toomy0toons I tried with another version (QuantFactory/Meta-Llama-3-8B-GGUF) and it didn't work.
Looks like the project has been adjusted to support llama3.
Thank you! Can't wait to try :)

Hi, I have downloaded the llama3 70b model. Can someone provide the steps to convert it into a Hugging Face model and then run it in localGPT? I have done the same for llama 70b and it works, but for llama3 I am not able to convert the full model files to .hf format. Proper steps would be appreciated. Thank you.
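
If the goal is just to run Llama 3 70b in localGPT, the simplest route may be to skip the manual conversion: transformers ships a convert_llama_weights_to_hf.py script for the raw Meta .pth download (its flags vary between releases, so check its --help), but Hugging Face already hosts converted weights. A hypothetical constants.py entry (the meta-llama repos are gated, so you need to request access and log in with huggingface-cli first, and a 70B model in this form needs a lot of GPU memory):

# constants.py -- hypothetical: point at an already-converted Hugging Face checkpoint
# instead of converting the raw Meta files yourself.
MODEL_ID = "meta-llama/Meta-Llama-3-70B-Instruct"
MODEL_BASENAME = None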

Hi @toomy0toons, I'm trying to do the same but am having some issues, as per #793.

@carloposo
@KerenK-EXRM

My understanding is that the instruct model (8b) has an extra set of tokens or a different prompt template.
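
One way to check what the instruct checkpoint actually expects is to ask its tokenizer directly; a small sketch with plain transformers (apply_chat_template needs a reasonably recent transformers version, and the meta-llama repo is gated):

# Hypothetical check: print the instruct tokenizer's special tokens and the chat-formatted
# prompt it expects, to compare against the template in prompt_template_utils.py.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
print(tok.special_tokens_map)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))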

try 7b models?

No 7B models for llama3 (https://adithyask.medium.com/from-7b-to-8b-parameters-understanding-weight-matrix-changes-in-llama-transformer-models-31ea7ed5fd88)

Do you mean none of the embedding models in constants.py are ok to run any of the llama-3 8b models?