PromtEngineer / localGPT

Chat with your documents on your local device using GPT models. No data leaves your device, and it is 100% private.

Support llama-3

boixu opened this issue

Hi

Please add support for llama-3

Currently the prompt template is not compatible, since llama-3 uses a different prompt style.
Ref: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3
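
For reference, the format described in that model card wraps each turn in header tokens, roughly like this (a sketch of the raw prompt string; the exact whitespace around the headers may differ from the official template):

# Rough shape of a single-turn llama-3 prompt (whitespace approximate).
llama3_prompt = (
    "<|begin_of_text|>"
    "<|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n{question}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)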

As it is, I was unable to use the llama-3 model.

Thanks in advance!

Hi, I tried llama-3 and maybe you can use this setup. The code is a little dirty.

First, add a template for llama3 in prompt_template_utils.py:

def get_prompt_template(system_prompt=system_prompt, promptTemplate_type=None, history=False):
    if promptTemplate_type == "llama3":
        if history:
            prompt = PromptTemplate(
                template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a helpful assistant, you will use the provided context to answer user questions.
Read the given context before answering questions and think step by step. If you can not answer a user question based on
the provided context, inform the user. Do not use any other information for answering user. Provide a detailed answer to the question. <|eot_id|><|start_header_id|>user<|end_header_id|>
                Context: {history} \n {context}
                User: {question}
                Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
                input_variables=["history", "context", "question"],
            )
        else:
            prompt = PromptTemplate(
                template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a helpful assistant, you will use the provided context to answer user questions.
Read the given context before answering questions and think step by step. If you can not answer a user question based on
the provided context, inform the user. Do not use any other information for answering user. Provide a detailed answer to the question. <|eot_id|><|start_header_id|>user<|end_header_id|>
                Context: {context}
                User: {question}
                Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
                input_variables=["context", "question"],
            )
    elif promptTemplate_type == "llama":
        B_INST, E_INST = "[INST]", "[/INST]"
        B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
        SYSTEM_PROMPT = B_SYS + system_prompt + E_SYS

Then add an option for choosing llama3 in run_localGPT.py:

@click.option(
    "--model_type",
    default="llama",
    type=click.Choice(
        ["llama", "mistral", "non_llama", "llama3"],
    ),
    help="model type: llama, llama3, mistral or non_llama",
)

Now you can run it with python run_localGPT.py --model_type llama3

Here is the model I used for testing.

constants.py

# LLAMA 3
MODEL_ID = "unsloth/llama-3-8b-bnb-4bit"
MODEL_BASENAME = None
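
If you want to sanity-check that this checkpoint loads at all before wiring it into localGPT, a minimal standalone test with plain transformers looks roughly like this (a sketch, assuming bitsandbytes and accelerate are installed, since the unsloth repo is a pre-quantized 4-bit checkpoint):

# Hypothetical smoke test, independent of localGPT: load the MODEL_ID and generate a few tokens.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/llama-3-8b-bnb-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))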

@toomy0toons did you upgrade the llama cpp or transformers version to make this work with llama-3?

I installed llama-cpp by following the readme docs.

I have a CUDA GPU, so I installed the cuBLAS version.

# Example: cuBLAS
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir

I did not install or upgrade anything beyond the official instructions. It works out of the box. But since requirements.txt does not specify a version, and I installed yesterday, my versions might be more recent ones. My transformers is transformers==4.38.2 now.
@KerenK-EXRM is there a problem running llama3?

I think that since llama2 is probably not going to be used anymore, I will update the prompt template to make llama3 the default.
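
Until that change lands, anyone who wants the same behavior locally can simply flip the default of the click option shown above in run_localGPT.py (a sketch, assuming the llama3 branch from prompt_template_utils.py is already in place):

# run_localGPT.py -- sketch: make llama3 the default model type
@click.option(
    "--model_type",
    default="llama3",  # was "llama"
    type=click.Choice(
        ["llama", "llama3", "mistral", "non_llama"],
    ),
    help="model type: llama, llama3, mistral or non_llama",
)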

@toomy0toons I tried with another version (QuantFactory/Meta-Llama-3-8B-GGUF) and it didn't work.
Looks like the project has been adjusted to support llama3.
Thank you! Can't wait to try :)

Hi, I have downloaded the llama3 70b model. Can someone provide the steps to convert it into a Hugging Face model and then run it in localGPT? I have done the same for llama 70b and it works, but for llama3 I am not able to convert the full model files to .hf format. Proper steps would be appreciated. Thank you.
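
If the goal is just to run Llama 3 70b in localGPT, the simplest route may be to skip the manual conversion: transformers ships a convert_llama_weights_to_hf.py script for the raw Meta .pth download (its flags vary between releases, so check its --help), but Hugging Face already hosts converted weights. A hypothetical constants.py entry (the meta-llama repos are gated, so you need to request access and log in with huggingface-cli first, and a 70B model in this form needs a lot of GPU memory):

# constants.py -- hypothetical: point at an already-converted Hugging Face checkpoint
# instead of converting the raw Meta files yourself.
MODEL_ID = "meta-llama/Meta-Llama-3-70B-Instruct"
MODEL_BASENAME = None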

Hi @toomy0toons, I'm trying to do the same but am having some issues, as per #793.

@carloposo
@KerenK-EXRM

My understanding is that the instruct model (8b) has an extra set of tokens or a different prompt template.
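
One way to check what the instruct checkpoint actually expects is to ask its tokenizer directly; a small sketch with plain transformers (apply_chat_template needs a reasonably recent transformers version, and the meta-llama repo is gated):

# Hypothetical check: print the instruct tokenizer's special tokens and the chat-formatted
# prompt it expects, to compare against the template in prompt_template_utils.py.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
print(tok.special_tokens_map)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))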

try 7b models?

No 7B models for llama3 (https://adithyask.medium.com/from-7b-to-8b-parameters-understanding-weight-matrix-changes-in-llama-transformer-models-31ea7ed5fd88)

Do you mean none of the embedding models in constants.py are ok to run any of the llama-3 8b models?