Kirouane-Ayoub / qlora_tunner

Project Name: LLMs QLORA Fine-Tuner

Description:

QLORA Fine-Tuner is a Python library designed for efficient fine-tuning of Large Language Models (LLMs) using quantized low-rank adapters. It reduces the number of trainable parameters and GPU memory requirements, making fine-tuning accessible for a wide range of applications.

QLoRA:

QLoRA, or Quantized Low-Rank Adapters, is a new approach to fine-tuning large language models (LLMs) that uses less memory while maintaining speed. It was developed by researchers at the University of Washington and released in May 2023.

QLoRA works by first quantizing the LLM to 4-bit precision, which significantly reduces the model's memory footprint. The quantized, frozen LLM is then fine-tuned with Low-Rank Adapters (LoRA): small trainable adapter matrices injected into the model's layers. This preserves most of the original LLM's accuracy while training only a small fraction of its parameters.
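
Conceptually, these two steps can be sketched with the Hugging Face transformers, bitsandbytes, and peft libraries. The snippet below is a minimal illustration of the idea; the model id and LoRA settings are assumptions for the example, not part of this library:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Step 1: load the base LLM quantized to 4-bit NF4 (model id is illustrative)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# Step 2: inject low-rank adapters (LoRA) into the frozen, quantized model
lora_config = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable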

Key Features:

  • Quantized Low-Rank Adapters: Injected into each layer of the LLM for efficient fine-tuning.
  • Reduced Memory Footprint: Use of quantization techniques to save GPU memory.
  • Easy Integration: Seamless integration with popular LLMs and Hugging Face Transformers.
  • Versatile Applications: Suitable for various natural language processing tasks, including text generation and more.
  • Open-Source: Available under an open-source license, allowing for community contributions and collaboration.

The Default Hyperparameters (see the sketch after this list for how they map onto a standard training configuration)

  • lora_r = 64
  • lora_alpha = 16
  • lora_dropout = 0.1
  • use_4bit = True
  • bnb_4bit_compute_dtype = "float16"
  • bnb_4bit_quant_type = "nf4"
  • use_nested_quant = False
  • fp16 = False
  • bf16 = False
  • per_device_train_batch_size = 4
  • per_device_eval_batch_size = 4
  • gradient_accumulation_steps = 1
  • gradient_checkpointing = True
  • max_grad_norm = 0.3
  • learning_rate = 2e-4
  • weight_decay = 0.001
  • optim = "paged_adamw_32bit"
  • lr_scheduler_type = "constant"
  • max_steps = -1
  • warmup_ratio = 0.03
  • group_by_length = True
  • save_steps = 25
  • logging_steps = 5
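
For reference, here is a rough sketch of how the training-related defaults above would look as a standard transformers TrainingArguments object (the LoRA and 4-bit settings correspond to the peft/bitsandbytes sketch shown earlier). This is an illustration, not the library's internal code:

from transformers import TrainingArguments

# Hedged mapping of the defaults above onto transformers' TrainingArguments.
training_args = TrainingArguments(
    output_dir="results",              # illustrative output directory
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="paged_adamw_32bit",
    lr_scheduler_type="constant",
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    save_steps=25,
    logging_steps=5,
    fp16=False,
    bf16=False,
)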

Usage:

Installation:

pip install -q qlora-tunner==1.0

LLaMA Dataset Reformer:

Supported Dataset Format

{
    "input": "Model input",
    "output": "Model output"
}
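
For example, a train.jsonl file in this format could be produced as follows. The sample records and the "prompt"/"response" field names are purely illustrative and match the mapping passed to data_reformer below:

import json

# Hypothetical example records; use whatever field names your data actually has
# and pass those names to data_reformer (here: "prompt" and "response").
records = [
    {"prompt": "What is QLoRA?", "response": "A 4-bit quantized LoRA fine-tuning method."},
    {"prompt": "Why use it?", "response": "It sharply reduces the GPU memory needed for fine-tuning."},
]

with open("train.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
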
from qlora_tunner.utils import data_reformer

train_dataset_path = "train.jsonl"  # JSONL dataset format
valid_dataset_path = "valid.jsonl"

system_message = "Instruction or system_message"

# Map the JSONL field names to the reformer's input/output columns.
train_dataset_mapped = data_reformer(inp_ut="prompt",
                                     output="response",
                                     dataset_path=train_dataset_path,
                                     system_message=system_message)

valid_dataset_mapped = data_reformer(inp_ut="prompt",
                                     output="response",
                                     dataset_path=valid_dataset_path,
                                     system_message=system_message)

Fine-Tuning

from qlora_tunner.qlora_fine_tuner import LanguageModelFineTuner

model_name = "Your Model Id"  # Example: NousResearch/llama-2-7b-chat-hf
fine_tuner = LanguageModelFineTuner(model_name)
fine_tuner.train(train_dataset_mapped=train_dataset_mapped,
                 valid_dataset_mapped=valid_dataset_mapped,
                 output_dir="LLAMA2_chat",
                 num_train_epochs=1)
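
If the run writes standard PEFT adapter weights into output_dir (an assumption about the checkpoint format, since the wrapper handles saving), the adapter can later be merged back into the base model with the usual peft API:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Assumes "LLAMA2_chat" contains PEFT adapter weights saved during training.
base_model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
merged_model = PeftModel.from_pretrained(base_model, "LLAMA2_chat").merge_and_unload()

# Persist the merged, standalone model alongside its tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_name)
merged_model.save_pretrained("LLAMA2_chat_merged")
tokenizer.save_pretrained("LLAMA2_chat_merged")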

Create Inference pipeline

from qlora_tunner.qlora_inference import LanguageModelInference

model_name = "Your Model Id"  # Example: NousResearch/llama-2-7b-chat-hf
LoraConfig_folder = "LoraConfig_file"  # folder containing the saved LoRA adapter/config

inference = LanguageModelInference(model_name)
pipeline = inference.inf_pipeline(max_length=2048, LoraConfig_file=LoraConfig_folder)
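
Under the hood this presumably wraps a standard transformers text-generation pipeline over the base model plus the LoRA adapter. A rough, hedged equivalent built directly with transformers and peft would look like this (paths and settings are assumptions):

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline as hf_pipeline
from peft import PeftModel

# Load the base model, attach the LoRA adapter, and build a generation pipeline.
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
model = PeftModel.from_pretrained(base_model, LoraConfig_folder)

pipeline = hf_pipeline("text-generation", model=model, tokenizer=tokenizer, max_length=2048)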

Run the Inference

system_message = "system_message or Instruction"
input_text = "Your input Text"
prompt = f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n {input_text}. [/INST]"

result = pipeline(prompt)
print(result[0]['generated_text'].replace(prompt, ''))
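
Since the LLaMA-2 chat template is easy to get subtly wrong, a small hypothetical helper (not part of the library) can keep the formatting in one place:

def build_llama2_prompt(system_message: str, input_text: str) -> str:
    """Wrap a system message and user input in the LLaMA-2 chat template."""
    return f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n{input_text} [/INST]"

result = pipeline(build_llama2_prompt(system_message, input_text))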

With LangChain (Chatbot Example)

pip install -q langchain

from langchain.llms import HuggingFacePipeline
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationChain

# Wrap the Hugging Face pipeline created above as a LangChain LLM.
local_llm = HuggingFacePipeline(pipeline=pipeline)
memory = ConversationBufferWindowMemory(k=3)  # keep the last 3 exchanges

chat = ConversationChain(
    llm=local_llm,
    verbose=False,
    memory=memory
)
chat.prompt.template = \
"""
### HUMAN:
Write your system_message here
Previous Conversation:
{history}

Current conversation:
### HUMAN: {input}
### RESPONSE:"""

while True:
    input_text = input(">>")
    print(chat.predict(input=str(input_text)))

Author: Kirouane Ayoub
