Fine-tune multiple large language models in low-memory environments.
This repository provides wrappers around LLMs for
- 8-bit quantization of pre-trained model weights, and
- fine-tuning with LoRA adapters.
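The two techniques fit together like this: the pre-trained weight is stored in 8 bits and kept frozen, and a small trainable low-rank update is added on top. Below is a minimal pure-Python sketch. It assumes simple per-tensor absmax quantization; bitsandbytes' actual 8-bit scheme is more elaborate, with vector-wise scales and outlier handling. It also assumes the `alpha / r` scaling from the original LoRA paper, and whether finetuna scales its adapters exactly this way is an assumption. The function names are hypothetical, and adapter dropout and bias are omitted.

```python
# Hypothetical illustration, not finetuna's implementation:
# per-tensor absmax int8 quantization of a frozen weight W, plus a LoRA update
#   h = W x + (alpha / r) * B (A x)

def quantize_absmax(w):
    """Map floats to ints in [-127, 127] using one scale for the whole matrix."""
    max_abs = max(abs(x) for row in w for x in row) or 1.0  # avoid dividing by zero
    scale = 127.0 / max_abs
    q = [[round(x * scale) for x in row] for row in w]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [[x / scale for x in row] for row in q]


def matvec(m, v):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(q_weight, scale, lora_a, lora_b, x, alpha, r):
    """Frozen quantized base path plus trainable low-rank adapter path."""
    base = matvec(dequantize(q_weight, scale), x)  # W x, W is never updated
    update = matvec(lora_b, matvec(lora_a, x))     # B (A x); A is r x d_in, B is d_out x r
    return [h + (alpha / r) * u for h, u in zip(base, update)]


if __name__ == "__main__":
    w = [[0.5, -1.0], [0.25, 2.0]]
    q, s = quantize_absmax(w)
    a = [[1.0, 0.0]]    # rank r = 1, input dimension 2
    b = [[0.0], [0.0]]  # B starts at zero, so the adapter is initially a no-op
    print(q)
    print(lora_forward(q, s, a, b, [1.0, 1.0], alpha=1, r=1))
```

Because `B` starts at zero, the adapted layer initially reproduces the quantized base layer. Training then only touches `A` and `B`, which together hold `r * (d_in + d_out)` values instead of the `d_in * d_out` values of the full weight.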
(*Logo generated by Stable Diffusion)
Prerequisite: PyTorch with CUDA support (CUDA 11.3 is recommended, but other versions up to 11.7 also work). If you use conda, install the CUDA toolkit with
conda install -c conda-forge cudatoolkit=11.7
Install with
make install
If you run into issues, see the troubleshooting guide.
import torch as t
import transformers
import bitsandbytes as bnb
import finetuna as ft

model_name = 'facebook/opt-125m'
base_model = transformers.AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

# If memory constraints require it, you can manually pre-quantize the model:
ft.prepare_base_model(base_model)

# Create new fine-tuned models from either the base or the quantized model
model_1 = ft.new_finetuned(base_model)

# You can also specify finer-grained parameters if required
model_2 = ft.new_finetuned(
    base_model,
    adapt_layers={'embed_tokens', 'embed_positions', 'q_proj', 'v_proj'},
    embedding_config=ft.EmbeddingAdapterConfig(r=4, alpha=1),
    linear_config={
        'q_proj': ft.LinearAdapterConfig(r=8, alpha=1, dropout=0.0, bias=False),
        'v_proj': ft.LinearAdapterConfig(r=4, alpha=1, dropout=0.1, bias=True),
    },
)

# Be sure to use the bitsandbytes optimisers
opt = bnb.optim.AdamW(model_1.parameters())

# Fine-tune as usual (one causal-LM step; replace the text with your own data loop)
batch = tokenizer("An example training sentence.", return_tensors="pt").to("cuda")
opt.zero_grad()
with t.cuda.amp.autocast():
    # autocast should wrap only the forward pass and the loss computation.
    # Assumes the fine-tuned model keeps the Hugging Face forward signature,
    # where passing labels makes it return the language-modelling loss.
    loss = model_1(**batch, labels=batch["input_ids"]).loss
loss.backward()
opt.step()

# NOTE: saving is not yet implemented; the intended interface is:
# Either save the complete state, as for a normal PyTorch model
t.save(model_1.state_dict(), "/save/path.pt")
# Or save only the changed state, to be reloaded on top of the base model
t.save(ft.state_dict(model_1), "/save/path_finetuned.pt")

# Load into a model with the same adapter configuration as model_1
model_3 = ft.new_finetuned(base_model)
model_3.load_state_dict(t.load("/save/path.pt"))
# Load only the adapter state with strict=False; base weights are left as they are
model_3.load_state_dict(t.load("/save/path_finetuned.pt"), strict=False)
Please see the usage guide in the documentation for more detailed instructions.
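Saving only the changed state keeps checkpoints small, because the frozen base weights are not written out. Reloading such a partial checkpoint then needs `strict=False`, since the base-weight keys are absent from it. The pure-Python sketch below mimics this with plain dictionaries standing in for state dicts, and reports missing and unexpected keys in the way PyTorch's `load_state_dict` does. It assumes adapter parameters can be recognised by a `lora` substring in their key. That naming convention is hypothetical and not necessarily what `ft.state_dict` does. `adapter_state` and `load_non_strict` are illustrative helpers, not part of finetuna or PyTorch.

```python
# Hypothetical sketch: plain dicts stand in for PyTorch state dicts, and the
# "lora" key marker is an assumed naming convention for adapter parameters.

def adapter_state(full_state, marker="lora"):
    """Keep only the adapter entries, i.e. what a partial checkpoint would store."""
    return {k: v for k, v in full_state.items() if marker in k}


def load_non_strict(model_state, saved):
    """Mimic load_state_dict(..., strict=False): copy matching keys, report the rest."""
    missing = [k for k in model_state if k not in saved]
    unexpected = [k for k in saved if k not in model_state]
    for k, v in saved.items():
        if k in model_state:
            model_state[k] = v
    return missing, unexpected


if __name__ == "__main__":
    prefix = "model.decoder.layers.0.self_attn.q_proj."
    trained = {
        prefix + "weight": "frozen base weight",
        prefix + "lora_A": "trained A",
        prefix + "lora_B": "trained B",
    }
    # A freshly created fine-tuned model shares the base weight but has untrained adapters
    fresh = {
        prefix + "weight": "frozen base weight",
        prefix + "lora_A": "initial A",
        prefix + "lora_B": "initial B",
    }

    partial = adapter_state(trained)
    missing, unexpected = load_non_strict(fresh, partial)
    print(sorted(partial))  # only the two adapter keys
    print(missing)          # the base weight key, which keeps its existing value
    print(unexpected)       # nothing unexpected
```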