Scripts to fine-tune GPTBigCode architecture models

This repo provides everything needed to fine-tune GPTBigCode models (e.g. StarCoder) on code generation tasks. It includes:

Constant Length Dataset Loader
Scaling laws for computing the correct number of steps, given the number of GPUs, the effective batch size, and the number of epochs
LoRA with 8-bit and 4-bit quantization, plus QLoRA (double quantization) support
DeepSpeed support for fine-tuning large models
Edu-score filtering to remove non-educational data
Multi-language loss evaluation (using MultiPL-E evaluation datasets)
Custom tokenizer injection
Automatic mixed precision quantization
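
As a rough illustration of the step computation mentioned in the list above, here is a minimal sketch. It is not the code used in this repo: the function name `compute_training_steps` and its parameters (including `per_device_batch_size`) are hypothetical, and it assumes the dataset size is known as a count of fixed-length sequences.

```python
import math


def compute_training_steps(
    num_sequences: int,
    num_epochs: int,
    effective_batch_size: int,
    num_gpus: int,
    per_device_batch_size: int,
) -> tuple[int, int]:
    """Return (max_steps, gradient_accumulation_steps).

    Hypothetical helper: assumes one optimizer step consumes exactly
    `effective_batch_size` sequences across all GPUs.
    """
    # Sequences processed by all GPUs in one forward/backward pass.
    sequences_per_pass = per_device_batch_size * num_gpus
    if effective_batch_size % sequences_per_pass != 0:
        raise ValueError(
            "effective_batch_size must be a multiple of "
            "per_device_batch_size * num_gpus"
        )

    # Passes to accumulate before each optimizer step.
    grad_accum_steps = effective_batch_size // sequences_per_pass

    # Optimizer steps needed to see the dataset once, rounded up so the
    # final partial batch is still counted.
    steps_per_epoch = math.ceil(num_sequences / effective_batch_size)

    return steps_per_epoch * num_epochs, grad_accum_steps


if __name__ == "__main__":
    max_steps, grad_accum = compute_training_steps(
        num_sequences=10_000,
        num_epochs=3,
        effective_batch_size=64,
        num_gpus=4,
        per_device_batch_size=2,
    )
    print(max_steps, grad_accum)  # prints: 471 8
```

With a constant-length loader the number of sequences is often not known up front, so in practice it may have to be estimated from the total token count divided by the sequence length.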

MIT License
