This repo contains the Refact WebUI for fine-tuning and self-hosting of code models, which you can later use inside Refact plugins for code completion and chat.
- Fine-tuning of open-source code models
- Self-hosting of open-source code models
- Download and upload LoRAs
- Use models for code completion and chat inside Refact plugins
- Model sharding
- Host several small models on one GPU
- Use OpenAI keys to connect GPT-models for chat
The easiest way to run the self-hosted server is a pre-built Docker image.
Install Docker with NVIDIA GPU support. On Windows you need to install WSL 2 first (the Docker documentation has a guide for this).
Run the docker container with the following command:
```bash
docker run -d --rm --gpus all -p 8008:8008 -v refact-perm-storage:/perm_storage -v refact-database:/var/lib/cassandra smallcloud/refact_self_hosting:latest
```
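If the flags are unfamiliar, here is the same command broken out and annotated (these are standard Docker options):

```bash
# -d: run detached, in the background
# --rm: remove the container when it stops (named volumes survive)
# --gpus all: make all NVIDIA GPUs visible inside the container
# -p 8008:8008: publish the server port on the host
# -v: mount the two named volumes described below
docker run -d --rm --gpus all -p 8008:8008 \
  -v refact-perm-storage:/perm_storage \
  -v refact-database:/var/lib/cassandra \
  smallcloud/refact_self_hosting:latest
```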
`perm-storage` is a volume that is mounted inside the container. All the configuration files, downloaded weights, and logs are stored here.

`refact-database` is a volume for the database where the server stores statistics from your users.
To upgrade the docker image, delete the container using `docker kill XXX` (the volume `perm-storage` will retain your data), run `docker pull smallcloud/refact_self_hosting`, and run it again.
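A minimal upgrade sequence might look like this (substitute your actual container ID for XXX; the last command is the same `docker run` as above):

```bash
docker ps                                    # find the container ID
docker kill XXX                              # stop the old container (volumes are kept)
docker pull smallcloud/refact_self_hosting   # fetch the latest image
docker run -d --rm --gpus all -p 8008:8008 \
  -v refact-perm-storage:/perm_storage \
  -v refact-database:/var/lib/cassandra \
  smallcloud/refact_self_hosting:latest      # start the new container
```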
Now you can visit http://127.0.0.1:8008 to see the server Web GUI.
Docker commands super short refresher
Add yourself to the docker group to run docker without sudo (works on Linux):

```bash
sudo usermod -aG docker {your user}
```
List all containers:

```bash
docker ps -a
```

Start and stop existing containers (stop doesn't remove them):

```bash
docker start XXX
docker stop XXX
```

Show messages from a container:

```bash
docker logs -f XXX
```

Remove a container and all its data (except data inside a volume):

```bash
docker rm XXX
```

Check out or delete a docker volume:

```bash
docker volume inspect VVV
docker volume rm VVV
```
See CONTRIBUTING.md for installation without a docker container.
Download Refact for VS Code or JetBrains.
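Before configuring a plugin, you can quickly check from the command line that the server answers on the inference URL (the root URL serves the web GUI, so a 200 status means the server is up):

```bash
curl -sS -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8008
```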
Go to plugin settings and set up a custom inference URL: http://127.0.0.1:8008
JetBrains: Settings > Tools > Refact.ai > Advanced > Inference URL
VSCode: Extensions > Refact.ai Assistant > Settings > Infurl

| Model | Completion | Chat | Fine-tuning |
|---|---|---|---|
| Refact/1.6B | + | | + |
| starcoder/1b/base | + | | + |
| starcoder/3b/base | + | | + |
| starcoder/7b/base | + | | + |
| starcoder/15b/base | + | | |
| starcoder/15b/plus | + | | |
| wizardcoder/15b | + | | |
| codellama/7b | + | | + |
| starchat/15b/beta | | + | |
| wizardlm/7b | | + | |
| wizardlm/13b | | + | |
| wizardlm/30b | | + | |
| llama2/7b | | + | |
| llama2/13b | | + | |
| deepseek-coder/1.3b/base | + | | + |
| deepseek-coder/5.7b/mqa-base | + | | + |
| magicoder/6.7b | | + | |
| mistral/7b/instruct-v0.1 | | + | |
| mixtral/8x7b/instruct-v0.1 | | + | |
| deepseek-coder/6.7b/instruct | | + | |
| deepseek-coder/33b/instruct | | + | |
Refact is free to use for individuals and small teams under the BSD-3-Clause license. If you wish to use Refact for Enterprise, please contact us.
You can also install the refact repo without docker:

```bash
pip install .
```

If you have a GPU with CUDA capability >= 8.0, you can also install it with flash-attention v2 support:

```bash
FLASH_ATTENTION_FORCE_BUILD=TRUE MAX_JOBS=4 INSTALL_OPTIONAL=TRUE pip install .
```
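If you're not sure about your GPU's compute capability, you can check it with PyTorch (assuming torch is already installed in your environment):

```bash
python -c "import torch; print(torch.cuda.get_device_capability())"
```

This prints a tuple like (8, 0); a first number of 8 or above means the GPU meets the flash-attention v2 requirement.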
Q: Can I run a model on CPU?
A: It doesn't run on CPU yet, but it's certainly possible to implement this.
- Contributing: see CONTRIBUTING.md
- GitHub issues for bugs and errors
- Community forum for community support and discussions
- Discord for chatting with community members
- Twitter for product news and updates