Refact.ai Inference Server
This is a self-hosted server for the refact.ai coding assistant.
With Refact you can run high-quality AI code completions on-premise, use a number of code transformation functions, and ask questions in the chat.
This server lets you run AI coding models on your own hardware, so your code never leaves your control.
At the moment, you can choose between the following models:
Model | GPU (VRAM) | CPU (RAM) | Completion | AI Toolbox | Chat | Languages supported |
---|---|---|---|---|---|---|
CONTRASTcode/medium/multi | 3 GB | 3 GB | + | | | 20+ programming languages |
CONTRASTcode/3b/multi | 8 GB | 12 GB | + | | | 20+ programming languages |
starcoder/15b/base4bit | 16 GB | - | + | + | + | 80+ programming languages |
starcoder/15b/base8bit | 32 GB | - | + | + | + | 80+ programming languages |
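As a rough aid for reading the table, the sketch below picks the largest listed model that fits a given amount of memory. The memory figures are copied from the table; the function name `largest_model_that_fits` and the "largest that fits" rule are assumptions made for illustration and are not part of Refact.

```python
from typing import Optional

# (model name, GPU VRAM in GB, CPU RAM in GB or None when CPU is not listed),
# copied from the table above and ordered from smallest to largest.
MODELS = [
    ("CONTRASTcode/medium/multi", 3, 3),
    ("CONTRASTcode/3b/multi", 8, 12),
    ("starcoder/15b/base4bit", 16, None),
    ("starcoder/15b/base8bit", 32, None),
]


def largest_model_that_fits(memory_gb: float, on_gpu: bool = True) -> Optional[str]:
    """Return the largest listed model whose requirement fits, or None.

    Hypothetical helper: it only compares memory sizes and knows nothing
    about speed or output quality.
    """
    best = None
    for name, vram_gb, ram_gb in MODELS:
        required = vram_gb if on_gpu else ram_gb
        if required is not None and required <= memory_gb:
            best = name  # later entries are larger, so keep the last match
    return best


if __name__ == "__main__":
    print(largest_model_that_fits(8))                  # GPU with 8 GB of VRAM
    print(largest_model_that_fits(64, on_gpu=False))   # CPU-only machine
```

Fitting in memory is not the only consideration: a StarCoder model that fits may still be slow on a smaller GPU, so treat the result as a starting point.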
Refact is currently available as a plugin for JetBrains products and VS Code IDE.
Known limitations
- For best results on smaller GPUs, we recommend the CONTRASTcode models, as the StarCoder models can be quite slow
- StarCoder AI Toolbox and Chat in JetBrains will be available later (May 12-14)
Demo
Getting started
Install the plugin for your IDE: JetBrains or VSCode.
Running Server in Docker
The recommended way to run the server is to use the pre-built Docker image.
Install Docker with NVidia GPU support. On Windows, you need to install WSL 2 first (one guide to do this).
Docker tips & tricks
Add yourself to the docker group to run docker without sudo (works on Linux):
sudo usermod -aG docker {your user}
List all containers:
docker ps -a
Create a new container:
docker run
Start and stop existing containers (stop doesn't remove them):
docker start
docker stop
Remove a container and all its data:
docker rm
Show messages from the container:
docker logs -f
Choose a model from the table of available models.
Run the Docker container with the following command:
docker run --rm --gpus 0 -p 8008:8008 -v refact_workdir:/workdir --env SERVER_MODEL=<model name> smallcloud/refact_self_hosting
If you don't have a suitable GPU, run it on the CPU:
docker run --rm -p 8008:8008 -v refact_workdir:/workdir --env SERVER_MODEL=<model name> smallcloud/refact_self_hosting
After it starts, the container automatically downloads the chosen model.
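The two commands differ only in the `--gpus 0` flag. A minimal sketch that assembles either variant from a model name is shown below; the function name `docker_run_args` is hypothetical, and every flag is copied verbatim from the commands above rather than taken from any other source.

```python
import shlex
from typing import List


def docker_run_args(model: str, use_gpu: bool = True) -> List[str]:
    """Build the `docker run` argument list shown in this README.

    Hypothetical helper: it only reproduces the documented flags.
    """
    args = ["docker", "run", "--rm"]
    if use_gpu:
        # Same GPU flag as in the README's GPU command.
        args += ["--gpus", "0"]
    args += [
        "-p", "8008:8008",
        "-v", "refact_workdir:/workdir",
        "--env", f"SERVER_MODEL={model}",
        "smallcloud/refact_self_hosting",
    ]
    return args


if __name__ == "__main__":
    # Print the command for review; pass the list to subprocess.run to execute it.
    print(shlex.join(docker_run_args("CONTRASTcode/3b/multi", use_gpu=False)))
```

Building the command as a list avoids shell-quoting problems if it is later passed to `subprocess.run`.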
Running Manually
To run the server manually, install this repo first (this might install a lot of packages on your computer):
pip install git+https://github.com/smallcloudai/code-contrast.git
pip install git+https://github.com/smallcloudai/refact-self-hosting.git
Now you can run the server with the following command:
python -m refact_self_hosting.server --workdir /workdir --model <model name>
Setting Up Plugins
Go to the plugin settings and set a custom inference URL:
https://localhost:8008
JetBrains
Settings > Tools > Refact.ai > Advanced > Inference URL
VSCode
Extensions > Refact.ai Assistant > Settings > Infurl
Now it should work, just try to write some code! If it doesn't, please report your experience to GitHub issues.
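If the plugin cannot connect, it may help to first confirm that something is listening on port 8008. The sketch below is a generic TCP check using only the Python standard library; it does not call any Refact API, and the function name `port_is_open` is an assumption made for illustration. An open port only shows that a process accepted the connection, not that the model has finished loading.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8008, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on localhost:8008")
    else:
        print("Nothing is listening on localhost:8008; check the server or docker logs")
```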
Remote server
If you run the server on a remote host, add it to /etc/hosts (or C:\Windows\System32\drivers\etc\hosts on Windows) on the client. Do not forget to replace {server ip address} with the real IP address of the server.
{server ip address} inference.smallcloud.local
Then set this inference URL in the plugin:
https://inference.smallcloud.local:8008
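A small sketch of the substitution described above: it validates the IP address, then formats the hosts line and the matching inference URL. The hostname and port come from this README; the function names `hosts_entry` and `inference_url` are hypothetical. The script only prints the line and does not modify the hosts file.

```python
import ipaddress

HOSTNAME = "inference.smallcloud.local"  # hostname used in this README
PORT = 8008                              # port used in this README


def hosts_entry(server_ip: str, hostname: str = HOSTNAME) -> str:
    """Return the hosts-file line for the server.

    Raises ValueError if server_ip is not a valid IPv4 or IPv6 address,
    for example when the placeholder was left unreplaced.
    """
    ipaddress.ip_address(server_ip)
    return f"{server_ip} {hostname}"


def inference_url(hostname: str = HOSTNAME, port: int = PORT) -> str:
    """Return the inference URL to paste into the plugin settings."""
    return f"https://{hostname}:{port}"


if __name__ == "__main__":
    # 192.168.1.50 is an example address; substitute your server's IP.
    print(hosts_entry("192.168.1.50"))
    print(inference_url())
```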
Community & Support
Join our Discord server and follow our Twitter to get the latest updates.
Contributing
We are open to contributions. If you have an idea and are ready to implement it, just: