Refact.ai Inference Server

This is a self-hosted server for the refact.ai coding assistant.

With Refact you can run high-quality AI code completions on-premise, use a number of functions for code transformation, and ask questions in the chat.

This server allows you to run AI coding models on your own hardware, so your code never leaves your control.

At the moment, you can choose between the following models:

Model	GPU (VRAM)	CPU (RAM)	Completion	AI Toolbox	Chat	Languages supported
CONTRASTcode/medium/multi	3 GB	3 GB	+			20+ programming languages
CONTRASTcode/3b/multi	8 GB	12 GB	+			20+ programming languages
starcoder/15b/base4bit	16 GB	-	+	+	+	80+ programming languages
starcoder/15b/base8bit	32 GB	-	+	+	+	80+ programming languages
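
As a rough illustration of how to read the table, the sketch below lists the models that fit into a given amount of GPU memory. The dictionary only copies the GPU (VRAM) column, in gigabytes; the names `GPU_VRAM_GB` and `models_that_fit` are hypothetical helpers and not part of Refact.

```python
# GPU memory requirements in gigabytes, copied from the GPU (VRAM) column of the table.
GPU_VRAM_GB = {
    "CONTRASTcode/medium/multi": 3,
    "CONTRASTcode/3b/multi": 8,
    "starcoder/15b/base4bit": 16,
    "starcoder/15b/base8bit": 32,
}


def models_that_fit(vram_gb):
    """Return the models whose listed VRAM requirement is at most vram_gb."""
    return [name for name, need in GPU_VRAM_GB.items() if need <= vram_gb]


if __name__ == "__main__":
    # Example: a GPU with 12 GB of memory.
    print(models_that_fit(12))
```

Fitting into memory says nothing about speed; see the known limitations for the StarCoder models on smaller GPUs.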

Refact is currently available as a plugin for JetBrains IDEs and VS Code.

Known limitations

For best results on smaller GPUs, we recommend using the CONTRASTcode models, as the StarCoder models can be quite slow.
StarCoder AI Toolbox and Chat in JetBrains will be available later (May 12-14).

Getting started

Install the plugin for your IDE: JetBrains or VSCode.

Running Server in Docker

The recommended way to run the server is to use a pre-built Docker image.

Install Docker with NVIDIA GPU support. On Windows, you need to install WSL 2 first by following a WSL 2 installation guide.

Docker tips & tricks

Add yourself to the docker group to run docker without sudo (works on Linux):

sudo usermod -aG docker {your user}

List all containers:

docker ps -a

Create a new container:

docker run

Start and stop existing containers (stop doesn't remove them):

docker start
docker stop

Remove a container and all its data:

docker rm

Show messages from the container:

docker logs -f

Choose a model from the available ones.

Run the Docker container with the following command:

docker run --rm --gpus 0 -p 8008:8008 -v refact_workdir:/workdir --env SERVER_MODEL=<model name> smallcloud/refact_self_hosting

If you don't have a suitable GPU, run it on the CPU:

docker run --rm -p 8008:8008 -v refact_workdir:/workdir --env SERVER_MODEL=<model name> smallcloud/refact_self_hosting

After it starts, the container automatically downloads the chosen model.
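
The two docker run commands differ only in the `--gpus 0` option, so they can be assembled from one helper. This is a minimal sketch that reproduces the commands above as an argument list; the function name `docker_run_args` and its `use_gpu` parameter are hypothetical, and nothing here is part of the Refact code base.

```python
import shlex

# Image name taken from the commands above.
IMAGE = "smallcloud/refact_self_hosting"


def docker_run_args(model, use_gpu=True):
    """Build the argument list for the docker run commands shown above."""
    args = ["docker", "run", "--rm"]
    if use_gpu:
        # Same GPU option as in the GPU command above; omitted for CPU runs.
        args += ["--gpus", "0"]
    args += [
        "-p", "8008:8008",
        "-v", "refact_workdir:/workdir",
        "--env", f"SERVER_MODEL={model}",
        IMAGE,
    ]
    return args


if __name__ == "__main__":
    # Print the CPU variant for one of the models listed earlier.
    print(shlex.join(docker_run_args("CONTRASTcode/3b/multi", use_gpu=False)))
```

The returned list can be passed to `subprocess.run` as is, which avoids shell quoting issues with the model name.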

Running Manually

To run the server manually, install this repo first (this might install a lot of packages on your computer):

pip install git+https://github.com/smallcloudai/code-contrast.git
pip install git+https://github.com/smallcloudai/refact-self-hosting.git

Now you can run the server with the following command:

python -m refact_self_hosting.server --workdir /workdir --model <model name>

Setting Up Plugins

Go to the plugin settings and set a custom inference URL:

https://localhost:8008

JetBrains

Settings > Tools > Refact.ai > Advanced > Inference URL

VSCode

Extensions > Refact.ai Assistant > Settings > Infurl

Now it should work; just try writing some code! If it doesn't, please report your experience in GitHub issues.
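
If the plugin gets no response, one quick check before filing an issue is whether anything is listening on port 8008 at all. The sketch below only tests that a TCP connection can be opened; it does not verify that the listener is the Refact server or that HTTPS works. The function name `is_port_open` is hypothetical.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Host and port taken from the inference URL above.
    print(is_port_open("localhost", 8008))
```

A result of False right after starting the container may simply mean that the model is still downloading.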

Remote server

If you run the server on a remote host, add that host to /etc/hosts (or C:\Windows\System32\drivers\etc\hosts on Windows) on the client machine. Replace {server ip address} with the server's real IP address.

{server ip address}  inference.smallcloud.local

Then set this inference URL in the plugin:

https://inference.smallcloud.local:8008
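
To avoid typos in the hosts file, the entry can be generated and the address validated first. This is a small sketch under stated assumptions: the helper names `hosts_line` and `inference_url` are hypothetical, and 192.168.1.50 is only an example address, not a value from this guide.

```python
import ipaddress

# Host name used in the hosts entry and the inference URL above.
HOSTNAME = "inference.smallcloud.local"


def hosts_line(server_ip):
    """Return the hosts-file line for server_ip.

    Raises ValueError if server_ip is not a valid IPv4 or IPv6 address.
    """
    ip = ipaddress.ip_address(server_ip)
    return f"{ip}  {HOSTNAME}"


def inference_url(port=8008):
    """Return the inference URL to enter in the plugin settings."""
    return f"https://{HOSTNAME}:{port}"


if __name__ == "__main__":
    # Example address only; use the real IP address of your server.
    print(hosts_line("192.168.1.50"))
    print(inference_url())
```

The script only prints the line; adding it to the hosts file still has to be done by hand with administrator rights.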

Community & Support

Join our Discord server and follow our Twitter to get the latest updates.

Contributing

We are open to contributions. If you have an idea and are ready to implement it, just:

make a fork
make your changes, commit to your fork
and open a PR
