fauxpilot / fauxpilot

FauxPilot - an open-source alternative to GitHub Copilot server

Add 8bit and int4 inference support to run larger models with less VRAM

MarkSchmidty opened this issue · comments

Is your feature request related to a problem? Please describe.
The VRAM requirements of the most useful models are just beyond what most people have readily available. Cutting those requirements in half with 8-bit inference (or by 75% with int4) would make FauxPilot accessible to almost anyone with a consumer GPU.
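For a rough sense of scale: a 6B-parameter model stored as fp16 weights takes about 12 GB (2 bytes per parameter), which drops to roughly 6 GB in int8 and roughly 3 GB in int4, before accounting for activations and KV cache.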

Describe the solution you'd like
Add support for 8-bit inference via bitsandbytes to the Docker configs. Also add support for int4 pre-quantized models as an alternative to the full-size defaults where they exist (a growing number of models have an int4 version on HuggingFace).
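For illustration only, a minimal sketch of what 8-bit loading looks like through the Hugging Face transformers API with bitsandbytes installed (the `load_in_8bit` path also needs the accelerate package). FauxPilot's actual backend serves converted models through Triton/FasterTransformer, so this is not a drop-in integration, just a demonstration of the memory-saving technique being requested; the model name is one of the CodeGen checkpoints FauxPilot already uses.

```python
# Sketch: load a CodeGen checkpoint with 8-bit weights via bitsandbytes.
# Requires: pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Salesforce/codegen-2B-mono"  # any CodeGen variant on HuggingFace

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # place layers on the available GPU(s)
    load_in_8bit=True,   # bitsandbytes int8 weights: ~half the fp16 VRAM
)

# Quick completion test to confirm the quantized model still generates code.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```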

Hello there, thanks for opening your first issue. We welcome you to the FauxPilot community!