fauxpilot / fauxpilot

FauxPilot - an open-source alternative to GitHub Copilot server

How to install without Docker?

saddy001 opened this issue

How can I install FauxPilot without Docker?

That can be an easy process or a slightly painful one, depending on how much control you have over the machine you're installing on. If you have root access and it's an Ubuntu-like system, it'll be easy-ish.

Basically you need to do the following:

  1. Build Triton server (their docs show how to do this).
  2. Build the FasterTransformer backend. Refer to @moyix's Dockerfile for how to build it (a rough sketch follows this list).
  3. Modify the setup.sh and launch.sh scripts so that they work without Docker.
  4. Profit.
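As a very rough sketch of step 2: fastertransformer_backend is an out-of-tree Triton backend built with CMake. The repo tags, install prefix, and job count below are illustrative assumptions rather than values taken from @moyix's Dockerfile, so cross-check them against it:

    git clone https://github.com/triton-inference-server/fastertransformer_backend.git
    cd fastertransformer_backend && mkdir build && cd build
    # The TRITON_*_REPO_TAG values must match your Triton release (r22.06 assumed here).
    cmake -DCMAKE_BUILD_TYPE=Release \
          -DCMAKE_INSTALL_PREFIX=/opt/tritonserver \
          -DTRITON_COMMON_REPO_TAG=r22.06 \
          -DTRITON_CORE_REPO_TAG=r22.06 \
          -DTRITON_BACKEND_REPO_TAG=r22.06 ..
    make -j8 && sudo make install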

Now, the difficulty comes from the fact that you have to install the dependencies for Triton and the FasterTransformer backend. If you have root access, this is easy. Otherwise you have to fetch the dependencies and build them from source first, then modify the CMake files while building Triton and fastertransformer_backend so that they find those dependencies. There will also be some versioning issues: you're dealing with three systems (Triton, the FasterTransformer backend, and FasterTransformer itself), and unfortunately I couldn't find clear guidelines on which versions are compatible with each other, so I had to stumble my way through.

I had to do this recently on a system where I didn't have root and not all libraries were installed. It was... an adventure. I wouldn't recommend it unless you really have to do this.

I've tried to install FauxPilot on WSL2 (Ubuntu 20.04) without Docker, for several reasons (mainly network problems, but also a shortage of disk space). Here's what I can share:

  • As for Triton (repo), there seem to be no pre-built server releases for Ubuntu (only the client), so you have to build from source following the part of the documentation about building without Docker.

    • First you have to prepare CUDA and cuDNN so that the environment works like the official NGC container (the TensorRT mentioned in the guide is not needed for FauxPilot, at least not for the FasterTransformer backend).
    • Then you can install the apt packages, cmake, and pip packages according to the Dockerfile generated by the build.py --dryrun option (run in the Triton server source folder); a short sketch of this dry run appears after this list.
    • Finally, build from source. I used these options:

      ./build.py -v --enable-logging --enable-stats --enable-gpu --enable-tracing \
          --endpoint=http --endpoint=grpc --no-container-build --build-dir=$(pwd)/build

      If you are working on another platform, check the available options with build.py -h and see if anything is missing.
    • When all the files are built, copy them to the /opt folder.
  • As for fastertransformer_backend (repo), just follow the Dockerfile, as @thakkarparth007 mentioned. If you are not compiling as root, split the final make -jx install command into make -jx && sudo make install. The compiled libs go into /opt/tritonserver; check that directory afterwards to make sure nothing is missing.

  • After all of that is finished, clone this repo and install the pip packages mentioned in the Dockerfile, as well as the requirements under the copilot_proxy folder (simply via its requirements.txt). As for the main scripts, setup.sh is still applicable for generating the .env file and downloading models, but launch.sh should be avoided. You just need to launch the model with:

    source .env
    CUDA_VISIBLE_DEVICES=${GPUS} mpirun -n 1 --allow-run-as-root /opt/tritonserver/bin/tritonserver --model-repository=${MODEL_DIR}

    Then you can launch the API server by running app.py under the copilot_proxy folder (see the sketch below). On my machine the gRPC port (8001) somehow did not work, so I switched to the HTTP port (8000) in the API server's script.
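To round out that last step, launching the proxy can be as simple as the following (a sketch; the listening port and the Triton host/port it connects to are configured inside copilot_proxy, so check app.py for your version):

    source .env
    cd copilot_proxy
    pip install -r requirements.txt
    python3 app.py    # serves the completion API; edit the script to use HTTP port 8000 if gRPC fails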
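And regarding the dependency step above: build.py can generate its Dockerfile(s) without building anything, which is a convenient way to list the apt/cmake/pip packages Triton expects. A sketch; the names of the generated files are an assumption, so check your build directory:

    # Dry run: generate the build scripts/Dockerfiles only, no compilation.
    ./build.py --dryrun --enable-gpu --endpoint=http --endpoint=grpc \
        --build-dir=$(pwd)/build
    # List the packages the generated Dockerfile(s) would install:
    grep -hE 'apt-get install|pip3? install' build/Dockerfile*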

There were many trivial problems during compilation and deployment (mostly path problems), but they should be easy to cope with. As for the version problem @thakkarparth007 mentioned, I just used the Triton server version noted in fastertransformer_backend's Dockerfile, 22.06 (that's the NGC release number; Triton itself is v2.23.0), with CUDA 12.1.0-1 and cuDNN 8.8.1.3-1+cuda12.0. FauxPilot itself seems to use the 22.09 Triton image as its base, though, so perhaps it was built against a later release.

Thanks for your feedback. However, I found that I can run the CodeGen model through Hugging Face's Transformers library, which is much easier.
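For anyone who lands here wanting the same shortcut, a minimal sketch of that route (the checkpoint name and generation settings below are just examples):

    pip install transformers torch
    python3 - <<'PY'
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Any CodeGen checkpoint works; the 350M mono model is small enough for CPU.
    name = "Salesforce/codegen-350M-mono"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    prompt = "def fibonacci(n):"
    out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=48)
    print(tok.decode(out[0], skip_special_tokens=True))
    PY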