How to install without docker?
saddy001 opened this issue
How to install fauxpilot without docker?
That can be an easy process or a slightly painful one, depending on how much control you have over the system where you're installing. If you have root access and it's an Ubuntu-like system, it'll be easyish.
Basically you need to do the following:
- Build the Triton server (their docs show how to do this).
- Build the FasterTransformer backend. Refer to @moyix's Dockerfile for how to build it.
- Modify the `setup.sh` and `launch.sh` scripts so that they work without Docker.
- Profit.
Now, the difficulty comes from the fact that you have to install the dependencies for Triton and the FasterTransformer backend. If you have root access, this will be easy. Otherwise you have to fetch the dependencies and build them from source first, and then modify the cmake files while building Triton and fastertransformer_backend so that they find these dependencies. There will also be some versioning issues - you're dealing with three systems (Triton, the fastertransformer backend, and FasterTransformer itself), and unfortunately I couldn't find clear guidelines on which versions are compatible with each other, so I had to stumble my way through this.
I had to do this recently on a system where I didn't have root and not all libraries were installed. It was...an adventure. Wouldn't recommend it unless you really have to.
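For a rough idea, those steps might look something like the sketch below. The repository URLs are the upstream Triton ones; the cmake tag is a placeholder, and the exact flags, branches, and versions you need will vary with your setup - treat this as an outline, not a recipe.

```sh
# Rough sketch of the no-Docker build flow; adjust versions and paths
# for your system. <matching-tag> is a placeholder for whichever Triton
# release tag you settle on.

# 1. Build the Triton server from source (see Triton's docs on building
#    without a container)
git clone https://github.com/triton-inference-server/server.git
cd server
./build.py --no-container-build --enable-gpu \
    --endpoint=http --endpoint=grpc --build-dir=$(pwd)/build

# 2. Build the FasterTransformer backend, mirroring the steps in
#    @moyix's Dockerfile
cd ..
git clone https://github.com/triton-inference-server/fastertransformer_backend.git
cd fastertransformer_backend
mkdir -p build && cd build
cmake -D CMAKE_BUILD_TYPE=Release -D TRITON_CORE_REPO_TAG=<matching-tag> ..
make -j"$(nproc)"
sudo make install   # installs under /opt/tritonserver

# 3. Edit fauxpilot's setup.sh / launch.sh to invoke the local
#    tritonserver binary instead of docker compose.
```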
I've tried to install fauxpilot on WSL2 (Ubuntu 20.04) without Docker, for several reasons (mainly network problems, but also a disk-space shortage). Here's what I can share:
- As for Triton (repo), there seem to be no pre-built server releases for Ubuntu (only the client), so you have to build from source, following the part of the documentation about building without Docker.
  - First, prepare CUDA and cuDNN so that they work as in the official NGC container (TensorRT, which the guide mentions, is not needed for fauxpilot, at least not for the FasterTransformer backend).
  - Then install the apt / cmake / pip packages according to the `Dockerfile` generated by the `build.py --dryrun` option (in the Triton server source folder).
  - Finally, build from source. I used these options:

    ```sh
    ./build.py -v --enable-logging --enable-stats --enable-gpu --enable-tracing \
        --endpoint=http --endpoint=grpc --no-container-build --build-dir=$(pwd)/build
    ```

    If you are working on another platform, check the options with `build.py -h` to see if anything is missing.
  - When everything is built, copy the files to the `/opt` folder.
- As for `fastertransformer_backend` (repo), just follow its Dockerfile, as @thakkarparth007 mentioned. You only need to split the final `make -jx install` command into `make -jx && sudo make install` if you are not compiling as root. The compiled libs go into `/opt/tritonserver`; check that directory once more to make sure nothing went wrong.
- After all of that is finished, clone this repo and install the pip packages mentioned in the `Dockerfile`, as well as the requirements under the `copilot_proxy` folder (simply from `requirements.txt`). As for the main scripts, `setup.sh` is still applicable for generating the `.env` file and downloading models, but `launch.sh` should be avoided. Instead, launch the model with:

  ```sh
  source .env
  CUDA_VISIBLE_DEVICES=${GPUS} mpirun -n 1 --allow-run-as-root \
      /opt/tritonserver/bin/tritonserver --model-repository=${MODEL_DIR}
  ```

  Then launch the API server by running `app.py` under the `copilot_proxy` folder. On my device, the gRPC port (8001) somehow does not work, so I changed it to the HTTP port (8000) in the API server's script.
There are many trivial problems during compilation and deployment (mostly path problems), but they should be easy to cope with. As for the version problem @thakkarparth007 mentioned, I just used version 22.06 for the Triton server (that number is the NGC release; Triton itself is v2.23.0), as noted in fastertransformer_backend's Dockerfile, with CUDA 12.1.0-1 and cuDNN 8.8.1.3-1+cuda12.0. However, fauxpilot seems to use the 22.09 Triton image as its base, so maybe it was built later.
Thanks for your feedback. However, I found that I can run the CodeGen model through Hugging Face's Transformers library, which is much easier.
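For anyone taking that route, a minimal sketch looks like this. It assumes `transformers` and `torch` are installed, and the checkpoint name is one of Salesforce's published CodeGen models on the Hugging Face Hub (any CodeGen checkpoint works the same way):

```sh
# Minimal sketch: run CodeGen directly via transformers instead of Triton.
# Assumes `pip install transformers torch`; first run downloads the model.
python - <<'EOF'
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Salesforce/codegen-350M-mono"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

ids = tok("def hello_world():", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=24)
print(tok.decode(out[0], skip_special_tokens=True))
EOF
```

This skips Triton and the FasterTransformer backend entirely, at the cost of the serving features (batching, multi-GPU) they provide.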