fauxpilot / fauxpilot

FauxPilot - an open-source alternative to GitHub Copilot server

Triton cannot find config.pbtxt

ankit-db opened this issue · comments

Issue Description

While trying to set up FauxPilot with the codegen-16b-multi model in FasterTransformer mode, I am getting an error like:

E0216 16:50:02.558074 89 model_repository_manager.cc:2063] Poll failed for model directory 'fastertransformer': failed to open text file for read /model/fastertransformer/config.pbtxt: No such file or director

when running ./launch.sh inside the Triton server. Note that I am using this with 4 GPUs, so it went down the path of using the converter in the Docker container. This results in 0 models ready. Is this file expected to be generated by the converter?
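A quick way to confirm this before starting Triton is to scan the model repository for model directories that lack a config.pbtxt. The following is only a minimal sketch: the helper name models_missing_config is hypothetical, and it assumes the usual layout in which each model sits in its own subdirectory of the repository (here /model/fastertransformer).

```python
from pathlib import Path


def models_missing_config(model_store):
    """Return the names of model directories that have no config.pbtxt.

    Hypothetical helper, not part of FauxPilot or Triton. It assumes each
    immediate subdirectory of model_store is one model directory.
    """
    root = Path(model_store)
    return sorted(
        d.name
        for d in root.iterdir()
        if d.is_dir() and not (d / "config.pbtxt").is_file()
    )
```

Calling models_missing_config("/model") inside the container should list fastertransformer if the converter did not write the file.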

Hello there, thanks for opening your first issue. We welcome you to the FauxPilot community!

As an update, I don't see this issue when running ./setup.sh with 2 GPUs on the same model

Perhaps the problem is that this script does not generate the Triton config? https://github.com/fauxpilot/fauxpilot/blob/main/converter/download_and_convert_model.sh

cc: @fdegier @thakkarparth007 I see you guys just committed to this file - do you know what might be going on here? The triton config generator is not even included in the docker image that does this generation

Hey @ankit-db I think the config.pbtxt file just comes with the fauxpilot repository for the 1gpu and 2gpu variants. For other variants, I think the ./converter/triton_config_gen.py script should be invoked. I haven't done the conversion myself, but let me give it a shot. We should also update the setup.sh file to invoke this automatically. Thanks for pointing this out!

@thakkarparth007 no worries! yes, definitely would love to see it in setup.sh! I've been hacking some stuff together to try to get that file generated for a 4GPU setup, but ran into something weird where Triton says it can't read the weights files. Going to try changing some permissions things. Let me know if you get it working!

Hey @ankit-db I was able to generate the config.pbtxt by doing the following:

  1. Modify the triton_config_gen.py file on line 59, change params['name'] = model_name to params['name'] = "codegen-350M-multi" (or whatever your model size is)
  2. python3 converter/triton_config_gen.py --template converter/config_template.pbtxt --model_store models --hf_model_dir "Salesforce/codegen-350M-multi" -n 4

This generated a config.pbtxt at models/codegen-350M-multi-4gpu/fastertransformer/config.pbtxt. Let me know if that works. I'll put up a PR to make this automatic.
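For anyone scripting step 2 for a different model or GPU count, the command can be assembled programmatically. This is a rough sketch only: build_config_gen_command is a hypothetical helper, it uses just the flags quoted in the command above, and it does not cover the manual edit from step 1.

```python
def build_config_gen_command(hf_model_dir, num_gpus,
                             template="converter/config_template.pbtxt",
                             model_store="models"):
    """Build the argv list for converter/triton_config_gen.py.

    Hypothetical helper: the flags are the ones quoted in this thread,
    and the default paths assume the command runs from the repo root.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "python3", "converter/triton_config_gen.py",
        "--template", template,
        "--model_store", model_store,
        "--hf_model_dir", hf_model_dir,
        "-n", str(num_gpus),
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) from the root of the fauxpilot checkout.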

Hey @ankit-db, in addition to what @thakkarparth007 shared: if you diff the provided 1gpu and 2gpu configs, you'll see only two changes.

Regarding "ran into something weird where Triton says it can't read the weights files": in config.pbtxt, check whether the model_checkpoint_path parameter actually points to where your weights live.
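One way to do that check without opening the file by hand is to pull the value out with a regular expression. This is a sketch under assumptions: read_checkpoint_path is a hypothetical helper, and it assumes the parameter is written as a parameters block with a key line followed by a value block containing string_value, in protobuf text format.

```python
import re

# Matches: key: "model_checkpoint_path" value: { string_value: "<path>" }
_CHECKPOINT_RE = re.compile(
    r'key:\s*"model_checkpoint_path"\s*'
    r'value:\s*\{\s*string_value:\s*"([^"]*)"'
)


def read_checkpoint_path(config_text):
    """Return the model_checkpoint_path value from config.pbtxt text.

    Hypothetical helper; returns None when the parameter is not found.
    """
    match = _CHECKPOINT_RE.search(config_text)
    return match.group(1) if match else None
```

The returned path can then be tested with os.path.isdir from the directory (or container) where Triton actually runs, since a path that is valid on the host may not exist inside the container.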

@thakkarparth007 @MichaMucha thanks for your help! I did get this working by just using that script and then using the rebase argument in the script to use /model instead of the path it was using
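The idea behind rebasing is simply to swap the host-side prefix of the checkpoint path for the directory where the model is mounted in the container. The sketch below illustrates that idea only and is not the script's actual implementation: rebase_checkpoint_path and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def rebase_checkpoint_path(path, old_root, new_root="/model"):
    """Replace the old_root prefix of path with new_root.

    Hypothetical helper for illustration. Raises ValueError if path does
    not live under old_root.
    """
    relative = PurePosixPath(path).relative_to(old_root)
    return str(PurePosixPath(new_root) / relative)
```

For example, a hypothetical host path such as /home/user/models/my-model-4gpu/fastertransformer/1/4-gpu with old_root /home/user/models/my-model-4gpu would become a path under /model.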

Thanks! If it helps, I can put up a PR to fix this broken part of setup.sh

setup.sh downloads the model from https://huggingface.co/moyix/codegen-350M-multi-gptj/blob/main/codegen-350M-multi-1gpu.tar.zst, but there is no convert step and no config.pbtxt.

Running python3 converter/triton_config_gen.py --template converter/config_template.pbtxt --model_store models --hf_model_dir "Salesforce/codegen-350M-multi" -n 1 generates the config.pbtxt, which includes:

    parameters {
      key: "model_checkpoint_path"
      value: {
        string_value: "fastertransformer/1/1-gpu"
      }
    }

If your models directory is not where triton_config_gen.py expects it, you may need to edit this checkpoint path so that it points to the correct location.

Yup! Haven't had time to put up a PR yet unfortunately