Triton cannot find config.pbtxt
ankit-db opened this issue · comments
Issue Description
While trying to set up FauxPilot with the codegen-16B-multi model in FasterTransformer mode, I am getting an error like:
E0216 16:50:02.558074 89 model_repository_manager.cc:2063] Poll failed for model directory 'fastertransformer': failed to open text file for read /model/fastertransformer/config.pbtxt: No such file or directory
when running ./launch.sh
inside the Triton server. Note that I am using this with 4 GPUs, so it went down the path of using the converter in the Docker container. This results in 0 models ready. Is this file expected to be generated by the converter?
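Triton's poll step fails whenever a model directory in the model store lacks a config.pbtxt. As a debugging aid, here is a minimal sketch (not part of FauxPilot) that lists which model directories in a store are missing that file; the store path is whatever you mount into the Triton container:

```python
# Minimal sketch: report model directories in a Triton model store that
# lack the config.pbtxt Triton requires. Store path is an assumption,
# not taken from the FauxPilot repo.
import os

def missing_configs(model_store):
    """Return model subdirectories lacking a config.pbtxt."""
    missing = []
    for name in sorted(os.listdir(model_store)):
        model_dir = os.path.join(model_store, name)
        if os.path.isdir(model_dir) and not os.path.isfile(
                os.path.join(model_dir, "config.pbtxt")):
            missing.append(name)
    return missing
```

Running this against the mounted /model directory would flag 'fastertransformer' here, matching the poll error above.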
As an update, I don't see this issue when running ./setup.sh with 2 GPUs on the same model.
Perhaps the problem is that this script does not generate the Triton config? https://github.com/fauxpilot/fauxpilot/blob/main/converter/download_and_convert_model.sh
cc: @fdegier @thakkarparth007 I see you two just committed to this file - do you know what might be going on here? The Triton config generator is not even included in the Docker image that does this generation.
Hey @ankit-db I think the config.pbtxt file just comes with the fauxpilot repository for the 1- and 2-GPU variants. For other variants, I think the ./converter/triton_config_gen.py script should be invoked. I haven't done the conversion myself, but let me give it a shot. We should also update the setup.sh file to invoke this automatically. Thanks for pointing this out!
@thakkarparth007 no worries! yes, definitely would love to see it in setup.sh! I've been hacking some stuff together to try to get that file generated for a 4GPU setup, but ran into something weird where Triton says it can't read the weights files. Going to try changing some permissions things. Let me know if you get it working!
Hey @ankit-db I was able to generate the config.pbtxt by doing the following:
- Modify triton_config_gen.py on line 59, changing
params['name'] = model_name
to
params['name'] = "codegen-350M-multi"
(or whatever your model size is).
- Run:
python3 converter/triton_config_gen.py --template converter/config_template.pbtxt --model_store models --hf_model_dir "Salesforce/codegen-350M-multi" -n 4
This generated a config.pbtxt under models/codegen-350M-multi-4gpu/fastertransformer/config.pbtxt. Let me know if that works, I'll put up a PR to make this automatic.
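For intuition, the generator is essentially template substitution: it fills a pbtxt template with the model name and GPU count. A hedged sketch (the real triton_config_gen.py uses its own template and variable names; the placeholders below are made up for illustration):

```python
# Sketch of what a config generator does: substitute the model name and
# GPU count into a pbtxt template. Field names here are illustrative
# assumptions, not FauxPilot's actual template.
from string import Template

TEMPLATE = Template("""name: "$name"
parameters {
  key: "tensor_para_size"
  value: { string_value: "$num_gpus" }
}
""")

def render_config(name, num_gpus):
    """Fill the template with concrete values and return the pbtxt text."""
    return TEMPLATE.substitute(name=name, num_gpus=num_gpus)
```

This is why hard-coding params['name'] works as a stopgap: the template only needs the right name and parallelism degree at render time.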
Hey @ankit-db, in addition to what @thakkarparth007 shared: if you diff the 1-GPU and 2-GPU configs provided, you'll see only two changes.
Regarding "ran into something weird where Triton says it can't read the weights files": in config.pbtxt, check whether the parameter model_checkpoint_path actually points to where your weights live.
@thakkarparth007 @MichaMucha thanks for your help! I did get this working by just using that script and then using the rebase argument in the script to use /model instead of the path it was using.
Thanks! If it helps, I can put up a PR to fix this broken part of setup.sh
setup.sh downloads the model from https://huggingface.co/moyix/codegen-350M-multi-gptj/blob/main/codegen-350M-multi-1gpu.tar.zst but there is no convert step and no config.pbtxt.
Running python3 converter/triton_config_gen.py --template converter/config_template.pbtxt --model_store models --hf_model_dir "Salesforce/codegen-350M-multi" -n 1 generates the config.pbtxt, which contains:
and parameters {
key: "model_checkpoint_path"
value: {
string_value: "fastertransformer/1/1-gpu"
}
}
You may need to edit the checkpoint path to the correct location if, when you run triton_config_gen.py, your models directory is not where the generated path expects.
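That edit can also be done programmatically, in the spirit of the rebase workaround mentioned earlier. A sketch that rewrites the model_checkpoint_path value in config.pbtxt text (the replacement path below is an example, not the exact one FauxPilot uses):

```python
# Sketch: rewrite the model_checkpoint_path string_value in config.pbtxt
# text so it points at the real weights location. Regex-based; example
# paths only.
import re

def rebase_checkpoint_path(config_text, new_path):
    """Replace model_checkpoint_path's string_value with new_path."""
    return re.sub(
        r'(key:\s*"model_checkpoint_path".*?string_value:\s*")[^"]*(")',
        lambda m: m.group(1) + new_path + m.group(2),
        config_text, flags=re.DOTALL)
```

Write the result back to config.pbtxt, then restart Triton so it re-polls the model repository.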
Yup! Haven't had time to put up a PR yet unfortunately