fauxpilot / fauxpilot

FauxPilot - an open-source alternative to GitHub Copilot server

Use fastertransformer backend with fp16?

mission909 opened this issue · comments

In huggingface_gptj_convert.py, I see a -weight_data_type argument. When I set that to fp16 and try to run the launch script, I get a bunch of Triton warnings similar to:

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.final_layernorm.bias.bin only has 2048, but request 4096, loading model fails!

And the model output is nonsense. It looks like FT is still expecting fp32, even though weight_data_type is set to fp16 in the model's config.ini. Is there anything else I'm missing? Any other config or setting I need to change?
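For what it's worth, the numbers in that warning already point at the mismatch: 2048 vs. 4096 (presumably bytes) differ by exactly a factor of two, which is what you'd expect if a 1024-element bias were written in FP16 (2 bytes per element) but read back as FP32 (4 bytes per element). A minimal sanity-check sketch along those lines (the helper name is hypothetical, and the hidden size of 1024 is inferred from the warning, not from the model config):

```python
import os

# Hypothetical helper (not part of FauxPilot): infer the element width of a
# converted FasterTransformer weight file from its size on disk and the number
# of elements it should contain (the hidden size, for a layernorm bias).
def bias_dtype_on_disk(path, hidden_size):
    bytes_per_elem = os.path.getsize(path) / hidden_size
    if bytes_per_elem == 4:
        return "fp32"
    if bytes_per_elem == 2:
        return "fp16"
    return f"unexpected: {bytes_per_elem} bytes per element"

# The warning above ("only has 2048, but request 4096") is consistent with a
# 1024-element bias written as fp16 (2 * 1024 = 2048) while FT tries to read
# it as fp32 (4 * 1024 = 4096).
print(bias_dtype_on_disk(
    "/model/fastertransformer/1/1-gpu/model.final_layernorm.bias.bin", 1024))
```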

commented

Same question, but I noticed that the FasterTransformer repository, in its Model overview section, says:

  • On Volta, Turing and Ampere GPUs, the computing power of Tensor Cores are used automatically when the precision of the data and weights are FP16.

I think maybe we should preprocess the input data to FP16 precision first.

commented

Just to clarify, the weights do end up in FP16 on the GPU (that's what the is_half setting in the model config is for). If I remember correctly, in the version of FT that FauxPilot uses, the weights can't be stored in FP16 on disk.

commented

@moyix The conversion logic does save the weights to disk in FP16 though, so should we perhaps modify that logic to not do that?
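If the on-disk format does need to stay FP32 for this FT version, the change would be in how each tensor is serialized rather than in the Triton config. A minimal sketch of that idea (illustrative function name, not the actual huggingface_gptj_convert.py code): always cast to float32 before writing each per-tensor .bin file, whatever -weight_data_type was requested.

```python
import numpy as np

# Sketch only, not the real converter code: serialize every weight as float32
# so the per-tensor .bin files match the sizes FT requests at load time, even
# if the model is later run in fp16 on the GPU (via is_half).
def save_weight(weight, save_path):
    np.asarray(weight, dtype=np.float32).tofile(save_path)

# Illustrative usage:
# save_weight(final_layernorm_bias, "model.final_layernorm.bias.bin")
```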