Use fastertransformer backend with fp16?
mission909 opened this issue
In huggingface_gptj_convert.py, I see an argument -weight_data_type. When I set it to fp16 and try to run the launch script, I get a bunch of Triton warnings similar to:
[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.final_layernorm.bias.bin only has 2048, but request 4096, loading model fails!
and the model output is nonsense. It looks like FT still expects fp32, even though we set weight_data_type to fp16 in the model's config.ini. Is there anything else I'm missing? Any other config or setting I need to change?
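For what it's worth, the numbers in the warning are consistent with a byte-size check that assumes 4 bytes per element: a 1024-element bias saved as fp16 is 2048 bytes on disk, while a loader expecting fp32 requests 4096. A minimal sketch of that arithmetic (the 1024-element size is an assumption chosen to match the warning, not taken from the GPT-J config):

```python
import numpy as np

# Illustrative only: a bias vector whose fp16/fp32 sizes match the
# "only has 2048, but request 4096" numbers in the FT warning.
hidden_dim = 1024  # assumed; picked so the byte counts line up

bias_fp32 = np.zeros(hidden_dim, dtype=np.float32)
bias_fp16 = bias_fp32.astype(np.float16)

expected_bytes = bias_fp32.nbytes  # what an fp32 loader requests: 1024 * 4 = 4096
actual_bytes = bias_fp16.nbytes    # what the fp16 file holds:     1024 * 2 = 2048

print(expected_bytes, actual_bytes)  # 4096 2048
```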
Same question. I noticed that the FasterTransformer repository, in its Model overview section, says:
- On Volta, Turing and Ampere GPUs, the computing power of Tensor Cores are used automatically when the precision of the data and weights are FP16.
I think maybe we should preprocess the input data to fp16 precision first.
Just to clarify, the weights do end up in FP16 on the GPU (that's what is_half in the model config is for). If I remember correctly, in the version of FT that FauxPilot uses, the weights can't be stored in FP16 on disk.
@moyix The conversion logic does save the weights to disk in FP16, though, so should we perhaps modify that logic to not do that?