deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Repository from GitHub: https://github.com/deepseek-ai/DeepSeek-V2

NaN issue when loading the model with FP16

zitgit opened this issue

When I change the torch_dtype in the loading call from torch.bfloat16 to torch.float16, i.e.

model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="sequential", torch_dtype=torch.float16)

inference no longer works: the activations contain NaNs. Is this a known issue?
Environment: 8× A100; transformers version 4.44.0
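
For context: fp16 and bf16 use the same number of bits, but bf16 keeps fp32's exponent range, so any activation above fp16's maximum of 65504 overflows to inf and then propagates as NaN through later ops. Below is a minimal debugging sketch, assuming the stock transformers AutoModelForCausalLM/AutoTokenizer API; the checkpoint name and prompt are placeholders. It registers forward hooks to report the first module whose output goes non-finite:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint name; substitute whichever DeepSeek-V2 variant you loaded.
model_name = "deepseek-ai/DeepSeek-V2"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="sequential",
    torch_dtype=torch.float16,  # the dtype that triggers the NaNs
)

# fp16 tops out at 65504, while bf16 keeps fp32's exponent range;
# an activation past that limit overflows to inf and becomes NaN
# in later ops (e.g. inf - inf inside softmax).
print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e38

flagged = False

def make_hook(name):
    def hook(module, inputs, output):
        global flagged
        if flagged:
            return
        tensors = output if isinstance(output, tuple) else (output,)
        for t in tensors:
            if torch.is_tensor(t) and not torch.isfinite(t).all():
                flagged = True
                print(f"first non-finite activation in module: {name}")
                break
    return hook

# Hook every submodule so the earliest overflowing layer identifies itself.
for name, module in model.named_modules():
    module.register_forward_hook(make_hook(name))

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
with torch.no_grad():
    model(**inputs)
```

Reverting to torch.bfloat16 (natively supported on A100) sidesteps the overflow entirely; if fp16 is a hard requirement, the usual mitigations are casting the offending layers to fp32 or clamping their activations.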