deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model


Extremely slow startup on 8 × A100 — has anyone managed to launch it successfully?

CarryChang opened this issue

Suggestion: launch with vLLM instead — see vllm-project/vllm#4650.

In the HuggingFace example code, the accelerate library's GPU-memory allocation calculation for the model was flawed. The example code has now been updated, which should substantially shorten model loading time.

Change the model-loading code to:

import torch
from transformers import AutoModelForCausalLM

# device_map="sequential" fills GPUs in order instead of letting accelerate
# balance layers, which avoids the slow memory-allocation computation;
# max_memory is a per-device budget dict defined beforehand.
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="sequential", torch_dtype=torch.bfloat16, max_memory=max_memory, attn_implementation="eager")
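For reference, a minimal sketch of how the `max_memory` dict passed above might be built for an 8-GPU node. The per-GPU budget of "75GiB" (A100 80GB minus headroom) and the CPU offload budget are assumptions for illustration, not values given in this thread:

```python
# Hypothetical max_memory map for 8 GPUs plus optional CPU offload.
# Keys are GPU indices (ints) or the string "cpu"; values are size strings
# understood by accelerate. The 75GiB figure is an assumed headroom choice.
num_gpus = 8
max_memory = {i: "75GiB" for i in range(num_gpus)}
max_memory["cpu"] = "100GiB"  # assumed CPU offload budget
print(max_memory)
```

With `device_map="sequential"`, layers are placed on GPU 0 until its budget is reached, then GPU 1, and so on, so a budget that is too small on early devices pushes weights onto later GPUs or the CPU.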