deepseek-ai/DeepSeek-MoE
Stargazers: 877 | Watchers: 13 | Issues: 32 | Forks: 37
deepseek-ai/DeepSeek-MoE Issues
Slow inference on a single A100-80G (Updated 2 days ago)
About expert capacities: Is there token-dropping during training? (Closed 2 months ago, 3 comments)
How is MoE parallelism implemented? (Updated 2 months ago, 1 comment)
Reproducing the model evaluation results (Updated 2 months ago, 1 comment)
Hello, can you provide a quantization solution? (Updated 2 months ago, 2 comments)
No need to add epsilon 1e-20 in topk norm? (Closed 2 months ago)
Can you add a ModelScope link? That would be more convenient for users who cannot access Hugging Face (Updated 2 months ago)
Hello, when will the planned open-source MoE-145B be uploaded? (Updated 2 months ago, 3 comments)
Abnormal model output after finetuning (Closed 3 months ago, 4 comments)
Load errors (Closed 3 months ago, 2 comments)
During LoRA finetuning, the deepseek-moe model's loss suddenly drops to 0 and stays there, causing abnormal inference (Updated 3 months ago, 2 comments)
Is finetuning on NPU devices currently supported? (Closed 3 months ago, 1 comment)
Can you provide the inference version of DeepSeek based on vllm, deepspeed and tensorrt-llm (Closed 3 months ago, 1 comment)
How to fully finetune MoE on multiple nodes (Closed 4 months ago, 1 comment)
Do you have plans to support the llama.cpp project? (Updated 4 months ago, 1 comment)
Could you open-source the training project for reproducing the model architecture? (Closed 4 months ago, 3 comments)
About flash_attn (Closed 4 months ago, 1 comment)
Great work! Is there a WeChat discussion group? (Closed 4 months ago, 1 comment)
Will it compare performance with llama-moe? (Closed 4 months ago, 1 comment)
Selective precision in gate and norm may conflict with deepspeed? (Closed 4 months ago, 1 comment)
GPU utilization is low compared with the dense model (Closed 4 months ago, 4 comments)
#feature request# DeepSeek-MoE for code (Closed 4 months ago, 1 comment)
Question about AddAuxiliaryLoss? (Closed 4 months ago, 1 comment)
deepseek-moe-16b inference speed is slower than Baichuan-13b (Closed 4 months ago, 3 comments)
Does the open-source MoE model support Chinese? (Closed 4 months ago, 4 comments)
Can inference tools like vllm support it? (Closed 4 months ago, 3 comments)
Flash attention (Closed 4 months ago)
Help: the model fails to load (Closed 4 months ago, 4 comments)
Will you open-source the DeepSeekMoE 2B model? (Updated 4 months ago, 5 comments)
The released DeepSeekMoE 16B Base has 3 different vocab sizes (Closed 4 months ago, 2 comments)
Error during the finetune process (Closed 4 months ago, 1 comment)
CUDA error: device-side assert triggered when trying to run the model (Closed 4 months ago, 2 comments)