deepseek-ai/DeepSeek-MoE
Stargazers: 877 | Watchers: 13 | Issues: 32 | Forks: 37
deepseek-ai/DeepSeek-MoE Issues
Slow inference on a single A100-80G (Updated 2 days ago)
About expert capacities: Is there token-dropping during training? (Closed 2 months ago, 3 comments)
How is MoE parallelism implemented? (Updated 2 months ago, 1 comment)
Reproducing the model evaluation results (Updated 2 months ago, 1 comment)
Hello, can you provide a quantization solution? (Updated 2 months ago, 2 comments)
No need to add epsilon 1e-20 in topk norm? (Closed 2 months ago)
Can you add a ModelScope link? That would be more convenient for users who cannot access Hugging Face (Updated 2 months ago)
Hello, when will the planned open-source MoE-145B be uploaded? (Updated 2 months ago, 3 comments)
Abnormal model output after finetuning (Closed 3 months ago, 4 comments)
Load errors (Closed 3 months ago, 2 comments)
During LoRA finetuning, the deepseek-moe model's loss suddenly drops to 0 and stays there, causing abnormal inference (Updated 3 months ago, 2 comments)
Is finetuning on NPU devices currently supported? (Closed 3 months ago, 1 comment)
Can you provide the inference version of DeepSeek based on vllm, deepspeed and tensorrt-llm (Closed 3 months ago, 1 comment)
How to fully finetune MoE on multiple nodes (Closed 4 months ago, 1 comment)
Do you have plans to support the llama.cpp project? (Updated 4 months ago, 1 comment)
Could you open-source the training project for reproducing the model architecture? (Closed 4 months ago, 3 comments)
About flash_attn (Closed 4 months ago, 1 comment)
Great work! Is there a WeChat discussion group? (Closed 4 months ago, 1 comment)
Will it compare performance with llama-moe? (Closed 4 months ago, 1 comment)
Selective precision in gate and norm may conflict with deepspeed? (Closed 4 months ago, 1 comment)
GPU utilization is low compared with the dense model (Closed 4 months ago, 4 comments)
#feature request# DeepSeek-MoE for code (Closed 4 months ago, 1 comment)
Question about AddAuxiliaryLoss? (Closed 4 months ago, 1 comment)
deepseek-moe-16b inference speed is slower than Baichuan-13b (Closed 4 months ago, 3 comments)
Does the open-source MoE model support Chinese? (Closed 4 months ago, 4 comments)
Can inference tools like vllm support it? (Closed 4 months ago, 3 comments)
Flash attention (Closed 4 months ago)
Help: the model fails to load (Closed 4 months ago, 4 comments)
Will you open-source the DeepSeekMoE 2B model? (Updated 4 months ago, 5 comments)
The released DeepSeekMoE 16B Base has 3 different vocab sizes (Closed 4 months ago, 2 comments)
Error during the finetune process (Closed 4 months ago, 1 comment)
CUDA error: device-side assert triggered when trying to run the model (Closed 4 months ago, 2 comments)