WangRongsheng / Aurora

🐳 Aurora is a Chinese-language MoE model. It is a further work based on Mixtral-8x7B that activates the model's chat capability in the Chinese open domain.

Home Page: https://arxiv.org/abs/2312.14557


Is there a script for multi-GPU inference? The model just barely doesn't fit on a single 3090.

wonder-hy opened this issue

I tried multi-GPU inference with vLLM but couldn't get it to work...
Asking here in case anyone has a solution.
Thanks.
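For context, a rough back-of-the-envelope estimate shows why a single 24 GiB RTX 3090 cannot hold the model. The ~46.7B total parameter count is the published figure for Mixtral-8x7B; fp16 weights are assumed, and activation/KV-cache memory is ignored, so this is a lower bound:

```python
# Rough estimate of GPU memory needed just for the weights of a
# Mixtral-8x7B-style MoE model in fp16 (2 bytes per parameter).
TOTAL_PARAMS = 46.7e9        # ~46.7B total parameters (all experts)
BYTES_PER_PARAM_FP16 = 2     # fp16/bf16 storage
GIB = 1024 ** 3

weight_mem_gib = TOTAL_PARAMS * BYTES_PER_PARAM_FP16 / GIB
print(f"fp16 weights alone: ~{weight_mem_gib:.0f} GiB")

RTX_3090_GIB = 24
gpus_needed = -(-weight_mem_gib // RTX_3090_GIB)  # ceiling division
print(f"minimum 24 GiB GPUs for weights only: {int(gpus_needed)}")
```

So even before activations and KV cache, the fp16 weights need several 24 GiB cards (or aggressive quantization) rather than one.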

Hi,

We recommend deploying with the one-click Tiangong cloud image.

If you need to run the model on a single machine with multiple GPUs, download app.py and launch it with:

CUDA_VISIBLE_DEVICES=0,1,.. python app.py
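If you would still rather use vLLM, its tensor parallelism is enabled with the `--tensor-parallel-size` flag, which shards the model across the visible GPUs. A sketch of the launch command (the model path here is a placeholder, not the actual checkpoint location, and the GPU count should match your machine):

```shell
# Sketch: serve the model with vLLM's OpenAI-compatible server,
# sharded across 2 GPUs via tensor parallelism.
# /path/to/Aurora is a placeholder -- point it at the real Aurora weights.
python -m vllm.entrypoints.openai.api_server \
    --model /path/to/Aurora \
    --tensor-parallel-size 2
```

Note that vLLM requires the number of attention heads to be divisible by `--tensor-parallel-size`, so pick a GPU count that divides evenly.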