jina-ai / jerboa

LLM finetuning

Add support for DeepSpeed

alaeddine-13 opened this issue

DeepSpeed fails to run on our GPU machine because the fused_adam op cannot be compiled, either just-in-time (JIT) or in pre-compiled mode.
I tried several versions of DeepSpeed and several versions of PyTorch without success. The only remaining variable I can think of is the CUDA/NVVM version installed on the machine.
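
To narrow down the suspected CUDA/NVVM mismatch, a small check like the one below might help. It is only a sketch, not something taken from this repo: it compares the CUDA release reported by the system `nvcc` with the CUDA version PyTorch was built against (`torch.version.cuda`). The helper names (`parse_nvcc_release`, `system_cuda_version`, `torch_cuda_version`) are made up for illustration, and it assumes `nvcc` is on the `PATH`. DeepSpeed's own `ds_report` command should give a fuller picture of which ops are compatible.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def system_cuda_version():
    """Return the CUDA toolkit version seen by the system compiler, or None."""
    nvcc = shutil.which("nvcc")  # assumption: nvcc is discoverable on PATH
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


def torch_cuda_version():
    """Return the CUDA version PyTorch was built with, or None."""
    try:
        import torch
    except ImportError:
        return None
    if torch.version.cuda is None:  # CPU-only build
        return None
    major, minor = torch.version.cuda.split(".")[:2]
    return int(major), int(minor)


if __name__ == "__main__":
    system = system_cuda_version()
    built = torch_cuda_version()
    print(f"nvcc CUDA version:    {system}")
    print(f"PyTorch CUDA version: {built}")
    if system is not None and built is not None and system != built:
        print("Versions differ; this may be why the op fails to compile.")
```

If the two versions differ, installing a PyTorch build that matches the system toolkit (or the other way around) would be the first thing to try before touching DeepSpeed again.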

Since we can currently train on an A100 GPU without DeepSpeed, this issue is on hold.