huggingface / nanotron

Minimalistic large language model 3D-parallelism training

How does it compare with Megatron-DeepSpeed?

allanj opened this issue

  1. I'm wondering about the relationship between nanotron and Megatron-DeepSpeed.
  2. Are they the same thing? If not, which one is faster?

The plan is to keep the codebase as minimal as possible, with a more explicit and accessible design for users, while delivering performance at least on par with, or faster than, Megatron-DeepSpeed.