support alternative parallelism
152334H opened this issue · comments
152334H commented
--num-gpus
is implemented by sharding each expert layer across GPUs, i.e. expert parallelism (EP).
This is probably not advisable for local experimentation, especially at batch size 1, where EP only adds communication overhead with no speed benefit over naive model/pipeline parallelism.
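To make the overhead argument concrete, here is a rough back-of-the-envelope sketch (not from this repo, just an illustration): with EP at batch size 1, every MoE layer may have to ship the token's hidden state to its top-k experts on other GPUs and gather the results back, while pipeline parallelism only moves activations at the stage boundaries between GPUs. The layer/top-k numbers below are hypothetical worst-case assumptions, not measurements.

```python
def ep_transfers(num_moe_layers: int, top_k: int) -> int:
    """Worst-case cross-GPU transfers per token under expert parallelism.

    Assumes every selected expert lives on a remote GPU, so each MoE
    layer costs a dispatch + a combine per expert (2 * top_k transfers).
    """
    return num_moe_layers * top_k * 2

def pp_transfers(num_gpus: int) -> int:
    """Cross-GPU transfers per token under naive pipeline parallelism.

    Activations cross a device boundary only at each of the
    (num_gpus - 1) stage cuts, regardless of model depth.
    """
    return num_gpus - 1

# Hypothetical Mixtral-like config: 32 MoE layers, top-2 routing, 2 GPUs.
print(ep_transfers(32, 2))  # 128 transfers per token (worst case)
print(pp_transfers(2))      # 1 transfer per token
```

At batch size 1 there is also no throughput to recover from the extra parallelism, so the EP communication is pure overhead relative to simply splitting layers across devices.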
Songyang Zhang commented
Good suggestion. I am working on other parallelism methods. Contributions are also welcome.