lucidrains/mixture-of-experts Issues
Load balancing loss?
Closed 2
convolution operation
Updated
Segmentation Fault?
Closed 1
A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models