lucidrains / mixture-of-experts

A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models

Implicit in-place operation '*=' causes an error when computing the backward gradient in PyTorch

VRCMF opened this issue

In the code, the in-place '*=' operation causes an error: computing the backward gradient fails.

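Solution: replace the in-place multiplication with an out-of-place one, so that a new tensor is created instead of overwriting the existing one:
density_1_proxy = density_1_proxy * equals_one_mask[..., None]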
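For anyone hitting the same error, here is a minimal sketch of the failure mode. It assumes the masked tensor is a softmax output, which autograd saves for its backward pass; that is an assumption made for illustration, not a copy of the repository code. The function `masked_gates` and the tensors `x` and `mask` are hypothetical names.

```python
import torch

def masked_gates(x, mask, inplace):
    # Hypothetical helper: softmax keeps its output for the backward pass.
    gates = x.softmax(dim=-1)
    if inplace:
        # Overwrites the tensor that softmax saved, bumping its version counter.
        gates *= mask[..., None]
    else:
        # Allocates a new tensor; the saved softmax output stays untouched.
        gates = gates * mask[..., None]
    return gates

x = torch.randn(2, 3, 4, requires_grad=True)
mask = torch.tensor([[1., 0., 1.], [1., 1., 0.]])

# In-place variant: backward detects that a saved tensor was modified.
try:
    masked_gates(x, mask, inplace=True).sum().backward()
    inplace_failed = False
except RuntimeError as err:
    inplace_failed = True
    print("in-place version failed:", type(err).__name__)

# Out-of-place variant: backward runs and populates x.grad.
masked_gates(x, mask, inplace=False).sum().backward()
print("out-of-place grad shape:", tuple(x.grad.shape))
```

The out-of-place form costs one extra allocation, but it leaves every tensor that autograd has saved unchanged, which is why the fix works.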

@VRCMF thanks Wei! 04201ee