Do word embeddings have gradients attached?
Jogima-cyber opened this issue · comments
I'm testing HELM in an applied project, and the word embeddings have a gradient attached, here:
Line 108 in b2bfb0d
Could it be that you're actually optimizing the embeddings without knowing it, with the gradient of the embeddings flowing back through the TrXL?
Problem solved: the embeddings do have a gradient attached that flows through the TrXL, but forward() is only ever run inside a torch.no_grad() context, so this is not actually happening.
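To make the resolution concrete, here is a minimal sketch (not HELM's actual code) showing why an embedding matrix with requires_grad=True is harmless when the lookup happens inside torch.no_grad():

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a frozen LM's embedding matrix; PyTorch
# enables gradients on nn.Embedding weights by default.
embedding = nn.Embedding(100, 16)
tokens = torch.tensor([1, 2, 3])

# Outside no_grad: the lookup becomes part of the autograd graph.
out = embedding(tokens)
assert out.requires_grad

# Inside no_grad (as during HELM's rollout collection): no graph is
# built, so no gradient can ever flow back into the embeddings.
with torch.no_grad():
    out = embedding(tokens)
assert not out.requires_grad
```

Even though the parameters themselves report requires_grad=True, the no_grad context means no backward graph exists for any optimizer to exploit.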
Hi,
thank you for your interest in our work.
As you already found, during the rollout collection no gradient information is used.
Also in the forward() method of HELM the hiddens coming from the TrXL are detached, before being passed back:
Line 154 in b2bfb0d
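A small sketch of the detach pattern referenced above (a simplified stand-in, not the actual HELM forward()): hiddens produced by a model are detached before being passed back as memory, so later losses cannot backpropagate into the model that produced them.

```python
import torch

# Hypothetical setup: w stands in for TrXL weights, mem for the
# recurrent hidden state carried across steps.
w = torch.randn(8, 8, requires_grad=True)
mem = torch.zeros(1, 8)

for _ in range(3):
    out = mem @ w
    mem = out.detach()  # cut the graph before passing hiddens back

loss = mem.sum()
assert not loss.requires_grad  # nothing left to backprop through w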
Even if the FrozenHopfield mechanism were used in isolation and gradients were propagated through the TrXL embedding matrix, the token embeddings would need to be re-instantiated for the changed embeddings to take effect, which is not the case.
However, for the sake of completeness, I added the .detach() in the instantiation of the FrozenHopfield mechanism to avoid a memory leak when using the FrozenHopfield in isolation.
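The memory-leak concern can be illustrated with a hypothetical sketch (names and shapes are illustrative, not taken from the repository): if a module stores a transformed copy of the embedding matrix without .detach(), the stored tensor keeps the autograd graph alive whenever gradients are enabled, which grows memory when the module is used on its own.

```python
import torch
import torch.nn as nn

# Hypothetical: a random projection of an LM embedding matrix, as a
# FrozenHopfield-style module might store at instantiation.
emb = nn.Embedding(50, 8)
proj = torch.randn(8, 8)

leaky = emb.weight @ proj            # still attached to emb's graph
safe = (emb.weight @ proj).detach()  # graph dropped at instantiation

assert leaky.requires_grad     # holds a reference to the backward graph
assert not safe.requires_grad  # free of the graph, no leak
```

Detaching at instantiation is cheap and changes nothing for the frozen use case, since the stored matrix is never meant to be trained anyway.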