Share weights of embedding layer with output layer in `CausalLanguageModel`
krasserm opened this issue · comments
Martin Krasser commented
A PyTorch implementation of Perceiver, Perceiver IO and Perceiver AR with PyTorch Lightning scripts for distributed training