Papers Voting
hadyelsahar opened this issue
In this issue you can either:
- Add papers that you think are interesting to read and discuss (please stick to the format).
- Vote: use 👍 on comments.
Reformer: The Efficient Transformer
https://arxiv.org/abs/2001.04451
Summary:
Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O(L^2) to O(L log L), where L is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.
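For discussion, here is a minimal sketch of the reversible residual idea mentioned in the summary. It is not the paper's implementation: `f` and `g` are hypothetical stand-ins for the attention and feed-forward sub-layers, and the inputs are plain Python lists rather than tensors. The point it illustrates is that a layer's inputs can be recomputed from its outputs, so activations do not have to be stored for every layer.

```python
import math

def f(x):
    # Hypothetical stand-in for the attention sub-layer (assumption, not from the paper).
    return [math.tanh(v) for v in x]

def g(x):
    # Hypothetical stand-in for the feed-forward sub-layer (assumption, not from the paper).
    return [0.5 * v for v in x]

def rev_forward(x1, x2):
    # Reversible residual block: y1 = x1 + f(x2), y2 = x2 + g(y1).
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def rev_inverse(y1, y2):
    # Recover the inputs from the outputs alone: x2 = y2 - g(y1), x1 = y1 - f(x2).
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

# Round trip: the inputs are reconstructed up to floating-point error.
x1, x2 = [1.0, -2.0], [0.5, 3.0]
y1, y2 = rev_forward(x1, x2)
r1, r2 = rev_inverse(y1, y2)
```

During backpropagation, a model built from such blocks could call the inverse to rebuild each layer's inputs on the fly, trading extra compute for memory.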
Unsupervised Question Decomposition for Question Answering
https://arxiv.org/abs/2002.09758
Twitter thread: https://twitter.com/EthanJPerez/status/1232127027961942018
"We aim to improve question answering (QA) by decomposing hard questions into easier sub-questions that existing QA systems can answer. Since collecting labeled decompositions is cumbersome, we propose an unsupervised approach to produce sub-questions."