Adding reasoning about `batchify` in language model example
HallerPatrick opened this issue
Documentation
The comment/documentation of the function `batchify` in word_language_model/main.py explains how the sequence is rearranged into columns, and justifies this with "efficient batch processing".
It is not clear to me why that would help. It confused me the first time I looked at it and again when I tried to debug it. I would love a little more explanation, or a reference where I can read more about it.
Hope I used the right issue template here...
Greetings,
Patrick
I found these:
- https://discuss.pytorch.org/t/why-parameter-batch-first-is-needed/25769/2
- https://stackoverflow.com/a/49473068
I'm assuming that this is due to the underlying CUDA kernel being better suited to batches that have the sequence dimension first, followed by the batch dimension.
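For anyone else trying to picture the rearrangement that the `batchify` comment describes, here is a minimal sketch of the column layout. It uses plain Python lists instead of tensors so it runs without PyTorch; `batchify_sketch` is a hypothetical stand-in written for this thread, not the function from main.py, and the alphabet input is only an illustration.

```python
def batchify_sketch(tokens, bsz):
    """List-based illustration of splitting one long sequence into bsz columns."""
    # Number of time steps (rows) each column will hold.
    nbatch = len(tokens) // bsz
    # Drop trailing tokens that do not fill a complete row.
    tokens = tokens[: nbatch * bsz]
    # Cut the sequence into bsz contiguous chunks, one per column.
    columns = [tokens[i * nbatch:(i + 1) * nbatch] for i in range(bsz)]
    # Transpose so that row t holds the t-th token of every column.
    return [list(row) for row in zip(*columns)]


alphabet = list("abcdefghijklmnopqrstuvwxyz")
rows = batchify_sketch(alphabet, 4)
for row in rows:
    print(" ".join(row))
```

Each printed row is one time step, and each column is a contiguous slice of the original text. As far as I understand it, the model can then advance `bsz` independent streams in a single step rather than walking one long sequence token by token. The cost is that dependencies across column boundaries (for example between the last token of one column and the first token of the next) are never seen by the model.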