Investigate padding tokens / dynamic batching
lalalune opened this issue · comments
Right now we are using a LOT of padding tokens. Will this hurt throughput or training? I don't know — the attention over those batches is pretty sparse. We could try implementing sparse attention, or dynamic batching so each batch only pads up to its own longest sequence.
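As a starting point, here is a minimal sketch of length-bucketed dynamic batching (not tied to our current data loader — the function name, `max_tokens` budget, and `pad_id` are illustrative assumptions): sequences are sorted by length and packed greedily so that `batch_size * longest_seq` stays under a token budget, and padding is only added up to the longest sequence in each batch rather than a global max.

```python
from typing import List

def dynamic_batches(seqs: List[List[int]], max_tokens: int, pad_id: int = 0) -> List[List[List[int]]]:
    """Greedily pack length-sorted sequences into batches whose padded size
    (rows * longest sequence in the batch) stays within max_tokens.

    NOTE: illustrative sketch, not our actual loader. `max_tokens` and
    `pad_id` are assumed knobs.
    """
    batches: List[List[List[int]]] = []
    batch: List[List[int]] = []
    for seq in sorted(seqs, key=len):
        # Sequences arrive in ascending length, so `seq` is the longest so far.
        if batch and (len(batch) + 1) * len(seq) > max_tokens:
            batches.append(batch)
            batch = []
        batch.append(seq)
    if batch:
        batches.append(batch)
    # Pad each batch only to its own max length, not a global one.
    return [
        [s + [pad_id] * (max(map(len, b)) - len(s)) for s in b]
        for b in batches
    ]
```

With a budget of 6 tokens, `[[1], [2, 2], [3, 3, 3], [4, 4, 4, 4]]` packs into three batches — the two short sequences share a batch with a single pad token, and the longer ones get their own batches with zero padding. Whether this beats sparse attention for us probably depends on how skewed our length distribution is.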