lalalune / arcprize

Investigate padding tokens / dynamic batching

lalalune opened this issue

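Right now we are using a LOT of padding tokens. Will this hurt training or just waste compute? I don't know; the batches are pretty sparse, i.e. mostly padding. We could try dynamic batching so that sequences of similar length end up together, or try implementing sparse attention.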
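To get a feel for how much padding there is, here is a rough sketch that measures the padding fraction for a list of sequence lengths and compares fixed-order batching against a simple length-sorted form of dynamic batching. The function names (`padding_fraction`, `length_sorted_batches`) and the example lengths are made up for illustration and are not taken from this repo; it assumes each batch is padded to its own longest sequence.

```python
from typing import List


def fixed_order_batches(lengths: List[int], batch_size: int) -> List[List[int]]:
    """Group sequence indices into batches in their original order."""
    indices = list(range(len(lengths)))
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


def length_sorted_batches(lengths: List[int], batch_size: int) -> List[List[int]]:
    """Group sequence indices into batches after sorting by length.

    Hypothetical helper: one simple way to do dynamic batching so that
    sequences of similar length share a batch.
    """
    indices = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


def padding_fraction(lengths: List[int], batches: List[List[int]]) -> float:
    """Fraction of token slots that are padding.

    Assumes every batch is padded to the length of its longest sequence.
    """
    total_slots = 0
    real_tokens = 0
    for batch in batches:
        batch_lengths = [lengths[i] for i in batch]
        total_slots += max(batch_lengths) * len(batch_lengths)
        real_tokens += sum(batch_lengths)
    if total_slots == 0:
        return 0.0
    return 1.0 - real_tokens / total_slots


# Illustrative lengths only: a mix of very short and very long sequences.
example_lengths = [10, 900, 12, 880, 15, 910, 9, 905]

naive = padding_fraction(example_lengths, fixed_order_batches(example_lengths, 4))
bucketed = padding_fraction(example_lengths, length_sorted_batches(example_lengths, 4))
print(f"fixed order:   {naive:.1%} padding")
print(f"length sorted: {bucketed:.1%} padding")
```

Running the same measurement on our real token lengths would tell us whether this is worth fixing. One caveat: strict length sorting makes batches less random, so in practice it is probably better to shuffle within length buckets rather than sort the whole dataset.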