Fused operations

Question

bclarkson-code opened this issue 2 months ago

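A lot of time is spent in the MLP and attention blocks. The operations in these blocks can be fused to reduce both memory usage and latency: a fused operation does the work of several steps in one pass, so the intermediate results between steps never have to be written out to memory and read back.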
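To make the idea concrete, here is a minimal sketch of fusing the bias add and activation from an MLP block. It is plain Python rather than this project's actual code, and the names `gelu`, `bias_gelu_unfused` and `bias_gelu_fused` are hypothetical, chosen only for illustration. The choice of GELU (tanh approximation) as the activation is also an assumption.

```python
import math


def gelu(x: float) -> float:
    """Tanh approximation of GELU (assumed activation, for illustration only)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def bias_gelu_unfused(xs, bias):
    """Two passes: the bias add stores a full intermediate list before GELU runs."""
    shifted = [x + b for x, b in zip(xs, bias)]  # intermediate buffer
    return [gelu(s) for s in shifted]


def bias_gelu_fused(xs, bias):
    """One pass: each element is read once and no intermediate list is stored."""
    return [gelu(x + b) for x, b in zip(xs, bias)]


xs = [-1.0, 0.0, 0.5, 2.0]
bias = [0.1, -0.2, 0.0, 0.3]

unfused = bias_gelu_unfused(xs, bias)
fused = bias_gelu_fused(xs, bias)

# Both versions compute the same values; only the memory traffic differs.
assert all(math.isclose(u, f) for u, f in zip(unfused, fused))
```

On a GPU the same restructuring would presumably take the form of a single kernel in place of two, which saves one full read and write of the activation tensor as well as a kernel launch.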