Fused operations
bclarkson-code opened this issue · comments
A lot of time is spent in MLP blocks and attention blocks. The operations in these blocks can be fused to reduce both memory usage and latency
Deep learning framework completely from scratch in python + numpy
bclarkson-code opened this issue · comments
A lot of time is spent in MLP blocks and attention blocks. The operations in these blocks can be fused to reduce both memory usage and latency