Persistent buffer with L2 cache
naoyam opened this issue · comments
Naoya Maruyama commented
The L2 cache is quite large on A100 and later GPU generations. It should be used for persistent buffers in addition to registers and shared memory.
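For a rough sense of scale, here is a back-of-the-envelope comparison of aggregate on-chip capacities. It is only a sketch: the figures are assumed from NVIDIA's published A100 specifications (108 SMs, 256 KB register file per SM, up to 164 KB shared memory per SM, 40 MB L2), and `levels_that_fit` is a hypothetical helper, not part of any existing scheduler.

```python
# Approximate A100 capacities, assumed from NVIDIA's public specifications.
NUM_SMS = 108
REGISTER_FILE_PER_SM = 256 * 1024    # bytes per SM
MAX_SHARED_MEM_PER_SM = 164 * 1024   # bytes per SM (maximum configurable)
L2_CACHE = 40 * 1024 * 1024          # bytes, shared by the whole device


def aggregate_capacities():
    """Return the device-wide capacity in bytes of each storage level."""
    return {
        "registers": NUM_SMS * REGISTER_FILE_PER_SM,
        "shared_memory": NUM_SMS * MAX_SHARED_MEM_PER_SM,
        "l2_cache": L2_CACHE,
    }


def levels_that_fit(buffer_bytes):
    """Hypothetical helper: list the levels whose aggregate capacity
    could hold a buffer of the given size (an upper bound only)."""
    return [
        name
        for name, capacity in aggregate_capacities().items()
        if capacity >= buffer_bytes
    ]


if __name__ == "__main__":
    for name, capacity in aggregate_capacities().items():
        print(f"{name}: {capacity / (1024 * 1024):.1f} MiB")
    print(levels_that_fit(32 * 1024 * 1024))
```

These are upper bounds. Registers and shared memory are also needed for everything else a kernel does, so the space actually available for a persistent buffer is smaller, which is where the extra L2 capacity could help.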