accelerating inference and training for transformer-based models by building cuda kernels that optimally saturate the memory bandwidth and arithmetic capabilities of hopper h100s
accelerating inference and training for transformer-based models by building cuda kernels that optimally saturate the memory bandwidth and arithmetic capabilities of hopper h100s