Stable Diffusion with FlashAttention

To test: python inference_test.py

Expected performance on an A100: 1.5-1.6 s.
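
To compare a local run against the 1.5-1.6 s figure, one option is to time the call directly. The sketch below is a generic timing helper and is not taken from inference_test.py; `time_call` and `fake_inference` are hypothetical names, and the stand-in workload is an assumption to be replaced with the real inference call. For a GPU workload, timings are only meaningful if the device is synchronized before the clock is read (in PyTorch, `torch.cuda.synchronize()`), which this CPU-only sketch omits.

```python
import time
from statistics import median


def time_call(fn, warmup=1, repeats=3):
    """Return wall-clock durations in seconds for `repeats` calls to `fn`.

    Hypothetical helper: warm-up calls run first and are not timed, so that
    one-off costs (lazy initialization, caching) do not skew the result.
    """
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        # For a GPU workload, synchronize the device here before reading the clock.
        durations.append(time.perf_counter() - start)
    return durations


def fake_inference():
    # Stand-in workload (assumption); replace with the real inference call.
    time.sleep(0.01)


if __name__ == "__main__":
    runs = time_call(fake_inference)
    print(f"median: {median(runs):.3f} s over {len(runs)} runs")
```

Reporting the median over a few timed runs, after at least one warm-up call, tends to give a steadier number than a single cold run.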