frank-xwang / InstanceDiffusion

[CVPR 2024] Code release for "InstanceDiffusion: Instance-level Control for Image Generation"

Home Page: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/

Does Mask-Attention really work?

Jummmmmp opened this issue · comments

I set `return_att_masks = True` in `inference.py` and ran your model with the following command: `python inference.py --num_images 8 --output OUTPUT/ --input_json demos/demo_corgi_kitchen.json --ckpt pretrained/instancediffusion_sd15.pth --test_config configs/test_box.yaml --guidance_scale 7.5 --alpha 0.8 --seed 0 --mis 0.36 --cascade_strength 0.4`. For comparison, I deleted all the "mask" values in `demo_corgi_kitchen.json` and ran it again, but the results remained the same. So I wonder: does mask-attention really work? Or is there a proper way to use it? Looking forward to your reply, thanks!
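For reference, this is roughly how I removed the mask values before re-running inference (just a quick sketch, not code from the repo; I'm assuming the masks appear as "mask" keys in the demo JSON, which may not match the exact schema):

```python
# Strip every "mask" entry from the demo JSON and save a copy for comparison.
import json

def drop_masks(obj):
    """Recursively remove any 'mask' keys from nested dicts/lists."""
    if isinstance(obj, dict):
        return {k: drop_masks(v) for k, v in obj.items() if k != "mask"}
    if isinstance(obj, list):
        return [drop_masks(v) for v in obj]
    return obj

with open("demos/demo_corgi_kitchen.json") as f:
    data = json.load(f)

# Hypothetical output path, used only for this comparison run.
with open("demos/demo_corgi_kitchen_no_masks.json", "w") as f:
    json.dump(drop_masks(data), f, indent=2)
```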

Hi, to increase inference speed and reduce GPU memory usage, we've implemented Flash Attention in our inference code. Currently, Flash Attention does not support attention masking. Furthermore, our ablation study in the paper (see the screenshot below) indicates that excluding the attention mask results in only a small drop in performance.

[Screenshot: ablation table from the paper]
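To illustrate the constraint with a generic example (this is only a PyTorch sketch, not our actual attention implementation, and it assumes a recent PyTorch build with a CUDA GPU): the flash-attention backend only supports causal masking and rejects an explicit attention mask, so masked attention has to fall back to slower, more memory-hungry kernels.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

# Toy query/key/value tensors: (batch, heads, seq_len, head_dim), fp16 on GPU.
q = torch.randn(1, 8, 64, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 64, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 64, 64, device="cuda", dtype=torch.float16)
mask = torch.ones(1, 8, 64, 64, device="cuda", dtype=torch.bool)

# Works: flash attention with no mask.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)

# Fails: the flash backend cannot take an arbitrary attn_mask,
# so masked attention needs a different (slower) backend.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    try:
        out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    except RuntimeError as e:
        print("flash attention rejected the mask:", e)
```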

As a result, attention masking has been disabled by default to prioritize faster inference. If you wish to enable mask-attention, you can do so by setting `efficient_attention` to `False` in the configuration file located at `configs/test_box.yaml`.
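If it helps, here is a minimal sketch of making that edit programmatically (it assumes PyYAML is installed and that `efficient_attention` is a top-level key in the config; adjust the key path if it is nested differently):

```python
import yaml

cfg_path = "configs/test_box.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

# Disable the efficient/Flash Attention path so attention masks are applied.
cfg["efficient_attention"] = False  # assumed top-level key; may be nested

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f)
```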

Hope it helps.

I changed `test_box.yaml` to `test_mask.yaml` and the problem is solved! I can test mask-attention now. It's so kind of you, thanks!