How to run Flash Attention (FA)
MrigankRaman opened this issue
Thanks for supporting Flash Attention (FA)! I was wondering where to find the code changes needed to use FA.
You just need to set the flash_attention parameter when creating the Generator:
import ctranslate2
generator = ctranslate2.Generator(model_dir, device="cuda", flash_attention=True)  # model_dir: directory of the converted CTranslate2 model
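A slightly fuller sketch of the same idea, assuming CTranslate2 4.x, a model already converted with ct2-transformers-converter, and a matching Hugging Face tokenizer. The helper names generator_options, load_generator and generate are hypothetical, and the CPU fallback assumes flash attention is a CUDA-only option (worth checking against the CTranslate2 docs for your version).

```python
from typing import Any, Dict


def generator_options(cuda_device_count: int, use_flash_attention: bool = True) -> Dict[str, Any]:
    """Build keyword arguments for ctranslate2.Generator (hypothetical helper).

    Assumption: flash attention is only requested when a CUDA device exists;
    otherwise fall back to CPU without the flag.
    """
    if cuda_device_count > 0:
        return {"device": "cuda", "flash_attention": use_flash_attention}
    return {"device": "cpu"}


def load_generator(model_dir: str):
    """Load a converted CTranslate2 model, enabling flash attention on GPU."""
    import ctranslate2  # imported lazily so generator_options works without it

    options = generator_options(ctranslate2.get_cuda_device_count())
    return ctranslate2.Generator(model_dir, **options)


def generate(model_dir: str, tokenizer_name: str, prompt: str, max_length: int = 64) -> str:
    """Generate a continuation of `prompt` (hypothetical helper)."""
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    generator = load_generator(model_dir)
    # CTranslate2 expects string tokens rather than token ids.
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
    results = generator.generate_batch(
        [tokens], max_length=max_length, include_prompt_in_result=False
    )
    # Decode the first hypothesis of the first (and only) batch entry.
    return tokenizer.decode(results[0].sequences_ids[0])
```

Usage, with hypothetical paths: generate("path/to/ct2_model", "path/to/hf_tokenizer", "Hello, my name is"). The fallback keeps one script usable on CPU-only machines; drop it if you would rather fail loudly when no GPU is present.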