Feature request: support 4D attention masks
poedator opened this issue · comments
This is a feature request to implement custom 4D attention masks for Llama (and possibly other models), similar to huggingface/transformers#27539.
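To illustrate what such a mask could look like, here is a minimal sketch in plain Python. It assumes the layout `[batch, 1, query_len, kv_len]` with 1 meaning "may attend" and 0 meaning "masked", which is my understanding of the convention in the linked transformers PR and not something confirmed for this project. The function name `packed_causal_mask` is hypothetical. The example packs several sequences into one row and keeps attention causal within each sequence, which a plain 2D padding mask cannot express.

```python
def packed_causal_mask(segment_lengths):
    """Build a [1, 1, T, T] 0/1 mask for sequences packed into a single row.

    Assumed convention (not confirmed by the issue): 1 = may attend, 0 = masked.
    """
    total = sum(segment_lengths)
    # segment_id[t] is the index of the packed sequence that token t belongs to.
    segment_id = [i for i, n in enumerate(segment_lengths) for _ in range(n)]
    rows = [
        [
            # Query q may see key k only if k is not in the future
            # and both tokens come from the same packed sequence.
            1 if (k <= q and segment_id[k] == segment_id[q]) else 0
            for k in range(total)
        ]
        for q in range(total)
    ]
    # Wrap with a batch dimension and a head dimension of size 1 (broadcast over heads).
    return [[rows]]


mask = packed_causal_mask([2, 3])
for row in mask[0][0]:
    print(row)
```

With segment lengths 2 and 3, the result is block-diagonal: the first two tokens attend only to each other (causally), and the last three likewise, so the two packed sequences never see one another.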