AmeenAli/HiddenMambaAttn Issues
About Train Time
Closed 1cross attention
Closed 2Why A^- is a diagonal matrix?
Updated 12
ClosedAttention Computation Question
Closed 4rollout calculation
Closed 1Type Error
Closed 6
Official PyTorch Implementation of "The Hidden Attention of Mamba Models"