lucidrains / performer-pytorch

An implementation of Performer, a linear attention-based transformer, in PyTorch

lucidrains/performer-pytorch Issues

Residual Connection
Closed 3 years ago · 3
Modify the transformer tutorial based on performer
Updated a year ago
Cross-attention with arbitrary causal mask
Updated a year ago
Question about masking
Updated a year ago · 2
Pretrained example
Updated 2 years ago
Performer Pytorch Slower than Expected and Please Help with Understanding Parameter Count
Updated 2 years ago · 1
Causal linear attention benchmark
Closed 3 years ago · 13
Replicating nn.MultiHeadAttention with multiple Performer SelfAttention modules
Updated 2 years ago
I want to use Performer on MAE
Updated 2 years ago
Question: Is Performer order equivariant? (can it transform an unordered set of tensors)
Updated 2 years ago
Relative Positional Encoding for Linear Attention Models.
Closed 2 years ago · 3
Using Performer with GNNs
Updated 2 years ago
Huge model state dict size?
Updated 2 years ago
Attention map
Closed 2 years ago · 2
Performer Plain
Updated 2 years ago
How to test the performer architecture for training new models?
Updated 2 years ago · 1
Output inconsistent for autoregressive performer
Updated 2 years ago · 2
Rotary Position Embedding
Updated 3 years ago
torch_tensorrt compilation fails
Updated 3 years ago
way to make two elements invisible?
Updated 3 years ago
torch.max(data_dash) bug
Closed 3 years ago · 2
SelfAttention layer seems to have large error relative to nn.MultiheadAttention?
Updated 3 years ago · 8
FastAttention doesn't give results in agreement with standard attention?
Updated 3 years ago · 7
Recover attention scores
Updated 3 years ago · 3
hyperbolic cosine based estimator
Updated 3 years ago
Names `to_k`, `to_q`, `to_v`, `to_out` cause issues
Updated 3 years ago
Input and Context size in CrossAttention
Closed 3 years ago · 2
Performer Benchmark
Updated 3 years ago
Replacing Attention module of Vision Transformer with SelfAttention Module of Performer?
Updated 3 years ago · 6
Performance gain replacing original attention to fast attention in this repo?
Updated 3 years ago · 2
Causal performer slower than causal regular attention
Updated 3 years ago · 3
`to_out` bias
Closed 3 years ago · 3
why is bias true in `to_<q,k,v>`?
Closed 3 years ago · 4
Getting error with the check_redraw_projections when using DataParallel
Closed 3 years ago · 4
Saving checkpoints during training and loading
Closed 3 years ago · 3
No fp16 support from fast-transformers (CausalDotProduct)
Updated 3 years ago · 75
Decoder Mask
Updated 3 years ago
Triangular matrices?
Closed 4 years ago · 10
Deterministic layers
Updated 3 years ago · 1
context-specific embeddings from language model?
Updated 3 years ago
Extra FF when using cross attention
Closed 3 years ago · 8
Decoder randomly outputs NaN tensor.
Closed 4 years ago · 5
FixNorm alongside ScaleNorm
Updated 4 years ago · 3
[Feature] Adding fixed positional embeddings as an option
Closed 4 years ago · 3
Applying decoder input mask?
Closed 4 years ago · 2
Bug fix in original google-research implementation
Closed 4 years ago · 3
Plain Performer, if you are working with say images or other modalities
Updated 4 years ago · 1
Question: torch.max term used in `softmax_kernel`
Closed 4 years ago · 4
wrong implementation for autoregressive self-attention
Closed 4 years ago · 10
Use Performer for fine-tuning task
Updated 4 years ago