Phil Wang's repositories
musiclm-pytorch
Implementation of MusicLM, Google's state-of-the-art model for music generation using attention networks, in Pytorch
reformer-pytorch
Reformer, the efficient Transformer, in Pytorch
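The core idea in Reformer is to replace full attention with locality-sensitive hashing, so that only queries and keys falling in the same hash bucket attend to each other. Below is a minimal sketch of just the hashing step, assuming the angular LSH scheme from the paper (random rotations, bucket by argmax over the concatenated positive and negative projections); `lsh_hash` is an illustrative name, not this repo's API, and the full method additionally sorts by bucket and attends within chunks:

```python
import torch

def lsh_hash(x, num_buckets, num_rounds=4):
    # angular LSH as described in the Reformer paper: project onto random
    # rotations, then bucket each vector by the argmax over [xR; -xR]
    dim = x.shape[-1]
    rotations = torch.randn(dim, num_rounds, num_buckets // 2, device=x.device)
    rotated = torch.einsum('...d,drb->...rb', x, rotations)
    buckets = torch.cat([rotated, -rotated], dim=-1).argmax(dim=-1)
    return buckets  # (..., num_rounds) bucket ids in [0, num_buckets)
```

Multiple hash rounds reduce the chance that two similar vectors are unluckily separated by a single hash.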
make-a-video-pytorch
Implementation of Make-A-Video, the state-of-the-art text-to-video generator from Meta AI, in Pytorch
perceiver-pytorch
Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch
memorizing-transformers-pytorch
Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch
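The retrieval step can be illustrated compactly. In this sketch, an exact top-k dot-product lookup stands in for the paper's approximate nearest-neighbor index, and `knn_memory_attend` is a hypothetical function name; the real method also gates between local attention and memory attention:

```python
import torch

def knn_memory_attend(q, mem_k, mem_v, topk=32):
    # q: (batch, heads, n, dim); mem_k, mem_v: (num_memories, dim)
    # retrieve the topk most similar stored keys per query, then attend
    # over just those retrieved memories
    sims = torch.einsum('bhnd,md->bhnm', q, mem_k)
    scores, idx = sims.topk(topk, dim=-1)
    attn = scores.softmax(dim=-1)
    vals = mem_v[idx]  # (batch, heads, n, topk, dim)
    return torch.einsum('bhnk,bhnkd->bhnd', attn, vals)
```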
MEGABYTE-pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
siren-pytorch
Pytorch implementation of SIREN - Implicit Neural Representations with Periodic Activation Functions
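A SIREN layer is simply a linear map followed by a scaled sine, with a specific initialization that keeps activations well distributed through depth. A minimal sketch of one layer, following the paper's initialization scheme (`SirenLayer` is an illustrative class name):

```python
import math
import torch
from torch import nn

class SirenLayer(nn.Module):
    # one SIREN layer: sin(w0 * Wx + b), with the paper's init scheme
    def __init__(self, dim_in, dim_out, w0=30., is_first=False):
        super().__init__()
        self.w0 = w0
        self.linear = nn.Linear(dim_in, dim_out)
        # first layer uses uniform(-1/fan_in, 1/fan_in); later layers scale by w0
        bound = (1 / dim_in) if is_first else (math.sqrt(6 / dim_in) / w0)
        nn.init.uniform_(self.linear.weight, -bound, bound)
        nn.init.uniform_(self.linear.bias, -bound, bound)

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))
```

Stacking such layers yields a network that maps coordinates (e.g. pixel positions) to signal values, which is the implicit-representation use case.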
memory-efficient-attention-pytorch
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
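The trick in the paper is to process keys and values in chunks while maintaining a running max and running sum, so softmax attention is computed exactly without ever materializing the full (seq × seq) score matrix. A minimal non-causal sketch of that streaming computation (`chunked_attention` is an illustrative name):

```python
import torch

def chunked_attention(q, k, v, chunk_size=1024):
    # q, k, v: (batch, heads, seq, dim). Keys/values are visited chunk by
    # chunk; a running max and running sum keep the softmax numerically
    # stable while only one chunk of scores is ever in memory.
    q = q * (q.shape[-1] ** -0.5)
    out = torch.zeros_like(q)
    row_max = torch.full((*q.shape[:-1], 1), float('-inf'), device=q.device)
    row_sum = torch.zeros_like(row_max)
    for start in range(0, k.shape[-2], chunk_size):
        kc = k[..., start:start + chunk_size, :]
        vc = v[..., start:start + chunk_size, :]
        scores = q @ kc.transpose(-2, -1)
        new_max = torch.maximum(row_max, scores.amax(dim=-1, keepdim=True))
        correction = (row_max - new_max).exp()  # rescale previous accumulators
        exp_scores = (scores - new_max).exp()
        out = out * correction + exp_scores @ vc
        row_sum = row_sum * correction + exp_scores.sum(dim=-1, keepdim=True)
        row_max = new_max
    return out / row_sum
```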
deformable-attention
Implementation of Deformable Attention in Pytorch from the paper "Vision Transformer with Deformable Attention"
electra-pytorch
A simple and working implementation of ELECTRA, a compute-efficient method for pretraining language models from scratch, in Pytorch
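ELECTRA trains a small generator with masked language modeling, then has a discriminator classify, per position, whether a token was replaced by a generator sample. A hedged sketch of the combined loss, ignoring special-token handling; `generator` and `discriminator` here are hypothetical modules returning (batch, seq, vocab) and (batch, seq) logits respectively:

```python
import torch
import torch.nn.functional as F

def electra_loss(generator, discriminator, tokens, mask_token_id,
                 mask_prob=0.15, disc_weight=50.):
    # tokens: (batch, seq) of token ids
    mask = torch.rand_like(tokens, dtype=torch.float) < mask_prob
    masked_tokens = tokens.masked_fill(mask, mask_token_id)

    # 1. the generator is trained with ordinary masked language modeling
    gen_logits = generator(masked_tokens)
    mlm_loss = F.cross_entropy(gen_logits[mask], tokens[mask])

    # 2. sample replacements from the generator to build the corrupted input
    with torch.no_grad():
        sampled = torch.multinomial(gen_logits[mask].softmax(dim=-1), 1).squeeze(-1)
    corrupted = tokens.clone()
    corrupted[mask] = sampled

    # 3. the discriminator predicts which positions hold replaced tokens
    replaced = (corrupted != tokens).float()
    disc_loss = F.binary_cross_entropy_with_logits(discriminator(corrupted), replaced)

    return mlm_loss + disc_weight * disc_loss
```

The efficiency gain comes from the discriminator receiving a learning signal at every position, not just the masked ones.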
block-recurrent-transformer-pytorch
Implementation of Block Recurrent Transformer, in Pytorch
Mega-pytorch
Implementation of Mega, the single-head attention with multi-headed EMA architecture that achieved state of the art on the Long Range Arena benchmark
mlm-pytorch
An implementation of masked language modeling for Pytorch, made as concise and simple as possible
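The standard BERT-style corruption scheme is small enough to show in full: select a fraction of positions, replace most with a mask token, some with random tokens, and leave the rest unchanged, computing the loss only at selected positions. A minimal sketch (`mask_tokens` is an illustrative name, and special-token handling beyond padding is omitted):

```python
import torch

def mask_tokens(tokens, mask_token_id, vocab_size, mask_prob=0.15, pad_token_id=0):
    # BERT-style corruption: pick mask_prob of the non-pad positions; of those,
    # 80% become [MASK], 10% become a random token, 10% are left unchanged
    labels = tokens.clone()
    prob = torch.rand_like(tokens, dtype=torch.float)
    selected = (tokens != pad_token_id) & (prob < mask_prob)
    labels[~selected] = -100  # the default ignore_index of F.cross_entropy

    roll = torch.rand_like(tokens, dtype=torch.float)
    corrupted = tokens.clone()
    corrupted[selected & (roll < 0.8)] = mask_token_id
    random_pos = selected & (roll >= 0.8) & (roll < 0.9)
    corrupted[random_pos] = torch.randint(
        vocab_size, (int(random_pos.sum()),), device=tokens.device)
    return corrupted, labels
```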
graph-transformer-pytorch
Implementation of Graph Transformer in Pytorch, for potential use in replicating AlphaFold2
ETSformer-pytorch
Implementation of ETSformer, a state-of-the-art time-series Transformer, in Pytorch
mixture-of-attention
Some personal experiments around routing tokens to different autoregressive attention branches, akin to mixture-of-experts
discrete-key-value-bottleneck-pytorch
Implementation of Discrete Key / Value Bottleneck, in Pytorch
VN-transformer
A Transformer made of Rotation-equivariant Attention using Vector Neurons
rvq-vae-gpt
My attempts at applying the SoundStream design to learned tokenization of text, followed by hierarchical attention for text generation
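The SoundStream ingredient being borrowed is residual vector quantization: quantize against a codebook, subtract the chosen code from the residual, and repeat with the next codebook, so each stage refines the last. A minimal sketch, omitting the commitment losses and straight-through gradients a trainable version needs (`ResidualVQ` is an illustrative class name):

```python
import torch
from torch import nn

class ResidualVQ(nn.Module):
    # residual vector quantization: each codebook quantizes what the
    # previous codebooks left unexplained
    def __init__(self, dim, codebook_size, num_quantizers):
        super().__init__()
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, dim) for _ in range(num_quantizers))

    def forward(self, x):                       # x: (batch, seq, dim)
        residual, quantized, indices = x, torch.zeros_like(x), []
        for codebook in self.codebooks:
            # squared euclidean distance to every code
            dists = ((residual.unsqueeze(-2) - codebook.weight) ** 2).sum(dim=-1)
            idx = dists.argmin(dim=-1)          # (batch, seq)
            chosen = codebook(idx)
            quantized = quantized + chosen
            residual = residual - chosen
            indices.append(idx)
        return quantized, torch.stack(indices, dim=-1)
```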
product-key-memory
Standalone Product Key Memory module in Pytorch - for augmenting Transformer models
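Product key memory makes a huge key-value store cheap to query: the query is split in half, each half is scored against only sqrt(N) sub-keys, and the Cartesian product of the two top-k sets recovers the global top-k over all N keys. A simplified sketch omitting the paper's query network and multi-head queries (`ProductKeyMemory` is an illustrative name):

```python
import torch
from torch import nn

class ProductKeyMemory(nn.Module):
    def __init__(self, dim, num_keys=256, topk=8):
        super().__init__()
        self.topk = topk
        self.num_keys = num_keys
        # two sub-key sets, each scored against half the query dimension
        self.sub_keys = nn.Parameter(torch.randn(2, num_keys, dim // 2))
        # num_keys ** 2 values, addressed by pairs of sub-key indices
        self.values = nn.Embedding(num_keys ** 2, dim)

    def forward(self, x):                            # x: (batch, seq, dim)
        q1, q2 = x.chunk(2, dim=-1)
        s1, i1 = (q1 @ self.sub_keys[0].t()).topk(self.topk, dim=-1)
        s2, i2 = (q2 @ self.sub_keys[1].t()).topk(self.topk, dim=-1)
        # Cartesian product of the two top-k sets: topk^2 candidates
        cand = (s1.unsqueeze(-1) + s2.unsqueeze(-2)).flatten(-2)
        cand_idx = (i1.unsqueeze(-1) * self.num_keys + i2.unsqueeze(-2)).flatten(-2)
        scores, pos = cand.topk(self.topk, dim=-1)
        idx = cand_idx.gather(-1, pos)               # global indices of top-k keys
        attn = scores.softmax(dim=-1)
        vals = self.values(idx)                      # (batch, seq, topk, dim)
        return (attn.unsqueeze(-1) * vals).sum(dim=-2)
```

Only 2·sqrt(N) sub-key scores are computed per query, yet the memory holds N = num_keys² addressable values.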
flash-genomics-model
My own attempt at a long-context genomics model, leveraging recent advances in long-context attention modeling (Flash Attention plus other hierarchical methods)
coordinate-descent-attention
Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick the top-k tokens
autoregressive-linear-attention-cuda
CUDA implementation of autoregressive linear attention, incorporating recent research findings
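For reference, causal linear attention itself fits in a few lines of plain Pytorch: with a positive feature map, the causal output reduces to running sums of k ⊗ v and of k. A sketch assuming the common elu+1 feature map; the (n, d, d) running sum materialized below is exactly what a dedicated CUDA kernel avoids:

```python
import torch
import torch.nn.functional as F

def causal_linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, heads, seq, dim); linear in sequence length,
    # since each position only needs prefix sums over k ⊗ v and k
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum('bhnd,bhne->bhnde', k, v).cumsum(dim=2)
    k_sum = k.cumsum(dim=2)
    num = torch.einsum('bhnd,bhnde->bhne', q, kv)
    den = torch.einsum('bhnd,bhnd->bhn', q, k_sum).clamp(min=eps)
    return num / den.unsqueeze(-1)
```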
nim-tokenizer
Implementation of a simple BPE tokenizer, but in Nim
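The repo itself is in Nim, but the BPE training loop is compact enough to sketch here in Python, the language of the rest of this listing. Words are kept as space-separated symbols, and each merge fuses the most frequent adjacent symbol pair across the corpus (`learn_bpe` is an illustrative name):

```python
import re
from collections import Counter

def learn_bpe(corpus, num_merges):
    # start from characters; each merge creates one new vocabulary symbol
    words = Counter(' '.join(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in words.items():
            symbols = word.split()
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        # merge only whole symbols, not substrings inside other symbols
        pattern = re.compile(r'(?<!\S)' + re.escape(' '.join(best)) + r'(?!\S)')
        new_words = Counter()
        for word, freq in words.items():
            new_words[pattern.sub(''.join(best), word)] += freq
        words = new_words
        merges.append(best)
    return merges
```

Tokenizing new text then means replaying the learned merges in order, which is why the merge list, not the final vocabulary, is the tokenizer's core artifact.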